Optimize the Application Workload |
Bind Variables |
This rule determines if bind variables are being
used.
A SQL statement may contain variables that change from execution to execution.
These variables are typically parameters to the SQL statement. They define the
rows to be processed or new values to be inserted or updated. These variables
can be specified either as literals or as bind variables. A query that uses a
literal to retrieve details for employee 1234 might look like this:
SELECT * FROM employees WHERE employee_id = 1234
A more efficient and scalable approach is to define the employee_id as a bind
variable:
SELECT * FROM employees WHERE employee_id = :empid
The value of the host variable empid will be supplied to the SQL statement when
it is executed. When a SQL statement uses bind variables, it is not necessary to
re-parse the statement every time the argument changes. When the SQL statement
is executed in another session, the matching SQL will be found in the shared
pool and parsing will be avoided.
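As an illustration, in PL/SQL a bind value can be supplied to a dynamically built statement through the USING clause (a minimal sketch; the employees table follows the example above):
DECLARE
  l_emp employees%ROWTYPE;
BEGIN
  -- The bind variable :empid is bound at execution time, so repeated
  -- executions with different values share one parsed statement.
  EXECUTE IMMEDIATE
    'SELECT * FROM employees WHERE employee_id = :empid'
    INTO l_emp
    USING 1234;
END;
/
Note that static SQL inside PL/SQL binds referenced PL/SQL variables automatically; explicit binding matters most for dynamic SQL and for statements issued from client programs.
|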
Cursor Management |
Determine if cursors are being reused
inefficiently.
When executing a SQL statement, the Oracle RDBMS assigns a private work area for
that statement. This work area contains information about the SQL statement and
the set of data returned or affected by that statement. The cursor is a
mechanism by which that work area can be named and the information within it
manipulated.
In its simplest form, a cursor can be thought of as a pointer into a table in
the database.
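For example, a PL/SQL explicit cursor names such a work area and steps through its result set (a minimal sketch; the employees table and columns are illustrative):
DECLARE
  CURSOR emp_cur IS
    SELECT employee_id, last_name FROM employees;
  l_id   employees.employee_id%TYPE;
  l_name employees.last_name%TYPE;
BEGIN
  OPEN emp_cur;                        -- allocate and name the work area
  LOOP
    FETCH emp_cur INTO l_id, l_name;   -- advance the pointer one row at a time
    EXIT WHEN emp_cur%NOTFOUND;
  END LOOP;
  CLOSE emp_cur;                       -- release the work area
END;
/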
|
Shared Pool Size |
Determine if overall database performance would
benefit from resizing the shared pool.
The shared pool area of the SGA contains two primary areas: the library cache
and the dictionary cache. It also contains control structures such as NLS
loadable objects, session and process context areas, enqueue (lock) resources,
etc. Additionally, multi-threaded server (MTS) and parallel execution (PX) are
optional components that place their respective memory areas in the large pool
when one is defined. When a large pool is unavailable, the session memory of
MTS and the query buffers of PX reside in the shared pool.
The SGA is generally categorized into fixed and variable regions. The fixed
region is usually quite small, less than 60KB. It primarily contains latch
control structures. The variable region is where we find the shared pool. The
shared pool’s components grow and shrink based upon workload. Such components
include the shared SQL area, the PL/SQL area, the dictionary cache, and control
structures for latches, locks, and the current character set. These variably
sized areas share the shared pool memory: when one component requires
additional memory, another must relinquish it. Performance degrades when the
shared pool is undersized and these components compete for memory.
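To see how the shared pool is currently divided among its components, V$SGASTAT can be queried (a read-only diagnostic sketch):
-- Shared pool components, largest first; includes 'free memory'
SELECT name, bytes
FROM v$sgastat
WHERE pool = 'shared pool'
ORDER BY bytes DESC;
|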
Reduce Contention |
Shared Pool Latch |
Determine if significant shared pool latch
contention exists.
Shared pool latch contention can occur when the shared pool is sized too large
to be effectively managed. This problem is less common starting with Oracle
version 8.1.6, when Oracle implemented an improved shared pool management
architecture. In previous Oracle versions, shared pool management did little
to avoid shared pool latch contention. The overhead to track the memory chunks
in a large shared pool impacted the speed at which processes were able to obtain
a needed chunk of memory.
|
Library Cache Latch |
Determine if significant library cache latch
contention exists.
The library cache latches protect the cached SQL statements and the cached
object definitions held in the shared pool library cache. A library cache latch
must be obtained before Oracle can add a new statement to the library cache.
During a parse request, Oracle searches the library cache for a matching
statement. When one is not found, Oracle will acquire the library cache latch,
parse the SQL statement, and insert the new SQL. Contention for the library
cache latch occurs when an application generates numerous unique, unsharable
SQL statements (that is, when literals have been used instead of bind
variables). When library cache
latch contention exists, improve the use of bind variables within your
application. Misses on this latch can indicate that your application is parsing
SQL at a high rate and may be suffering from excessive parse CPU overhead.
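Activity for this latch can be checked in V$LATCH; a high ratio of misses or sleeps to gets suggests contention (a diagnostic sketch):
SELECT name, gets, misses, sleeps
FROM v$latch
WHERE name = 'library cache';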
|
Redo Allocation Latch |
Determine if significant redo allocation latch
contention exists.
Redo allocation latches are used to control the allocation of space for redo
entries in the redo log buffer. When redo information is generated, it is
written to the log buffer. The redo allocation latch protects against
simultaneous writes to the same log buffer memory. There is only one redo
allocation latch per instance and only one process can allocate space in the log
buffer at a time.
In Oracle versions 7.3 and 8.0, prior to writing a log buffer entry, Oracle
compares the redo entry size to the LOG_SMALL_ENTRY_MAX_SIZE parameter. When the
amount of redo is smaller than this parameter (defaults to 80 bytes), the redo
copy takes place while holding the redo allocation latch. When the redo entry
size is larger than this parameter, Oracle first obtains a redo copy latch, then
obtains the redo allocation latch, allocates the space, releases the redo
allocation latch, performs the copy, and finally releases the redo copy latch.
In 8i, Oracle removed the LOG_SMALL_ENTRY_MAX_SIZE parameter and always uses
redo copy latches by default.
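Contention for the redo allocation latch can be observed in V$LATCH (a diagnostic sketch):
SELECT name, gets, misses, sleeps
FROM v$latch
WHERE name = 'redo allocation';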
|
Redo Copy Latch |
Determine if significant redo copy latch
contention exists.
When redo information is generated, it is first written to the log buffer before
being written to disk. When the LOG_SIMULTANEOUS_COPIES (renamed _LOG_SIMULTANEOUS_COPIES
in 8i) parameter is set to a value greater than 0, Oracle uses redo copy latches
to reduce contention for the redo allocation latch. The redo allocation latch is
used to prevent another process from writing in the same location of the log
buffer. There is only one redo allocation latch per instance and only one
process can allocate space in the log buffer at a time.
In Oracle versions 7.3 and 8.0, prior to writing a log buffer entry, Oracle
compares the redo entry size to the LOG_SMALL_ENTRY_MAX_SIZE parameter. When the
amount of redo is equal to or smaller than this parameter (defaults to 80
bytes), the redo copy takes place while holding the redo allocation latch. When
the redo entry size is larger than this parameter, Oracle first obtains a redo
copy latch, then obtains the redo allocation latch, allocates the space,
releases the redo allocation latch, performs the copy, and finally releases the
redo copy latch. In 8i, Oracle removed the LOG_SMALL_ENTRY_MAX_SIZE parameter
and always uses redo copy latches by default.
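Because redo copy latches are normally requested in no-wait mode, the immediate-mode columns of V$LATCH are the relevant ones here (a diagnostic sketch):
SELECT name, immediate_gets, immediate_misses
FROM v$latch
WHERE name = 'redo copy';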
|
Cache Buffers Chains (CBC) Latch |
Determine if significant cache buffers chains
latch contention exists.
Cache buffers chains (CBC) latches enforce synchronized updates to data in the
Oracle buffer cache. Data blocks in the buffer cache are managed in singly
linked lists of data block headers (sometimes referred to as buckets). Data
block headers contain information about the data block, including the memory
address of the block in the buffer cache. To provide efficient data block
access, Oracle applies a hash function to the data block address (DBA - an
encoding of file and block number) to identify the cache buffers chain (one of
many) that may contain the block. The chain is then scanned to find the
corresponding header. When the header is located, the buffer cache address is
available and the block can be retrieved. This process is much faster than
scanning the entire buffer cache for the requested block.
When a block header is not found in its corresponding cache buffers chain,
Oracle reads the data block from the disk, copies it to the buffer cache, and
adds its header to the appropriate cache buffers chain. While Oracle scans and
updates a cache buffers chain, a CBC latch protects the chain to ensure that
block headers are not added to or removed from the list simultaneously by
other users. When the scan process begins, the Oracle process "grabs" and holds
the CBC latch for the corresponding cache buffers chain and thus prevents other
users from changing block headers in all chains protected by the held latch.
While the latch is held, other users wanting to scan a cache buffers chain
protected by the same latch will wait until it is released. The fewer the number
of block headers maintained in a cache buffers chain, the faster the search for
a buffer header can be completed, and the less time the latch will be held.
Optimally, other users should not have to wait long to access the same cache
buffers chains latch.
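Individual chains are protected by child latches, which can be ranked by sleeps to locate hot chains (a diagnostic sketch using V$LATCH_CHILDREN):
-- Hottest cache buffers chains child latches first
SELECT addr, gets, misses, sleeps
FROM v$latch_children
WHERE name = 'cache buffers chains'
ORDER BY sleeps DESC;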
|
Cache Buffers LRU Chain Latch |
Determine if significant cache buffers LRU chain latch contention exists.
The cache buffers LRU chain latch is held when updates are made to the LRU
(Least Recently Used) lists. Two LRU lists are used to track database blocks.
The first list, the LRUW, tracks "dirty" (modified blocks not yet written to
disk) block buffers. The second list, the LRU, tracks "clean" (unmodified) block
buffers. Both lists track the least recently used blocks. When a block is used
or freed, it goes to the top of the list, and the blocks below are moved down
the list. After the dirty blocks on the LRUW reach a certain level, they are
written to disk and placed on the LRU list containing clean blocks. When a clean
block is needed, the LRU list is searched. When a clean block is modified, it is
placed on the LRUW list for writing by the DB writer.
Cache buffers LRU chain latch contention is caused when the LRU and LRUW lists
are updated and the chains of buffers in each hash bucket are too numerous to be
scanned quickly. Contention is also caused when the volume of transactions
overloads the block tracking on the LRU lists. The solution to either of these
cases is to allocate the resources needed to update the LRU lists quickly.
Because latches are used to protect the buffer cache, very high rates of logical
I/O - possibly due to untuned SQL - will increase the rate and duration of
latch holds and increase the probability that latch contention will occur.
|
Locking |
This rule determines if significant lock
contention exists.
Lock (enqueue) waits occur when a process must wait to update data that another
process is already updating. Locks are mechanisms used to provide ordered
access when updating data. Oracle also uses locks for maintaining internal data
structures and processes may have to wait for access to these internal locks as
well (for instance, the Space Transaction enqueue).
When the database is well tuned and
the application design is sound, lock waits should be negligible.
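Current blockers and waiters can be identified from V$LOCK (a diagnostic sketch; BLOCK = 1 marks a session holding a lock that another session is waiting for):
SELECT sid, type, id1, id2, lmode, request, block
FROM v$lock
WHERE block = 1 OR request > 0;
|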
Tables Experience Lock Waits |
Determine if transaction lock waits for one or more tables
were observed.
The purpose of a lock within the database is to protect a database resource (in
this case a table) that may be needed for a relatively long time compared to
latches, which we expect to be held only briefly. The Oracle database locking
mechanism allows multiple sessions to wait in line for a given table if that
table is currently locked by another session in a mode that is incompatible with
the lock being requested.
What is meant by incompatible? For example, a share lock request for a table
that already holds another share lock results in a compatible combination, so
no locking delay occurs. If, however, a session holds an exclusive lock on a
row, another exclusive lock request on that same row must wait, because the
two locks are incompatible.
Non-Indexed Foreign Keys
Foreign key constraints are used to maintain referential integrity between
tables related by a common column or columns. For example, an EMPLOYEE table
(child) has a foreign key constraint defined on the DEPTNO column. The foreign
key ensures that the DEPTNO value in the EMPLOYEE table matches a value in the
DEPTNO column of the DEPARTMENT table (parent).
Violations of referential integrity are prevented by making sure that no
uncommitted inserts or updates to child tables could be affected by the update
or deletion of the foreign key in the parent table (DEPTNO in the DEPARTMENT
table). When the child table foreign key is indexed, the index can be checked to
determine if any child table rows are affected. When there is no index on the
child table foreign key, the transaction updating or deleting the foreign key in
the parent table locks the entire child table. The lock on the child table will
block any other transactions on the child table attempting to insert, update, or
delete until the parent table transaction completes.
In Oracle 9i, the problem of child table locking due to un-indexed foreign keys
still exists but has been reduced through downgrading the exclusive lock to a
shared lock after the update completes. When the transaction completes, the
shared lock is released. This reduces the duration that the exclusive lock is
held.
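Using the hypothetical EMPLOYEE/DEPARTMENT example above, indexing the child table's foreign key column avoids the full child-table lock:
-- Index the foreign key column on the child table
CREATE INDEX employee_deptno_idx ON employee (deptno);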
|
Space Transaction Waits |
Determine if significant space transaction lock
waits exist. If so, identify users that have permanent tablespaces assigned for
use as temporary storage.
When Oracle modifies the segments allocated to objects, it grabs and holds the
space transaction lock until the modification completes. Excessive segment
modification is especially noticeable when permanent tablespaces are assigned
for use as temporary storage, significant disk sort operations are occurring in
those tablespaces, and the default extent size for the tablespace is set too
low. It may also occur when rollback segments are growing and shrinking
excessively. Oracle addressed this issue with the introduction of tablespaces
of type TEMPORARY and further improved the situation with true temporary
tablespaces (CREATE TEMPORARY TABLESPACE) in version 8i.
Assigned Temporary Tablespace
Each database user is assigned a tablespace to use for temporary storage
operations. Temporary storage is necessary when performing large sorts, large
hash joins, and other operations. However, sort operations are typically the
primary use for temporary storage. When sort operations are too large to be done
entirely in memory, temporary disk storage is utilized to store intermediate
sort results. Sort operations are necessary for such purposes as completing
ORDER BY or GROUP BY operations, sort-merge join operations, and for index
creation.
When creating a tablespace, it can be designated for use as temporary-only
storage. By doing this, Oracle more efficiently manages the utilized space for
temporary purposes. Temporary tablespaces should always be specified when
assigning tablespaces to users for temporary space usage.
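For example, a dedicated temporary tablespace can be created and assigned to a user (a sketch; the tempfile path, sizes, and user name are placeholders):
CREATE TEMPORARY TABLESPACE temp
  TEMPFILE '/u01/oradata/db1/temp01.dbf' SIZE 500M
  EXTENT MANAGEMENT LOCAL UNIFORM SIZE 1M;
ALTER USER scott TEMPORARY TABLESPACE temp;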
|
Redo Log Disk Configuration |
Determine if there are enough
redo log disks defined and if they are optimally sized.
When a transaction is committed, a physical write to the redo log file must
occur. The write must complete before the COMMIT returns control to the user;
the writing of the redo log therefore imposes a limit on the throughput of
update-intensive applications. Redo log I/O is optimized when the log is
located on a dedicated device where there is no contention with other
processes. With no other processes using the disk device, the disk head
maintains its position while awaiting the next write request, so the disk does
not need to take the time to "seek" before performing the next write and write
times are minimized. |
Number of Redo Logs |
Determine if the database has a sufficient number
of redo logs.
A log switch checkpoint must complete before a redo log group can be reused. This
checkpoint ensures that all transactional information in the redo log buffer has
been written to disk.
When in ARCHIVELOG mode, redo log groups must be archived before they can be
reused. When the archiving process is sufficiently slow (usually due to disk
device contention) and the archiver has not yet completed archiving a redo log
group, the database will halt and transactions will wait for the archive process
to complete before allowing the LGWR (log writer) process to re-use a redo log
group.
The database cannot be modified before the checkpoint or the archive processes
have completed. Avoiding these waits is therefore very important to ensure
adequate throughput and response time.
The best way to guarantee that you do not experience waits for checkpoints or
for archiving is to ensure that there are enough redo log groups. This allows
checkpoints and archiving to complete before the logs are scheduled for reuse.
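The existing groups and their states can be reviewed in V$LOG, and a group added where needed (a sketch; the group number, member path, and size are placeholders):
SELECT group#, members, status, archived FROM v$log;
ALTER DATABASE ADD LOGFILE GROUP 4
  ('/u01/oradata/db1/redo04.log') SIZE 100M;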
|
Redo Log Size |
Determine if redo logs are sized too small.
When a redo log is full, Oracle must perform a "log switch." Redo logs should be
large enough to avoid overly frequent log switches and any corresponding log
switch waits.
Log switch waits can occur under the following conditions:
- The log switches are too frequent.
- The log switch checkpoint is not complete.
- The database is in ARCHIVELOG mode and the log being switched to is not yet
archived.
Redo logs should be sufficiently large (and numerous) to avoid these
conditions.
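Log switch frequency can be inferred from V$LOG_HISTORY (a diagnostic sketch counting switches per hour):
SELECT TRUNC(first_time, 'HH24') AS hour, COUNT(*) AS switches
FROM v$log_history
GROUP BY TRUNC(first_time, 'HH24')
ORDER BY 1;
|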
Redo Log Buffer Size |
Determine if resizing the redo log buffer will improve
performance.
The redo log buffer stores changes that will eventually be written to the redo
log files. It buffers these writes so that the user session can continue working
without requiring a disk I/O. When a COMMIT is issued, the changes in the redo
log buffer are written to disk by the log writer before the COMMIT completes.
When the log buffer is one-third full, the log is written to disk even without
a COMMIT.
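A rough indicator of an undersized log buffer is the 'redo buffer allocation retries' statistic, which counts how often a session had to wait for log buffer space (a diagnostic sketch):
SELECT name, value
FROM v$sysstat
WHERE name = 'redo buffer allocation retries';
|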
Sequence Cache Size |
Determine if sequence generators have an optimal cache size.
Oracle sequence number generators utilize a built-in caching mechanism to gain
their substantial efficiency over an approach of storing and retrieving sequence
values from a user table. This is because reading and updating a sequence value
in memory is much faster than updating one stored on disk. Oracle sequence
generators are cached in the shared memory area (the SGA). By default, 20
numbers are cached. Once the 20 cached numbers are used by the application,
Oracle caches another 20 values and updates Oracle’s internal sequence table.
When sequence numbers are allocated at a high rate, performance can be improved
by increasing sequence cache size. A larger cache size reduces the number of
required read/writes to the sequence table.
Increasing the cache size does not consume additional memory in the SGA.
However, it does increase the number of sequence numbers that can be “lost” if
the cache is flushed or the database is shut down. Most applications do not require
a complete set of contiguous numbers and therefore, “lost” sequence numbers do
not pose a problem. If a specific application does require complete accounting
for a set of sequence numbers, take care in designing the application code to
avoid “loss” of numbers, including when transactions are rolled back. A complete
discussion of such design is beyond the scope of this topic.
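The cache size is set when the sequence is created and can be changed later (a sketch; the sequence name and cache values are illustrative):
CREATE SEQUENCE order_seq CACHE 20;    -- the default cache of 20 values
ALTER SEQUENCE order_seq CACHE 1000;   -- larger cache for high allocation rates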
|
Buffer Busy Wait |
Determine if database buffers are experiencing
contention.
Buffer busy waits happen when multiple sessions
want concurrent access to a block in the buffer cache. The modifying session
marks the block’s header with a flag letting other users know a change is taking
place and to wait. This temporary blocking ensures the reader has a coherent
image of the block (with either the before or after image). The two main cases
where this wait can occur are:
- Another session is reading the block into
the buffer
- Another session holds the buffer in an
incompatible mode
While the block is being changed, it is marked unreadable. A change should take
no more than a few hundredths of a second: a disk read should take under 10 to
20 milliseconds and a block modification under one millisecond. Many buffer
busy waits must occur before performance degradation is noticeable.
However, in a problem situation, there is usually a hot block, such as the first
block on the free list of a table with high concurrent inserts. Users insert
into that block at the same time until the block fills past its PCTFREE
threshold; then users insert into the next free block on the list, and so on.
Description of Freelists
Freelists are structures included within each table that list the blocks
available for inserts. These structures are maintained in the table header
block (unless multiple FREELIST GROUPS are defined). Multiple
freelists can be configured with the FREELISTS clause in the CREATE TABLE and
(in 8.1.6) the ALTER TABLE statements. Configuring multiple freelists is an
important optimization for tables that are going to be subjected to high rates
of concurrent inserts.
A buffer busy wait occurs when an Oracle session wants to modify or pin a buffer
(copy of a database block in the SGA) but cannot because another session holds
the buffer in an incompatible mode. The V$WAITSTAT virtual table records the
number and duration of "buffer busy" waits, broken down by the class of buffer
for which waits are occurring.
When there are insufficient freelists, multiple sessions are allocated the same
block in which to perform their inserts. When two sessions insert into this
data block concurrently, one session performs the insert and the other
experiences a "buffer busy wait" on "data block".
Rollback Segments |
Determine if sufficient rollback segments are
defined.
When Oracle transactions make database changes, the information required to undo
those changes is written to a rollback segment. Oracle ensures that the
available rollback segments are evenly used. However, when too many
simultaneously active database transactions are using the same rollback segment,
the generation of rollback information can be inefficient.
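Waits on undo blocks and undo headers appear in V$WAITSTAT; persistently growing counts may indicate too few rollback segments (a diagnostic sketch):
SELECT class, count, time
FROM v$waitstat
WHERE class LIKE '%undo%';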
|
Cluster Interconnect Latency
|
Determine if the current workload across the
interconnect in a Real Application Clusters (RAC) environment indicates
network latency.
RAC uses cluster interconnects to transfer
blocks between the nodes participating in the cluster. A block transfer occurs
when a user session connected to one instance requests a block being held in the
cache of another instance. This ability of RAC to transfer data from the cache
of one instance to the cache of the requesting instance is called Cache
Fusion.
Oracle introduced Cache Fusion with Oracle 9i. Prior to Oracle 9i under Oracle
Parallel Server, data was shared between users by forcing the instance holding
the data to first write it to disk so that the requesting instance could then
read it. With Cache Fusion, when users on one instance request data held in
cache on another instance, the holding instance transfers the data across a
cluster interconnect and avoids any writes and reads to disk. Disk I/O is
significantly slower than cache transfers via the cluster interconnects.
The performance of the cluster interconnects is crucial to the performance of
the RAC cluster and more specifically to the movement of cached data between
instances. Interconnect performance is measured by determining the average time
a block takes to reach the requesting node, i.e., from the moment a block is
requested by an instance to the moment the requesting instance receives the
block. As in any application or database instance, occasional spikes of user
activity are expected. However, when the average of such spikes remains high for
an extended period of time, it could indicate a correctable performance
degradation of the database cluster.
While high average GCS CR block receive time indicates possible interconnect
performance issues, tuning and monitoring the following areas may help improve
the performance of the cluster interconnects.
- Ensure that dedicated private
interconnects are configured between instances participating in the cluster
for cache fusion activity.
- The average GCS CR block receive time
could also be influenced by a high value of the
DB_FILE_MULTIBLOCK_READ_COUNT parameter. This is because a requesting
process can issue more than one request for a block depending on the setting
of this parameter and in turn the requesting process may have to wait
longer. Thus, sizing of this parameter in a RAC environment should be based
on the interconnect latency and the packet sizes defined by the hardware
vendor and after taking into consideration operating system limitations. On
certain platforms this could also be related to bug #2475236, which caused
CR request timeouts because UDP parameters were hardcoded in the Oracle
code.
- Examine process scheduling priorities and process
queue lengths at the operating system level to identify whether additional
processors are required, based on run queue backlogs.
- The LMS background process (representing
global cache services, or GCS) of the requested instance may not be able to
keep up with the requests from other instances, causing the LMD background
process (representing global enqueue services, or GES) to queue new
requests, increasing its backlog and delaying responses. The LMS process
accepts requests on a First-In-First-Out (FIFO) basis. By default, Oracle
creates one LMS process for every two CPUs on the system. While Oracle has
not provided any direct method to tune the number of LMS processes, one
available method is to increase them by modifying the hidden parameter
_LM_DLMD_PROCS (see the note below regarding modifying hidden
parameters).
- Multiple cluster interconnects can be
configured and allocated to Oracle using the parameter CLUSTER_INTERCONNECTS.
The parameter overrides the default interconnect setting at the operating
system level with a preferred network for Oracle traffic, and on certain
platforms it disables the high availability feature of RAC.
- Evaluate and reduce the number of full
table scans on the database. Queries that retrieve data by performing full
table scans could cause large cache movement between instances and thus
cause significant load on the cluster interconnects.
NOTE: On certain platforms, such as Sun,
setting the CLUSTER_INTERCONNECTS parameter can disable the high availability
feature of RAC.
DISCLAIMER: Hidden parameters (those that begin with an underscore), such as
_LM_DLMD_PROCS, should be modified with caution and only after first consulting
with Oracle Support. Oracle may change or remove such parameters without
notice in subsequent releases.
While single-instance performance remains important, in a RAC or Oracle
clustered environment, applications and users share multiple instances
connected to the same physical copy of the database, and the cluster
interconnects play a critical role in the performance of these systems.
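The average receive time can be estimated from V$SYSSTAT (a diagnostic sketch assuming the 9i statistic names; the time statistic is commonly reported in hundredths of a second, so the result is converted to milliseconds):
SELECT b.value / NULLIF(a.value, 0) * 10 AS avg_cr_block_receive_ms
FROM v$sysstat a, v$sysstat b
WHERE a.name = 'global cache cr blocks received'
AND b.name = 'global cache cr block receive time';
|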
Minimize Physical Disk I/O |
Buffer Pools |
Determine if resizing the buffer cache will
improve performance.
Reading a data block from a physical disk
device is thousands of times slower than reading it from memory. A disk read is
avoided when a data block is already in the buffer cache. Avoiding disk reads in
this manner is one of the most effective optimizations that can be made to an
Oracle database. Allocating more memory to the buffer cache improves the chance
that a data block will be in memory when requested a second time.
An often-used measure for evaluating whether the buffer cache is sized too small
is the buffer cache "hit ratio." It expresses how often a requested block is
found in the buffer cache without requiring a disk read. The buffer cache "hit
ratio" is calculated by dividing the number of reads that did not require a disk
read by the total number of reads requested. Often, when the hit ratio is low,
or the absolute number of disk reads is too high, increasing the size of the
buffer cache can significantly improve disk performance.
Interpreting the Buffer Cache Hit Ratio
While a low buffer cache hit ratio can indicate a need to increase the cache
size, there are other factors to consider:
- Repeated scanning of large tables lowers
the buffer cache hit ratio. Increasing the cache size here will have little
effect on the hit ratio because these blocks are unlikely to be reused.
- A high cache hit ratio can conceal
inefficient SQL - when the application frequently scans an unselective index
it is more likely to find these blocks in the cache and the hit ratio will
be inflated.
- When adding multiple buffer pools to
Oracle, consider the reasons for allocating objects to each pool. The KEEP
pool is for small tables that are frequently scanned and should have a high
hit ratio. The RECYCLE pool is allocated for large infrequently scanned
tables and prevents these tables from occupying unnecessary space in the
cache. The DEFAULT pool is for the tables that are subject to index-lookup.
Oracle introduced a buffer cache advisory in 9i to help evaluate the benefit of
resizing the buffer cache. The advisory must be enabled before statistics are
gathered (in 9i, by setting the DB_CACHE_ADVICE parameter to ON); while
enabled, Oracle collects the statistics used to estimate the effects of
resizing the buffer cache.
The estimated physical reads per buffer cache size are populated in the
V$DB_CACHE_ADVICE view. This view contains 20 rows for each pool present in the
buffer cache. Each row represents a possible cache size and includes the
estimated physical reads for that size. These estimates are based on the
database workload from the time the advisory is turned on.
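The advisory results can then be read directly (a diagnostic sketch):
-- Estimated physical reads at each candidate size of the DEFAULT pool
SELECT size_for_estimate, size_factor, estd_physical_reads
FROM v$db_cache_advice
WHERE name = 'DEFAULT'
AND advice_status = 'ON'
ORDER BY size_for_estimate;
|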
Sort Area Size |
Determine if adjusting the sort area size may
improve performance.
Increasing the sort area size allocates more
memory for sort operations and thus can improve overall performance by reducing
the need to use slower disk storage to perform sorts. The sort area size is
specified by setting the SORT_AREA_SIZE configuration parameter.
Sort operations in Oracle are of two types and commonly referred to as either
disk sorts or memory sorts. SORT_AREA_SIZE is an Oracle configuration parameter
that specifies the memory allocated for sort operations and effectively
determines which type of sort is performed. When the data volume of the sort
operation is small enough to fit in the allocated sort memory, the sort can be
performed using only memory and no disk storage. Otherwise, temporary disk
storage must be used to perform the sort.
SORT_AREA_SIZE defines the maximum size, in bytes, of the memory that can be
allocated to each user session for sorting. The allocation of sort memory only
occurs when the session first performs a sort. The size of memory allocated is
the amount of memory sufficient to perform the sort up to the SORT_AREA_SIZE
maximum. SQL statements that use the ORDER BY or GROUP BY clauses and index
creation are examples of activities that require sort operations.
The value of SORT_AREA_SIZE is limited by the physical memory of the computer
running the Oracle database. When the machine on which the database resides has
an abundance of memory, the value of this configuration parameter can be
increased beyond the default.
Note: When using the cost-based optimizer, it is possible that increasing the
sort area size results in decreased performance, contrary to the usual effect
of a performance increase. The cost-based optimizer considers the size of the
sort area when deciding which join operation to use for joining two tables.
Increasing the sort area size may cause the optimizer to select a sort-merge
operation over a nested loops operation when it believes the additional sort
area space will result in better performance. On occasion, this optimizer
choice may reduce performance. The adverse effect can typically be corrected
by adding hints to the one or two SQL statements that exhibit it. Although
this effect is unlikely to occur in most databases, always test performance
modifications in a test environment to ensure they have the desired effect in
the environment.
Oracle 9i introduced new automatic memory management for work area memory
allocation. Automatic memory management can be set on or off. When set on, you
no longer have to set values for the various *_area_size parameters (sort_area_size,
hash_area_size, bitmap_merge_area_size and create_bitmap_area_size). Oracle
allocates appropriate work areas and keeps total memory allocation within the
amount specified by the pga_aggregate_target parameter. To enable automatic work
area memory allocation, set the pga_aggregate_target parameter greater than 0
and set the workarea_size_policy parameter to AUTO.
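For example (a sketch; the target value is a placeholder to be sized for the host):
ALTER SYSTEM SET pga_aggregate_target = 500M;
ALTER SYSTEM SET workarea_size_policy = AUTO;
|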
PGA Size |
Determine if resizing the PGA will improve performance.
Automatic SQL Execution Management Concepts
Oracle allocates a private memory area, called the Program Global Area (PGA),
for each server process. This memory is allocated whether the connection type
is dedicated or shared; the contents and location of the memory vary depending
upon the connection type. The table below lists the main components and their
corresponding memory locations for dedicated and shared servers.
Session memory
  Description: holds the session's logon information and other details
  regarding the session.
  Dedicated server: PGA. Shared server: SGA.
SQL execution memory
  Description: data such as bind information and runtime memory structures.
  Dedicated server: PGA. Shared server: part in SGA and part in PGA.
Persistent area
  Description: holds the cursor information that persists across multiple
  executions of the same statement, such as bind details and data type
  conversion. The persistent area is deallocated when the cursor is closed.
  Dedicated server: PGA. Shared server: SGA.
Runtime area
  Description: contains information used while a SQL statement is being
  executed. Its size depends on the type and complexity of the SQL statement
  as well as the number and size of rows being processed. A large portion of
  this memory is dedicated to the working areas for operations such as sort,
  hash join, and bitmap merge.
  Dedicated server: PGA. Shared server: PGA for DML/DDL, SGA for queries.
From this table you can see that SQL execution memory is the most important
component of the PGA. Its size varies with the complexity and data volume of
the SQL being executed. The DBA is most interested in tuning this component to
reduce response time, because it hosts complex SQL operations such as sorting
and hash joins. Before Oracle9i, a DBA had to set fixed sizes with
initialization parameters such as SORT_AREA_SIZE and HASH_AREA_SIZE; these
parameters are not adjusted based on the load on the system.
The new SQL memory management introduced in 9i addresses these issues. A DBA
can start the instance with a predefined PGA target size using the
initialization parameter PGA_AGGREGATE_TARGET. Oracle then allocates
appropriate runtime memory for SQL operators, keeping this target size as its
high-water mark; in this way Oracle avoids abnormal growth of the total PGA
size. In addition, Oracle adapts the size of each SQL operation according to
the current load and complexity of the instance. This SQL memory management is
available only for dedicated servers, so DBAs must still depend on parameters
such as SORT_AREA_SIZE and HASH_AREA_SIZE for shared servers.
How does SQL memory management affect performance? Generally speaking, a
larger working area can significantly improve the performance of SQL operations
and therefore reduce the response time of queries. Ideally, the work area
should be big enough to accommodate all input data and auxiliary structures
needed by the operation. Oracle defines three memory sizes for SQL operations:
Cache size - the amount of memory required to run the entire SQL operation in
memory. The cache size optimizes performance.
One-pass size - the amount of memory required to prevent more than one pass
over the input data. Response time increases because only part of the input
data can be accommodated in memory.
Multi-pass size - an amount of memory sufficient for only a small part of the
input data, necessitating multiple passes over the input. Results in a drastic
increase in response time.
Using PGA_AGGREGATE_TARGET as the high-water mark for the total PGA size,
Oracle allocates the cache, one-pass, or multi-pass size for each operation.
As load increases, Oracle initially tries to allocate the one-pass size rather
than the cache size. For these size calculations Oracle uses a feedback loop
mechanism to keep total memory allocation well below the PGA target size.
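Overall PGA behavior under automatic management can be monitored in V$PGASTAT (a 9i diagnostic sketch):
-- Includes rows such as 'aggregate PGA target parameter' and 'total PGA allocated'
SELECT name, value, unit
FROM v$pgastat;
|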
Optimize Physical Disk I/O |
Datafile I/O Load Balance
|
Determine if I/O load is balanced across datafile
disk devices.
Datafiles should have similar I/O service
times. When datafiles have very different I/O service times, it indicates that
the I/O load across datafile disk devices may not be balanced. Following are
common reasons why a device/filesystem may have longer service times than
another:
- It may be physically slower (IO per second
rating or average service time rating).
- It may be servicing simultaneous requests
from multiple processes and/or datafiles (contention).
- It may not be using caching (or better
caching algorithms).
- Other devices may be using RAID or other
advanced technology disk architectures.
What is RAID?
RAID (Redundant Array of Independent Disks) arrays are an increasingly popular
way of delivering fault tolerant, high performance disk configurations. There
are a number of factors to take into consideration when deciding upon a RAID
configuration and RAID level to implement. There are three common levels of RAID
provided by storage vendors:
RAID 0
Sometimes referred to as "striping" disks. In this configuration, a logical disk
is constructed from multiple physical disks. The data contained on the logical
disk is spread evenly across the physical disks, and hence random I/Os are also
likely to be spread evenly. There is no redundancy built into this
configuration, so if a disk fails its contents must be recovered from a backup.
RAID 1
Referred to as disk "mirroring". In this configuration a logical disk is
comprised of two physical disks. In the event that one physical disk fails,
processing can continue using the other physical disk. Each disk contains
identical data, and writes are processed in parallel so there should be no
negative effects on write performance. Two disks are available for reads so
there can be an improvement in read performance.
RAID 5
In this configuration, a logical disk is comprised of multiple physical disks.
Data is arranged across the physical devices in a similar way to disk striping
(RAID 0). However a certain portion of the data on the physical devices is
parity data. This parity data contains enough information to derive data on
other disks should a single physical device fail.
It's common to combine RAID 0 and RAID 1. Such striped and mirrored
configurations offer protection against hardware failure together with spread of
I/O load.
Performance implications of RAID
Both RAID 0 and RAID 5 improve the performance of concurrent random reads by
spreading the load across multiple devices. However, RAID 5 tends to degrade
write I/O performance since both the source block and the parity block must be
read and then updated.
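Per-file I/O counts and timings can be compared across datafiles by joining V$FILESTAT to V$DATAFILE (a diagnostic sketch; the timing columns require TIMED_STATISTICS to be enabled):
SELECT d.name, f.phyrds, f.phywrts, f.readtim, f.writetim
FROM v$filestat f, v$datafile d
WHERE f.file# = d.file#
ORDER BY f.readtim + f.writetim DESC;
|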
Database Writer Processes
|
Determine if a sufficient number of database
writer processes or slaves are configured.
The database writer process (DBWR) is the only
process that writes modified database blocks from the buffer cache to the
database files. The database writer works asynchronously. This means that a user
process never needs to wait for the database writer to finish writing. However,
if the DBWR falls behind sufficiently, then the buffer cache will fill up with
"dirty" blocks and waits will occur while user processes try to read new blocks
into the cache. These bottlenecks can be eliminated by configuring multiple
database writers/slaves or by enabling asynchronous I/O.
Keeping the database writer optimized is critical to maintaining database
throughput. The best way to optimize database writer throughput is to spread I/O
across multiple disk devices and allow the database writer to write to these
disk devices in parallel.
Parallel database file writes can be achieved using the following methods:
- Multiple database writers can be
configured using the DB_WRITERS configuration parameter in Oracle7, or the
DB_WRITER_PROCESSES or DBWR_IO_SLAVES parameters in Oracle8+.
- Operating system asynchronous I/O or list
I/O can be enabled. This will allow the database writer to issue write
requests against multiple disk devices simultaneously.
Operating system asynchronous I/O performs more
efficiently than multiple database writers. However, asynchronous I/O may not be
available on all platforms or may require special measures. In some operating
systems, asynchronous I/O may require building the database on raw devices. In
large systems undergoing heavy write activity, multiple database writers can
improve performance even when asynchronous I/O is in use.
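A sketch of the relevant initialization parameters (values are placeholders; parameter availability and behavior vary by version and platform):
# init.ora excerpt - choose multiple writers OR I/O slaves, not both
db_writer_processes = 4    # multiple true database writer processes (Oracle8+)
# dbwr_io_slaves = 4       # alternative: I/O slaves serving a single DBWR
disk_asynch_io = TRUE      # enable OS asynchronous I/O where supported
|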
Number of Datafile Disk Devices
|
Determine if sufficient disk devices are allocated
for database files.
The amount of physical (non-cached) I/O that
can be performed by the database server is constrained by the following factors:
- The number and type of disk devices
allocated to the database
- The I/O rate each disk can sustain
- The degree to which database I/O is evenly
spread across disks
- The degree to which database I/O is evenly
spread over time
- Contention for database disk devices by
non-Oracle processes
For example, suppose there are 3 disk devices,
each capable of sustaining an I/O rate of 100 I/Os per second. When the I/Os
are spread evenly across these devices, they can sustain a combined physical
I/O rate of 300 I/Os per second.
In most cases, reducing the number of physical I/O requests (by tuning the
application or configuring a larger buffer cache) is preferable to adding disk
devices. Adding devices can only spread the existing I/O load, whereas tuning
SQL or application code can eliminate unnecessary I/O or reduce excessive I/O.
Memory access is significantly faster than physical I/O: memory access times
are measured in nanoseconds while disk access times are measured in
milliseconds. See the related topics on optimizing physical disk I/O and on
tuning the buffer cache for ideas on reducing I/O load.
Datafile devices are assumed to be dedicated to this Oracle instance. Oracle
collects statistics for the I/O requests that it makes. Operating system
utilities such as iostat and sar can measure all activity against disk devices.
Devices dedicated to Oracle include drive controllers as well as disk drives.
Ensure the operating system and other applications are not causing device
contention. A thorough understanding of the network topology and contention
points may assist tuning.
Recent Storage Technology Advancements
Since it is not possible to determine the physical characteristics of the disk
devices during analysis, it is important to understand the database storage
topology. Without such understanding, it is difficult to plan or properly tune
device utilization. There have been several recent advancements in storage
technology that make higher I/O rates possible. A basic understanding of disk
device concepts is required before moving existing datafiles or adding more
devices to spread existing or anticipated load.
A common question when planning for storage is whether many smaller disks are
better than fewer larger disks. The correct answer is: it depends. Cost and
throughput are usually the deciding factors. While storage capacity has
increased significantly, average access time has not. Calculations should be
made regarding cost vs. performance of the usable portions of the disk (see
the ZBR discussion later in this topic). Also factor disk controllers, RAID,
and other redundancy technologies into the overall plan. Sixteen 9 GB disks in
the environment may or may not have better cost/throughput than two 72 GB disks.
Disk drives are commonly being grouped together in what are called storage
systems or network appliances. These technologies include RAID arrays, SAN, NAS,
and hybrids thereof. Specific enhancements have been made to each of these
technologies to support database systems. Many SAN and NAS products available
today incorporate RAID technology, yet there can be RAID without SAN or NAS.
In general, SAN is for block-based and transaction-based applications. NAS is
more geared toward file-based systems. Talk with the system administrator or
hardware vendor to decide whether NAS or SAN is right for the business needs.
Enhancements have been designed into dedicated SAN/NAS hardware with database
features in mind such as parallel server, backup/recovery, replication, and VLDB.
Another benefit of these technologies is that they offload mirroring (for
example the redo logs) to the hardware – further increasing performance.
Caching data at the operating system (OS) or storage device-level for the
purposes of decreasing read I/O service time may not help much in an Oracle
environment. This is because Oracle already caches the most commonly requested
data (in the SGA), leaving a high probability of a physical I/O for all other
data. It is possible that multiple caching layers degrade overall read access
times.
On the other hand, when a storage device caches write requests, it improves the
performance of the database writer (DBWR) process(es) by allowing write requests
to complete sooner. Some hardware configurations result in very low write times.
For example, a single DBWR with an advanced disk write-cache can easily show
write times of less than 1 millisecond per write request.
A logical volume manager (LVM) is disk management software that groups arbitrary
physical disks into virtual disk volumes. Depending upon configuration, it can
reduce overall access times by spreading load across multiple physical devices.
Attempting to spread datafiles across LVM-managed disks without understanding
how the disks are configured could prove counter-productive in I/O tuning
efforts. Work with the system administrator to understand their configuration.
For random I/O, disk seek/latency time is reduced and I/O throughput is
increased for data stored on the outer-portion of a disk formatted using zone
bit recording (ZBR) technology. Simply put, the outer-edge of a disk contains
more "data" than the inner-edge. Most modern disks can be logically partitioned
such that Oracle data files can be located in a partition that only spans the
outer tracks of a disk. It is quite possible to receive better throughput from
partitioning and only using the outer-most 25% of a 72 GB drive at 7200 RPM,
rather than an 18 GB drive at 10,000 RPM.
Finally, one additional technology that can add big value when implemented
appropriately is solid state disks (SSDs). The primary advantages of SSDs are
their ultra-low latency in servicing requests and their virtually nil
contention, due to the lack of physical head movement. However, the per-GB
cost of SSDs is
relatively high compared to standard disks. Solid state disks are capable of
10,000 to 1,000,000+ I/Os per second. Some good uses for SSDs include:
- Redo logs – by providing low-latency
writes and high-bandwidth reads (for archiving). Particularly useful in OPS/RAC
environments where multiple servers can share the same SSD device. This
makes cost justification easy – one SSD rather than two standard disks
(excluding RAID protection) for each node in the RAC.
- Temporary tablespaces – by providing low
latency and high concurrency. Primarily used when sorting memory exceeds
SORT_AREA_SIZE or for scratch space when using global temporary tables.
- Undo/Rollback tablespaces – by providing
low latency.
|
Review Performance Deviations |
CPU Usage Deviation |
|
CPU Wait Deviation |
|
I/O Wait Deviation |
|
Latch Wait Deviation |
|
Lock Wait Deviation |
|
Other Deviation |
|
Workload Deviation |
|