Saturday, March 08, 2008

SQL Tuning advanced

Application Tuning using SQL

This document is meant to explain the points to be considered, while formulating your SQL’s.

Before tuning your SQL you should understand the oracle memory structure and different phases of SQL query processing.

Memory Structure
System Global Area


DB buffer Log buffer Sharedpool


Library cache Data dictionary cache


Private sql area Shared sql area


Persistent area Run time area

Whatever the operation we are performing, everything will be handled only through the Random Access Memory. Oracle also does its process through memory by allocating a separate area for its access called SGA. Of course, apart from this SGA, oracle allocates Sort area, Software Code area, & Program Global area(PGA).

One of the important factors of the database operation is the DB buffer cache, and this buffer cache is made up of memory blocks of the same size of Oracle’s blocks. Since, all the data stored in terms of blocks in the physical data files can be manipulated only in the
DB buffer cache. The buffers stored in the database buffer cache can be classified as :
Dirty buffers: are those buffers, which hold data, touched by transaction.
Pinned buffers: are those buffers, which are being modified by transaction.
- Free buffers: are those buffers, which are used to hold data from datafile.

DB buffer cache contains two lists called
LRU (Least Recently Used list)
MRU (Most Recently Used list)
Whenever an oracle user process accesses a piece of data, it has to copy data from disk to the buffer, before accessing it. This is called cache miss, whom we should avoid as much as possible to increase the query performance.

A cache hit is nothing but when a process access a piece of data that is already in the cache, the process can read the data directly from memory. Normally a cache hit is faster than, cache miss because the data is read from the memory instead of disk.

Log Buffers: For a single atomic change made to the database by insert, update, delete, create, alter or drop operations, server process creates redo entries to reconstruct, or redo the changes made. And these redo entries for each and every transactions are stored in the log buffer cache, which is circular buffer in the SGA.

Shared Pool: The shared pool contains two parts
Data dictionary cache. Oracle accesses the data dictionary frequently during the process of SQL statements. Since the data dictionary is accessed so often by Oracle, it allocates one special area called data dictionary cache. The data dictionary caches are shared by all Oracle user process.

Library cache: It contained two main areas called.

Shared SQL area, which contains the parsed representation (Parse tree) of the SQL statements fired by the user and the cursors in the Shared SQL area are maintained by LRU algorithm. Whenever the user fires a SQL statement, Oracle checks the shared pool to see if a shared SQL area already exists for an identical statement. If there is already a shared SQL area for the statement, it’s used for the execution of the subsequent new instances of the statement. In case, if there is not a shared SQL area for a statement, Oracle allocates a new cursor in the shared SQL area after it’s getting parsed in the user’s Pvt. SQL area.

Private SQL area, is a memory area that contains data such as bind information and runtime buffers. Each session that issues a SQL statement has a private SQL area. It can be classified as persistent area & Run time area. The persistent area contains bind information that persists across executions, code for data type conversion, and other state information. The Runtime area contains information used while the SQL statements is being executed.

5 phases of processing the SQL query:
1) Open
2) Parse
3) Fetch
4) Execute
5) Close

In Open phase, one cursor area (context area) will be opened for your query

In Parse phase, the following processes will take place
i) SQL statement is verified, whether valid or not (Syntax checking).
ii) The tables, columns and privileges are verified (Symantec checking)
iii) A Parse tree is prepared, based on the execution plan
iv) The parse tree into SQL area (reuse to identical query).

In Fetch phase, rows are selected from the corresponding data files and put into the data block buffer, by server process.

In Execute phase, user process sends the records to the corresponding users.

In Close phase the opened cursor will be closed and temporary memory will be released.


Approach to tune SQL:

I. Restructure the Index.
Remove non-selective indexes to speed up insert/update/delete.
Index performance-critical access path
Consider hash cluster for uniqueness data
Consider Index cluster.
Consider composite index for better selectivity and additional data storage
Consider Bit-map index for low cardinal fields like SEX.

Points to be noted:
Even though you are having indexes, your query will never use that index when you
write a query with following conditions.
col1 and col2 are in the same table.
col1 > col2
col1 < col2
col1 >= col2
col1 <= col2
Where Col1 is not null
Where Col1 is null
Where Col1 not in
Where col1!=
Where col1 like ‘% pattern’

Use the following guide lines to decide which columns to be indexed:
Choose columns that are most frequently specified in WHERE clauses.
Frequently accessed columns in SELECT clause can most benefit from indexes.
Don’t index columns that do not have many unique values (poor selectivity). Columns in which a good percentage of rows are duplicates can not take advantage of indexes. In this case, consider bitmap indexes on poor selectivity fields or consider composite index to improve the selectivity.
Columns that have unique values are excellent candidates for indexing. Oracle automatically indexes on unique/primary key fields.
Columns that are commonly used in Joins are good candidates for index.
Frequently modified columns probably should not be indexed fields because of the overhead involved in updating/inserting/deleting.
If you are using oracle 8i, you can very well consider function-based indexes.
A function-based index is an index on an expression. Oracle strongly recommends using function-based indexes whenever possible.
Example: create index idx on emp (upper (empname));
Select * from emp where upper (empname)=’MARK’; - Here index
will perform for range scan.
Oracle strongly recommends Bitmap Index on less selectivity fields like SEX, DEPTNO and etc.

Select count (*) from emp; is slow because of data dictionary hit.
Select count (rowid) from emp; is fast.

While creating indexes on a table if you have created a composite index on columns (A,B) and a separate index exists on the column A then the second index is not really required as the first index will suffice for queries using on the column A in the search condition.

As a thumb rule never index more than 50% of the columns of a given table as in this cases using the index might be slower than a full table scan.

Indexing does improve performance for querying but is an overhead when inserting
and updating. The performance degrade due to excessive indexes is more apparent in
OLTP databases and also during batch loads.

Restructure the SQL Statements.

a. When you write conditions those compare columns with constants, wherever
Possible, don’t use expressions.
Eg:
Sal>(24000/12) – Not optimized
Sal >2000 – Optimized

b. While using LIKE operator, if you are not using wildcards, then don’t use LIKE.
Use equality operator instead.
Eg.
desig Like ‘DBA’ – Not optimized
desig=’DBA’ – Optimized
desig like ‘DB%’ – Correct. But if you know the full string, better to use it.

c. When the number of condition in IN operator is less, use OR and = operator
Eg.
WHERE dno IN (10,20,30) – Not optimized
dno=10 OR dno=20 OR dno=30 – Optimized.

d . When you use ANY or ALL or BETWEEN operators, the query optimizer expands the condition .So, it’s better that you expand the condition as far as possible using OR operator .
Eg.1
Where Sal=ANY(5000,6000) – Not optimized
Where sal=5000 OR sal=6000 – Optimized
Eg2.
Where sal >ALL(5000,6000) – Not optimized
Where sal >5000 AND sal>6000 – Optimized
Eg3.
Where sal BETWEEN 2000 AND 5000 – Not optimized
Where sal >=2000 AND sal<=5000 – Optimized

e. Try to avoid using the NOT logical operator as far as possible. Use <> or != or ^= relational operators instead.
Eg.
NOT dno = (SELECT dno FROM emp WHERE ename = 'TAYLOR') – Not optimized
deptno <> (SELECT deptno FROM emp WHERE ename = 'TAYLOR') – Optimized

f. If your WHERE conditions have more OR operators, then you have to rewrite the query with SET operators because SET operators execute more efficiently then OR (Relational Operators).
Eg.
SELECT * FROM emp WHERE job = 'CLERK' OR deptno = 10; – Not optimized

SELECT * FROM emp WHERE job = 'CLERK'
UNION ALL SELECT * FROM emp WHERE deptno = 10 AND job <> 'CLERK';
(Make sure you are having INDEX) – Optimized

g. Complex queries have to be changed to join.
Eg.
SELECT * FROM accounts WHERE custno
IN (SELECT custno FROM customers); – Not optimized
SELECT accounts.* FROM accounts, customers
WHERE accounts.custno = customers.custno; – Optimized
(Make sure ‘custno’ column is the primary key or indexed.)

Try to avoid IN/NOT IN and use EXISTS/NOT EXISTS
SELECT dname, dno FROM dept WHERE dno NOT IN (SELECT dno from emp); - Not Optimized
SELECT dname,dno FROM dept WHERE dno NOT EXISTS (SELECT dno FROM emp WHERE dept.dno=emp.dno); - Optimized
(Make sure the dno on emp is Indexed.)

Minimize the Distinct using NOT IN instead of IN.

Use INSERT/DELETE/UPDATE RETURNING in PL/sql to reduce the no. of database calls.

Create view to reduce the cost of QUERY PARSING for complex queries. But don’t recycle views ie selecting value from view unnecessarily

Outer join is always problematic. So don’t put outer join query on join views, put directly on to the tables.

Using Efficient Non-index WHERE clause sequencing.

Oracle evaluates un-indexed equations, linked by the AND verb in a bottom-up fashion. This means that the first clause (last in the AND list) is evaluated, and if it is found true, the second clause is then tested. Always try to position the most expensive clause first in the WHERE clause sequencing.

Example:

SELECT …………… Total CPU time: 156.3 seconds
FROM emp E
WHERE emp_salary > 50000
AND emp_type = ‘MANAGER’
AND 25 < ( SELECT COUNT(*)
FROM emp
WHERE emp_mgr = E.emp_no )

Simply alter the order of the AND clauses:

SELECT …………… Total CPU time: 10.6 seconds
FROM emp E
WHERE 25 < ( SELECT COUNT(*)
FROM emp
WHERE emp_mgr = E.emp_no )
AND emp_salary > 50000
AND emp_type = ‘MANAGER’

Oracle evaluates un-indexed equations, linked by the OR verb in a top-down fashion. This means that the first clause (first in the OR list) is evaluated, and if it is found false, the second clause is then tested. Always try to position the most expensive OR clause last in the WHERE clause sequencing.

Example:

SELECT ……………………… Total CPU time: 28.3 seconds
FROM emp E
WHERE (emp_salary > 50000
AND emp_type = ‘MANAGER’)
OR 25 < ( SELECT COUNT(*)
FROM emp
WHERE emp_mgr = E.emp_no )







Simply alter the order of the OR clauses:

SELECT …………………… Total CPU time: 101.6 seconds
FROM emp E
WHERE 25 < ( SELECT COUNT(*)
FROM emp
WHERE emp_mgr = E.emp_no)
OR ( emp_salary > 50000
AND emp_type = ‘MANAGER’ )

Using ROWID When Possible

The ROWID of a record is the fastest method of record retrieval. The performance can be improved by selecting a record before updating or deleting it and including ROWID in the initial selection list.



Example:
SELECT ROWID, …………………………
INTO :emp_rowid, ……………………..
FROM emp
WHERE emp.emp_no = 56722
FOR UPDATE;

UPDATE emp
SET emp.name = ……………………………
WHERE ROWID = :emp_rowid;

Reducing the Number of Trips to the Database

Every time a SQL statement is executed, Oracle needs to perform many internal processing steps; the statement needs to be parsed, indexes evaluated, variables bound and the data block read. Reducing the physical number of trips to the database is particularly beneficial in Client-Server configurations in which the database may need to be accessed over a network.

The following examples show three distinct ways of retrieving data about employees who have employee numbers 0342 or 0291.

METHOD 1 shows two separate database accesses :

SELECT emp_name, salary, grade
FROM emp
WHERE emp_no = 0342;

SELECT emp_name, salary, grade
FROM emp
WHERE emp_no =

No comments: