Sunday, January 20, 2008

Gather plan statistics

gather_plan_statistics

Understanding your data, business requirements and the schema is the key to tuning bad SQL statements. However, from time to time you will be faced with issues where a SQL statement’s plan that gets generated just does not seem to make logical sense. At that time, you can chose to use the 10053 trace to understand why the optimizer did what it did, you can play with the creation virtual indexes to see how that plan is changing.

One more thing that you can do is to use the /+ gather_plan_statistics */ hint. Once you have executed the query with this hint, you then need to use DBMS_XPLAN.DISPLAY_CURSOR to get the output. Usage of this hint allows us to capture extra metrics pertaining to the query. It shows E-Rows (Optimizer’s estimate number of rows) and A-Rows (Actual number of rows) for each row source. Let’s look at a sample output:

select /*+ gather_plan_statistics */ ;
select * from table(dbms_xplan.display_cursor(null, null, ‘ALLSTATS LAST’));

------------------------------------------------------------------------------------------
| Id | Operation | Name | Starts | E-Rows | A-Rows |
------------------------------------------------------------------------------------------
| 1 | SORT GROUP BY | | 1 | 1 | 1 |
|* 2 | FILTER | | 1 | | 1314K |
| 3 | NESTED LOOPS | | 1 | 1 | 1314K |
|* 4 | HASH JOIN | | 1 | 1 | 1314K |
|* 5 | INDEX RANGE SCAN | T2_IND_3 | 1 | 2841 | 2022 |
|* 6 | TABLE ACCESS BY LOCAL INDEX ROWID | TEST | 1 | 3879 | 4771K |
|* 7 | INDEX SKIP SCAN | TEST_IND_2 | 1 | 3567 | 4771K |
|* 8 | INDEX RANGE SCAN | T6_IND_4 | 1314K | 1 | 1314K |
------------------------------------------------------------------------------------------
If you look at line 6, you can see the big difference between the estimated and the actual row numbers. This should indicate that you will be getting a sub-optimal plan by the optimizer. First place to look for would be the statistics to see whether updated stats are available to the optimizer to come up with the right plan. Another thing to look for would be data skewness - this in combination to a bad dbms_stats statement can also lead to a bad cardinality judgement by the optimizer. Data co-relation can also lead to bad cardinality estimates (you can use extended stats in 11g in order to mitigate this and can use the cardinality hint in prior versions).

Another thing to note here is that in the case of nested loops, the value in the starts column needs to be multiplied with the E-Rows column before matching it up with the value in the A-Rows column (last line in the output above).

No comments: