Use of auto sample size in Gather Stats of Oracle
Oracle recommend to use the AUTO_SAMPLE_SIZE during gather the table stats. Because if table is large size then 100 percent estimate will take long time for giving the 100% accurate stats value to the optimizer to generate the good execution plan for the SQL queries.
AUTO_SAMPLE_SIZE use the formula to fetch the value near by 100% result, it sometime may cause issue and you have to gather stats 100% or any other percentage.
A fixed sampling percentage size that was good at some point in time may not be appropriate after the data distribution in the table has changed.
On the other hand when AUTO value is used Oracle will adjust the sample size time to time to match according to the data present in table.
Example show the difference of time taken by different Oracle parameter. AUTO_SAMPLE_SIZE will give you best estimate and sampling size.
I have EMP table having around 38335960 records in it.
Check the time taken by the Oracle with different sampling size
SQL> set timing on
SQL> exec dbms_stats.gather_table_stats(ownname => 'HR', tabname=> 'EMP', estimate_percent => 1);
PL/SQL procedure successfully completed.
Elapsed: 00:00:34.97
SQL> exec dbms_stats.gather_table_stats(ownname => 'HR', tabname => 'EMP', estimate_percent => 100);
PL/SQL procedure successfully completed.
Elapsed: 00:05:38.28
SQL> exec dbms_stats.gather_table_stats(ownname => 'HR', tabname => 'EMP', estimate_percent => dbms_stats.auto_sample_size);
PL/SQL procedure successfully completed.
Elapsed: 00:00:27.19
SQL>
On checking with the accurace provided by 100% sampling size as auto_sample_size is same in my example but execution time is very less in AUTO_SAMPLE_SIZE.
SQL> exec dbms_stats.gather_table_stats(ownname => 'HR', tabname => 'EMP', estimate_percent => 100);
PL/SQL procedure successfully completed.
Elapsed: 00:37:21.82
SQL> Select column_name,NUM_DISTINCT,sample_size,density from DBA_TAB_COL_STATISTICS where TABLE_NAME='EMP' and COLUMN_NAME=’SALARY’;
COLUMN_NAME NUM_DISTINCT SAMPLE_SIZE DENSITY
----------- ------------ ----------- ----------
SALARY 58 38335960 .017241379
Elapsed: 00:00:00.03
SQL> exec dbms_stats.gather_table_stats(ownname => 'HR', tabname => 'EMP', estimate_percent => dbms_stats.auto_sample_size);
PL/SQL procedure successfully completed.
Elapsed: 00:00:26.87
SQL> Select column_name,NUM_DISTINCT,sample_size,density from DBA_TAB_COL_STATISTICS where TABLE_NAME='EMP' and COLUMN_NAME=’SALARY’;
COLUMN_NAME NUM_DISTINCT SAMPLE_SIZE DENSITY
----------- ------------ ----------- ----------
SALARY 58 38335960 .017241379
Elapsed: 00:00:00.01
SQL> exec dbms_stats.gather_table_stats(ownname => 'HR', tabname=> 'EMP', estimate_percent => 1);
PL/SQL procedure successfully completed.
Elapsed: 00:00:34.41
SQL> Select column_name,NUM_DISTINCT,sample_size,density from DBA_TAB_COL_STATISTICS where TABLE_NAME='EMP' and COLUMN_NAME=’SALARY’;
COLUMN_NAME NUM_DISTINCT SAMPLE_SIZE DENSITY
----------- ------------ ----------- ----------
SALARY 58 383073 .017241379