Recently i was checking whether the performance of our parallel queries on our datawarehouse environment is satisfactory or not. Below is the list of steps and the output that helped in determining whether the performance is ok or not.
if you are a new oracle dba and if you want to check the performance of the parallel queries in your database then you can follow the below procedure.
1)Oracle recommendation for datawarehouses–Consider the use of different number for the DOP on your tables. On large tables and their indexes use high degree like #CPU. For smaller tables use DOP (#CPU)/2 as start value.
select table_name, degree, instances
from dba_tables
where trim(degree) != ‘1’or ( trim(instances) != ‘1’ and trim(instances) != ‘0’ );
select index_name, degree, instances
from dba_indexes
where trim(degree) != ‘1’ or ( trim(instances) != ‘1’ and trim(instances) != ‘0’ );
**during our application processing we specify the degree of parallelism in all our queries so this point is ok . Some objects have DOP 6 or 8 which is fine.
2)Find if the system is providing you the requested degree of parallel slaves
select distinct qcsid, degree , req_degree from v$px_session;
**returned info that we requested currently 4 and the system accepted the 4 .so no need to worry.
3)Find out if any parallel operations have been downgraded
select name , value
from v$sysstat
where name like ‘%arallel%’;
Parallel operations not downgraded 20300
Parallel operations downgraded to serial 8
**from the last database restart , we executed 20,308 queries and only 8 of then were downgraded from parallel to serial due to not enough resources.no need to worry.
4) parallel_min_servers default value 0 on UAT
In this case once the query execution is finished oracle will wait 5 minutes and then will close the parallel slave. As our batch processing has always 5 tasks in parallel and quite resource intensive I guess we don’t have an idle time of 5 minutes so should be ok.
5)The size of the buffer needed for communication between the slaves/coordinator is present in the parameter PARALLEL_EXECUTION_MESSAGE_SIZE which is 2192 bytes on UAT .
6)Find out the buffer High water mark by using below query .currently it is 2 MB .so its fine our sga on uat can handle this.
SELECT statistic,value,value*2192/1024/1024 as MB FROM V$PX_PROCESS_SYSSTAT WHERE STATISTIC LIKE ‘Buffers%’;
The amount of memory used appears in the Buffers Current and Buffers HWM
statistics. Calculate a value in bytes by multiplying the number of buffers
by the value for PARALLEL_EXECUTION_MESSAGE_SIZE.
The formula is not 100% correct. To avoid fragmentation of the shared/large pool
we allocate in large chunks. This means the value of ‘PX msg pool’ can be larger
than ‘Buffers HWM’ * PARALLEL_EXECUTION_MESSAGE_SIZE.
7)checking pga advisory in the awr report shows that increase to the pga_aggregate_target parameter by 160 MB decreases the amount of work done on the disk by 50%. So the value has been increased from 800 to 960Mb on UAT after verifying that the server can cope with this value.
8)To find out information about the behaviour of your session as soon as you have finished executing a query using parallel slaves.
select DFO_NUMBER, tq_id, SERVER_TYPE, NUM_ROWS, PROCESS
from V$PQ_TQSTAT
order by 1,2, 3;
9)where does the memory for holding the buffers used for communication between the slaves/coordinator come from?
**In10g the PARALLEL_AUTOMATIC_TUNING parameter has been deprecated. Current value on UAT for this parameter is FALSE. And theoritically according to below oracle documentation
the allocation has to come from shared pool. But practically by using the below query on UAT we can see that infact it is allocated from the large pool so could be a mistake in the documentation in metalink.
Parallel Execution slave have to exchange data and messages that they can work together.
For this we have to allocate memory from the shared pool or large pool.
This decision depends to which value PARALLEL_AUTOMATIC_TUNING is set.
If PARALLEL_AUTOMATIC_TUNING = TRUE than we take memory from the large pool
otherwise we use the shared pool.
At startup of the database we allocate some memory for the "PX msg pool"
to avoid fragmentation and get faster memory for the PX buffers,
when PARALLEL_AUTOMATIC_TUNING or PARALLEL_MIN_SERVERS is set.
You can monitor from which area we allocate memory for PX msg buffer when
we query V$SGASTAT and look for the "PX msg pool"
The PX message buffer can become very large. Another area in the shared pool where
we allocate memory for PX operation is the "PX subheap". This heap is
small when we compare it with the "PX msg pool" and always in the shared pool.
SELECT POOL, NAME, SUM(BYTES)/1024/1024 FROM V$SGASTAT WHERE POOL LIKE ‘%pool%’ and name like ‘%PX%’
GROUP BY ROLLUP (POOL, NAME);
POOL|NAME|SUM(BYTES)/1024/1024
large pool|PX msg pool|1.02622222900391
large pool||1.02622222900391
shared pool|PX subheap|0.420280456542969
10)To find out how much memory is required in the shared pool/largepool for helping your parallel slaves at a particular time
3 x size x users x groups x connections
on UAT did select 3*2152*5*2*34 from dual; gives me 2 MB and our shared pool can handle this prooved by the awr shared pool advisory.
The terms are:
SIZE = PARALLEL_EXECUTION_MESSAGE_SIZE
USERS = the number of concurrent parallel execution users that you expect to
have running with the optimal DOP
GROUPS = the number of query server process groups used for each query
A simple SQL statement requires only one group. However, if your queries
involve subqueries which will be processed in parallel, then Oracle uses an
additional group of query server processes.
CONNECTIONS = (DOP2 + 2 x DOP)
If your system is a cluster or MPP, then you should account for the number of
instances because this will increase the DOP. In other words, using a DOP of 4 on
a two instance cluster results in a DOP of 8. A value of PARALLEL_MAX_SERVERS
times the number of instances divided by four is a conservative estimate to use as
a starting point.