Checking the performance of your parallel queries on your data warehouse – a brief case study for the new Oracle DBA




Recently I was checking whether the performance of the parallel queries on our data warehouse environment was satisfactory. Below are the steps, and the output, that helped me decide.

If you are a new Oracle DBA and want to check the performance of the parallel queries in your database, you can follow the procedure below.

1) Oracle's recommendation for data warehouses – consider different DOP values for your tables. On large tables and their indexes use a high degree such as #CPU; for smaller tables use DOP = #CPU/2 as a starting value.

        select table_name, degree, instances
        from dba_tables
        where trim(degree) != '1' or ( trim(instances) != '1' and trim(instances) != '0' );

        select index_name, degree, instances
        from dba_indexes
        where trim(degree) != '1' or ( trim(instances) != '1' and trim(instances) != '0' );

      ** During our application processing we specify the degree of parallelism in all our queries, so this point is fine; some objects have a DOP of 6 or 8, which is reasonable.
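The starting-point rule above is easy to sketch as a tiny helper (the function name and structure are mine, for illustration only; the #CPU figure would come from your server):

```python
# Sketch of the DOP rule of thumb from point 1 (illustrative only):
# large tables/indexes start at #CPU, smaller tables at #CPU / 2.
def recommended_dop(cpu_count: int, large_table: bool) -> int:
    """Return a starting DOP per the rule of thumb, never below 1."""
    dop = cpu_count if large_table else cpu_count // 2
    return max(dop, 1)

print(recommended_dop(8, large_table=True))   # large table on an 8-CPU box -> 8
print(recommended_dop(8, large_table=False))  # smaller table -> 4
```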

 

2) Find out whether the system is granting you the requested degree of parallelism:

               select distinct qcsid, degree, req_degree from v$px_session;

    ** The output showed that we requested a degree of 4 and the system granted 4, so nothing to worry about.
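The same comparison can be made programmatically. This sketch works on plain tuples standing in for (qcsid, degree, req_degree) rows from v$px_session, since it does not connect to a database; the sample data is invented:

```python
# Each tuple mimics one distinct (qcsid, degree, req_degree) row
# from v$px_session; the sample data below is made up.
def downgraded_sessions(rows):
    """Return the qcsids whose granted degree is below the requested degree."""
    return [qcsid for qcsid, degree, req_degree in rows if degree < req_degree]

sample = [(101, 4, 4), (102, 4, 4), (103, 2, 8)]  # session 103 was downgraded
print(downgraded_sessions(sample))  # -> [103]
```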

 

3) Find out whether any parallel operations have been downgraded:

               select name, value
               from v$sysstat
               where name like '%arallel%';

      Parallel operations not downgraded  20300

      Parallel operations downgraded to serial 8

     ** Since the last database restart we executed 20,308 parallel operations and only 8 of them were downgraded from parallel to serial due to insufficient resources, so nothing to worry about.
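The arithmetic behind that conclusion (8 downgrades out of 20,300 + 8 operations) is simple to verify:

```python
# Downgrade ratio from the v$sysstat counters quoted above.
not_downgraded = 20300
downgraded_to_serial = 8
total = not_downgraded + downgraded_to_serial
ratio = downgraded_to_serial / total
print(f"{total} parallel operations, {ratio:.2%} downgraded to serial")
# -> 20308 parallel operations, 0.04% downgraded to serial
```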

 

4) parallel_min_servers has its default value of 0 on UAT.

    In this case, once a query finishes executing, Oracle waits 5 minutes and then shuts the parallel slave down. As our batch processing always runs 5 tasks in parallel and is quite resource intensive, I assume we never have 5 minutes of idle time, so this should be fine.

 

5) The size of the buffer used for communication between the slaves and the coordinator is given by the parameter PARALLEL_EXECUTION_MESSAGE_SIZE, which is 2192 bytes on UAT.

6) Find the buffer high-water mark using the query below. Currently it is about 2 MB, so our SGA on UAT can handle it.

   SELECT statistic, value, value*2192/1024/1024 as MB FROM V$PX_PROCESS_SYSSTAT WHERE STATISTIC LIKE 'Buffers%';

   The amount of memory used appears in the Buffers Current and Buffers HWM statistics. Calculate a value in bytes by multiplying the number of buffers by the value of PARALLEL_EXECUTION_MESSAGE_SIZE. The formula is not 100% exact: to avoid fragmentation of the shared/large pool, memory is allocated in large chunks, so the value of 'PX msg pool' can be larger than 'Buffers HWM' * PARALLEL_EXECUTION_MESSAGE_SIZE.
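The conversion the query performs (buffers multiplied by the message size, expressed in MB) can be reproduced outside the database; per the note above, treat it as a lower bound, since the real 'PX msg pool' allocation can be somewhat larger:

```python
# Estimate PX buffer memory from a Buffers HWM value, as in the
# query above. 2192 is the PARALLEL_EXECUTION_MESSAGE_SIZE on UAT.
def px_buffer_mb(buffers_hwm: int, message_size: int = 2192) -> float:
    """Lower-bound estimate in MB; chunked allocation may use more."""
    return buffers_hwm * message_size / 1024 / 1024

print(round(px_buffer_mb(1000), 2))  # 1000 buffers -> 2.09 MB
```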

7) Checking the PGA advisory in the AWR report shows that increasing the pga_aggregate_target parameter by 160 MB would decrease the amount of work done on disk by 50%. So the value was raised from 800 MB to 960 MB on UAT after verifying that the server can cope with it.

8) To find out how your session behaved as soon as you have finished executing a query that used parallel slaves:

        select DFO_NUMBER, tq_id, SERVER_TYPE, NUM_ROWS, PROCESS
        from V$PQ_TQSTAT
        order by 1, 2, 3;
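One common use of this view (my own addition, not something the original text spells out) is spotting skew: if NUM_ROWS differs wildly between producer slaves, the work was not distributed evenly. A sketch over invented rows:

```python
# Simulated (server_type, num_rows, process) rows from v$pq_tqstat
# for one table queue; the data is invented for illustration.
def producer_skew(rows):
    """Ratio of max to min NUM_ROWS among producer slaves (1.0 = perfectly even)."""
    counts = [n for server_type, n, _ in rows if server_type == "Producer"]
    return max(counts) / min(counts)

sample = [("Producer", 5000, "P000"), ("Producer", 4800, "P001"),
          ("Consumer", 9800, "QC")]
print(round(producer_skew(sample), 2))  # close to 1.0 means balanced -> 1.04
```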

9) Where does the memory for the buffers used for communication between the slaves and the coordinator come from?

   ** In 10g the PARALLEL_AUTOMATIC_TUNING parameter has been deprecated; its current value on UAT is FALSE. Theoretically, according to the Oracle documentation below, the allocation should come from the shared pool, but in practice the query below shows that on UAT it is in fact allocated from the large pool, so this could be a mistake in the Metalink documentation.

      Parallel execution slaves have to exchange data and messages so that they can work together. For this, memory is allocated from the shared pool or the large pool; which one depends on the value of PARALLEL_AUTOMATIC_TUNING. If PARALLEL_AUTOMATIC_TUNING = TRUE, memory is taken from the large pool; otherwise the shared pool is used.

At database startup some memory is allocated for the "PX msg pool", to avoid fragmentation and to obtain memory for the PX buffers faster, when PARALLEL_AUTOMATIC_TUNING or PARALLEL_MIN_SERVERS is set. You can monitor which area the PX msg buffers are allocated from by querying V$SGASTAT and looking for the "PX msg pool".

The PX message buffer can become very large. Another area where memory is allocated for PX operations is the "PX subheap"; this heap is small compared with the "PX msg pool" and always lives in the shared pool.

      SELECT POOL, NAME, SUM(BYTES)/1024/1024 FROM V$SGASTAT WHERE POOL LIKE '%pool%' and name like '%PX%'
      GROUP BY ROLLUP (POOL, NAME);

        POOL|NAME|SUM(BYTES)/1024/1024

large pool|PX msg pool|1.02622222900391

large pool||1.02622222900391

shared pool|PX subheap|0.420280456542969

10) To find out how much memory is required in the shared pool or large pool for your parallel slaves at a particular time:

           3 x size x users x groups x connections

    On UAT, running select 3*2192*5*2*34 from dual; gives about 2 MB, and our shared pool can handle this, as proved by the AWR shared pool advisory.

      The terms are:

 SIZE = PARALLEL_EXECUTION_MESSAGE_SIZE

 USERS = the number of concurrent parallel execution users that you expect to have running with the optimal DOP

 GROUPS = the number of query server process groups used for each query. A simple SQL statement requires only one group; however, if your queries involve subqueries which will be processed in parallel, then Oracle uses an additional group of query server processes.

 CONNECTIONS = (DOP^2 + 2 x DOP)

If your system is a cluster or MPP, then you should account for the number of instances, because this will increase the DOP. In other words, using a DOP of 4 on a two-instance cluster results in a DOP of 8. A value of PARALLEL_MAX_SERVERS times the number of instances, divided by four, is a conservative estimate to use as a starting point.
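The formula above is easy to wrap in a helper. The DOP-to-connections step is the quoted CONNECTIONS = DOP^2 + 2 x DOP; the concrete UAT figures in the text use the site's own connection count, so this sketch just shows the general shape with example inputs:

```python
# Memory estimate for PX communication buffers, per the formula above:
# 3 x SIZE x USERS x GROUPS x CONNECTIONS, with CONNECTIONS = DOP^2 + 2*DOP.
def px_pool_bytes(message_size: int, users: int, groups: int, dop: int) -> int:
    connections = dop ** 2 + 2 * dop
    return 3 * message_size * users * groups * connections

# e.g. 5 concurrent users, simple statements (1 group), DOP 4, 2192-byte messages
est = px_pool_bytes(2192, users=5, groups=1, dop=4)
print(f"{est} bytes = {est / 1024 / 1024:.2f} MB")  # -> 789120 bytes = 0.75 MB
```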

Author: admin