How to tune oracle parallel queries ? – complete reference for the new oracle dba




you are a new oracle dba and you already know that sql queries can be parallelized. your production application has some important sql queries that are running in parallel but still the performance is not as expected. your manager asked you to investigate why the sql query is slow even though it is running in parallel.

to perform your investigation you should first understand the different terminologies used in the explain plan for a sql query that is running in parallel. The different terminologies could be producer slaves, consumer slaves , query coordinator, type of distribution like broadcast , random , parallel to serial operations, serial to parallel  operations, degree of parallelism of the tables and indexes, different parameters that can influence the parallel behaviour of a query, how the load on a system could impact the parallel behaviour of a query etc.

Below article will explain you in detail about all the above terminologies and guides you with a proper approach if you want to tune the parallel query.

Checklist for Performance Problems with Parallel Execution [ID 280939.1]  

 
  Modified 29-APR-2010     Type BULLETIN     Status PUBLISHED  
       
 
***Checked for relevance on 29-April-2010***
 
PURPOSE
-------
This note gives an overview of tuning parallel operations at
session and database level.
 

SCOPE & APPLICATION
-------------------
DBA's and Support analysts.
 
 
 
Checklist for parallel execution
---------------------------------
 
 
Tune a parallel operation at session level.
--------------------------------------------
 
To tune a statements that run in parallel, you have to understand
the basics of parallel execution.
 
We split SQL statement in smaller operations. Each of this operations
can done in parallel.
 
As example
 
 select c.unit_price
   from costs c, products p
  where c.prod_id = p.prod_id;
 
We can scan the tables (costs,products) in parallel and we can
do the join in parallel. This means we have 3 smaller operations
that run in parallel.
 
We have slaves that are grouped in slave sets. 1 slave set ( called
Producer) only works at a certain time and the other slave set
( called Consumer ) waits to get data from the first slave set.
After it got all data it starts to work, it is now the Producer
slave set.
The slaves are controlled by the user background process (called
query coordinator QC ). In RAC environment the slaves for one
query maybe be are spawned on different Nodes.
 
We can use different methods to send data from one slave set
to another. we can send all rows from one slave to all other slaves
in the other slave set. This we call a 'BROADCAST' distribution.
Another example is a join on a partitioned table. Here every slave
in the slave set that does the join works only on range of partitions.
Then you see a 'PART' distribution method.
 
A slave set can do 1 or more operation in parallel. As example
a slave set that scans the table COSTS can also do the join.
 
Here is the execution plan of the example.
 
-----------------------------------------------------------------
|Id| Operation               |Name    |  TQ |IN-OUT| PQ Distrib |
-----------------------------------------------------------------
| 0| SELECT STATEMENT        |        |     |      |            |
| 1|  PX COORDINATOR         |        |     |      |            |
| 2|   PX SEND QC (RANDOM)   |:TQ10001|Q1,01| P->S | QC (RAND)  |
| 3|    HASH JOIN            |        |Q1,01| PCWP |            |
| 4|     PX RECEIVE          |        |Q1,01| PCWP |            |
| 5|      PX SEND BROADCAST  |:TQ10000|Q1,00| P->P | BROADCAST  |
| 6|       PX BLOCK ITERATOR |        |Q1,00| PCWC |            |
| 7|        TABLE ACCESS FULL|PRODUCTS|Q1,00| PCWP |            |
| 8|     PX BLOCK ITERATOR   |        |Q1,01| PCWC |            |
| 9|      TABLE ACCESS FULL  |COSTS   |Q1,01| PCWP |            |
-----------------------------------------------------------------
 
 
We scan both tables COSTS,PRODUCTS in parallel. The slave set Q1,01
that scans table COSTS does also the hash join. The PRODUCTS table
is scaned by a other slave set Q1,00 and send via a BROADCAST distribution
method to the other slave set.
 
 
 
Some operations are running in serial faster than in parallel. Please verify
if you query is faster with parallel execution.
It can happen that the degree of parallelism was changed on tables or indexes
implicit by some operations. As example a import of data or a
"ALTER INDEX REBUILD PARALLEL" can change the degree of parallelism.
 
 
 
The basic information about parallel execution you can find
in the Data Warehousing Guide.
 
See also
Note.184417.1 Where you find Information about Parallel Execution
              in the Oracle Documentation 
 
In 9.x you see the distribution method in the "PX distributed" column
when you run the utlxplp script to format the explain plan output.
From 10g onwards you will see this in the "Operation" column, so it will
also appear in the output of the utlxpls.sql script.
See Note
Note.235530.1 Recommended Method for Obtaining a Formatted Explain Plan
Note.236026.1 Display Execution Plans with dbms_xplan.display 
 
 
1.)  Check the execution plan.
 
  a.) Check if the query is well tuned.
         
 
  b.) Are steps ( e.q join, read the tables ,..) running on the same slave set?
      (Hint: You can see this in the IN-OUT column. If you have often a
              PARALLEL_COMBINED_WITH_PARENT PCWP you are on the same slave set.)
      This is good for the performance because we do not have to sent data from
      one slave set to the other slave set.
 
 
      E.g We want to join a 2 tables. For the scan of the 2 table we need 2
      slave sets. It can happened that the slave set that scanned one table
      can also do the join operation. We do not have to send the data of the
      2 tables to a new  slave set. You will see this in the IN-OUT column of
      formatted execution plan output. See also the example above.
 
 
      Features like full partition wise joins and partial partitions wise joins
      can help to achieve that we have PCWP.        
  
  c.) Check ff some parts of the operation are running in serial
      (e.g full table scan,index fast full scan )
      
      (Hint: You can see this in the IN-OUT column. If you see a  P->S or a
              S->P  you do some operations in serial.)
             
      To scan small tables in serial is maybe faster than to do it in parallel.                   
      Please check if the objects have a suitable degree setup. 
      
 
2.) Check if any data are skewed when we send them between slave sets.
 
     Run the following query after the parallel SQL statement is finished.
        
       select DFO_NUMBER, tq_id, SERVER_TYPE, NUM_ROWS, PROCESS
        from V$PQ_TQSTAT
        order by 1,2, 3;
         
    Please have in mind that you can query V$PQ_TQSTAT after the query
    is finished not during the run of the query.
   
 
 
An example.
 
The same SQL statement as before. We force only a HASH distribution.
That all three steps ( scan both tables and the join) done on
different slave sets.
The prod_id columns has only 2 values in the costs tables.
 
   PROD_ID   COUNT(*)
---------- ----------
       148        720
       147      81392
 
 
select  /*+ pq_distribute(c hash,hash) parallel(p,4) parallel(c,4)  */ 
         c.unit_price
   from  costs c, products p
   where c.prod_id = p.prod_id;
 
 
------------------------------------------------------------------
|Id| Operation               | Name    |  TQ |IN-OUT| PQ Distrib |
------------------------------------------------------------------
| 0| SELECT STATEMENT        |         |     |      |            |
| 1|  PX COORDINATOR         |         |     |      |            |
| 2|   PX SEND QC (RANDOM)   |:TQ10002 |Q1,02| P->S | QC (RAND)  |
| 3|    HASH JOIN            |         |Q1,02| PCWP |            |
| 4|     PX RECEIVE          |         |Q1,02| PCWP |            |
| 5|      PX SEND HASH       |:TQ10000 |Q1,00| P->P | HASH       |
| 6|       PX BLOCK ITERATOR |         |Q1,00| PCWC |            |
| 7|        TABLE ACCESS FULL|PRODUCTS |Q1,00| PCWP |            |
| 8|     PX RECEIVE          |         |Q1,02| PCWP |            |
| 9|      PX SEND HASH       |:TQ10001 |Q1,01| P->P | HASH       |
|10|       PX BLOCK ITERATOR |         |Q1,01| PCWC |            |
|11|        TABLE ACCESS FULL|COSTS    |Q1,01| PCWP |            |
------------------------------------------------------------------
 
 
 
 
 select  DFO_NUMBER, TQ_ID, SERVER_TYPE,  NUM_ROWS ,BYTES
      from v$pq_tqstat
     order by dfo_number , tq_id , server_type;
 
DFO_NUMBER      TQ_ID SERVER_TYP   NUM_ROWS      BYTES
---------- ---------- ---------- ---------- ----------
............
         1          2 Producer        81392 P002
         1          2 Producer            0 P003
         1          2 Producer          720 P001
         1          2 Producer            0 P000
 
You see only 2 slaves have work to do.
 
 
 
 
Tune a parallel operation at database level.
---------------------------------------------
 
It is not so easy to tune parallel execution on database level. You can
tune some parameters  like
 
 - tune PARALLEL_MAX_SERVERS 
 - tune PARALLEL_EXECUTION_MESSAGE_SIZE
 - set PARALLEL_ADAPTIVE_MULTI_USER
 
on a RAC System you can limit the number of instances with
 - PARALLEL_INSTANCE_GROUP/INSTANCE_GROUPS
 
It is important that the CPU utilization is monitored. If the CPU utilization
is always high then you can try reducing the DOP or PARALLEL_MAX_SERVERS.
If your CPU utilization is high you get no benefit from parallel execution.
In same cases the usage of parallel execution under a high CPU utilization
can hurt the performance.
 
On Platform where OSWatcher is available it should be install and started
to monitor the system. For more information please have a look into
Note 301137.1 OS Watcher User Guide
This can help to check if there was a problem with high CPU utilization. 
 
 
A good start value for parallel_max_servers is 10*(#CPU's in the system).
 
1.)  Degree of parallelism (DOP)
     Tuning the DOP is a balance between query performance and overall
     database performance. If the query runs fine standalone it maybe
     has a problem during the day when a lot of queries running in parallel
     and we have not enough slaves that the query get the request degree.
     You can be check via v$px_session for a single query or via
     v$sysstat you can monitor how many parallel operations are downgraded.
           
     The conclusion is that with a high DOP on a table/index only a few
     queries get the requested DOP, this depends also of the setting of
     PARALLEL_MAX_SERVERS. You can use Resource manager to limit
     the degree for a group of users. As example your OTLP users get a low
     DOP and your power users ( batch jobs ,..) get higher degrees.
     The parameter PARALLEL_ADAPTIVE_MULTI_USER  can also help.
 
     A high DOP and a high value for PARALLEL_MAX_SERVERS can hurt your system
     and cause a 100% CPU utilization.    
                   
     Consider the use of different number for the DOP on your tables.
     On large tables and their indexes use high degree like #CPU.
     For smaller tables use DOP (#CPU)/2 as start value.               
 
      a.) Check if you get the request degree for your query in v$px_session          
 
           select distinct qcsid, degree , req_degree from v$px_session;
 
                       QCSID     DEGREE REQ_DEGREE
                  ---------- ---------- ----------
                      141          3          4
          
          As you can see, we wanted a degree of 4 and we got a degree of 3.    
 
 
      b.) You can check in statspack if some of your queries have a
          reduced the DOP
       
          See columns
               DFO trees parallelized                           
               ...........
               Parallel operations downgraded 1                  
               Parallel operations downgraded 25                 
               Parallel operations downgraded 50                 
               Parallel operations downgraded 75                 
               Parallel operations downgraded to                 
               Parallel operations not downgrade                     
 
            or use
             select name , value
               from v$sysstat 
              where name like '%arallel%';   
        
          If a lot Parallel operations downgraded, please consider to
          reduce the DOP.
 
     
  
2.) Settings of parallel execution parameters
       
   We already  discussed PARALLEL_MAX_SERVERS and  PARALLEL_ADAPTIVE_MULTI_USER.
   PARALLEL_EXECUTION_MESSAGE_SIZE is the size of the buffer that we need
   to exchange data/messages between the slaves or between the slaves
   and the QC. High values can improve the overall performance, but we maybe
   need much more memory.
   See Note:238680.1 Parallel Execution the Large/Shared Pool and ORA-4031        
 
   You can check the setting with the commands below.
    
     a.) Use the following statements to check the setting of the important init.ora parameters.
  
        select substr(name,1,30) name , substr(value,1,15) value
          from v$parameter
         where name like '%arallel%';  
 
 
      b.) With the command
          select *  from  V$PX_PROCESS_SYSSTAT;

         we could see the high water mark of parallel execution server.
 
3.) Checkpoint tuning
 
    Parallel queries read large objects with a direct path read. Before we can do
    the read we have to do a tablespace checkpoint to write the dirty buffers
    back to the datafiles. So a checkpoint tuning and help to improve the
    performance of parallel operations.
 
    See also
    Note.237328.1 Parallel Execution and Direct Reads
 
 
Parallel Execution with RAC systems
------------------------------------
 
   PX is also automatically controlled on RAC systems.
 
 
   Parallel execution does not allocate slaves randomly across
   the available instances, but rather will start by allocating on the
   least loaded instance. The goal is to both minimize inter-node traffic
   and at the same time try to minimize any imbalance of work across
   the instances.
              
   The INSTANCE_GROUPS/PARALLEL_INSTANCE_GROUP parameter can be used to
   restrict allocation of query slaves to specific instances in a RAC
   configuration and over-ride the automatic allocation  across instances.
   This can improve the performance when there are problem with the
   inter-connect.
 
  Note: 181489.1 Tuning Inter-Instance Performance in RAC and OPS
 

 

Related


Products


  • Oracle Database Products > Oracle Database > Oracle Database > Oracle Server – Enterprise Edition
  • Oracle Database Products > Oracle Database > Oracle Database > Oracle Server – Personal Edition

Errors


ORA-4031

 

Back to top

Rate this document 

Article Rating

Rate this document
Excellent
Good
Poor
 
Did this document help you?
Yes
No
Just browsing
 
How easy was it to find this document?
Very easy
Somewhat easy
Not easy

 

  Comments

 
Cancel    

 

Author: admin