Oracle Clusters – for the new oracle dba - A Fun Blog From Sridhar Kadiyala

Learn all about Oracle Clusters – for the new oracle dba

First, you must understand what a cluster is in Oracle.

A cluster is simply a method for storing more then 1 table on the same block.  Normally

-- a block contains data for exactly 1 table. In a cluster -- you have data from many

tables sharing the same block.

For example -- if you join the data from EMP and DEPT by DEPTNO frequently -- you might

consider clustering this data by DEPTNO.  In that fashion all of the rows from EMP for

deptno=10 and the row in DEPT for deptno=10 will reside on the same exact block (one IO

to get all of the data instead of some IO's to EMP and DEPT).

In order to organize data in such an object, we create a cluster KEY -- deptno in our

above example.  We must index this cluster key.  This index on deptno in the cluster is

called a CLUSTER INDEX and is used to locate the blocks that contain data about deptno=10

(for example).

CREATE CLUSTER kadiyal.sri_emp_dept (deptno NUMBER(3))

SIZE 8192

TABLESPACE ts_users

STORAGE (INITIAL 200K

NEXT 300K

MINEXTENTS 2

MAXEXTENTS 20

PCTINCREASE 33);

select * from dba_indexes where index_type = ‘CLUSTER’ and owner = ‘UD’

select * from dba_segments where segment_name = ‘IC_UDCL’

index is 550 MB in size ..

CREATE INDEX kadiyal.sri_emp_dept_index

ON CLUSTER sri_emp_dept

TABLESPACE ts_users

STORAGE (INITIAL 50K

NEXT 50K

MINEXTENTS 2

MAXEXTENTS 10

PCTINCREASE 33);

CREATE TABLE kadiyal.dept (

deptno NUMBER(3) PRIMARY KEY,

dname varchar2(40) )

CLUSTER sri_emp_dept (deptno);

CREATE TABLE kadiyal.sri_emp (

empno NUMBER(5) PRIMARY KEY,

ename VARCHAR2(15) NOT NULL,

deptno NUMBER(3) REFERENCES dept)

CLUSTER sri_emp_dept (deptno);

insert into kadiyal.dept

select rownum,rownum||’sridhdept’

from dba_objects

where rownum < 1000

commit

insert into kadiyal.sri_emp

select rownum+100,rownum||’empl’,rownum

from dba_objects

where rownum < 1000

commit

About Clusters

A cluster provides an optional method of storing table data. A cluster is made up of a

group of tables that share the same data blocks. The tables are grouped together

because they share common columns and are often used together. For example, the

emp and dept table share the deptno column. When you cluster the emp and dept

tables (see Figure 18–1), Oracle Database physically stores all rows for each

department from both the emp and dept tables in the same data blocks.

Because clusters store related rows of different tables together in the same data blocks,

properly used clusters offer two primary benefits:

■ Disk I/O is reduced and access time improves for joins of clustered tables.

■ The cluster key is the column, or group of columns, that the clustered tables have

in common. You specify the columns of the cluster key when creating the cluster.

You subsequently specify the same columns when creating every table added to

the cluster. Each cluster key value is stored only once each in the cluster and the

cluster index, no matter how many rows of different tables contain the value.

Therefore, less storage might be required to store related table and index data in a

cluster than is necessary in non-clustered table format. For example, in

Figure 18–1, notice how each cluster key (each deptno) is stored just once for

many rows that contain the same value in both the emp and dept tables.

After creating a cluster, you can create tables in the cluster. However, before any rows

can be inserted into the clustered tables, a cluster index must be created. Using clusters

does not affect the creation of additional indexes on the clustered tables; they can be

created and dropped as usual.

You should not use clusters for tables that are frequently accessed individually.

As an add-on, remember to do the information life cycle planning bit.

Purging gobs of data from clusters is something to consider. Depending on

the size of the cluster compared to your resources, copying what you want to

keep may not be practical for a given archive and purge cycle, and you have

to consider whether to license partitioning, because deletes from a cluster

are no picnic for what the database engine has to do.

If the join conditions are different I don’t think they’re a good candiate for clustering, at least together. If you must use clusters then your two-cluster approach might work.

Clusters are an interesting idea but I’ve never seen them used for several reasons. If you need to access one table’s data without the others then you’re wasting effort reading through data you don’t want in the blocks. I think they work well for static data but am not sure about maintenance. I also am not sure about the advantage of clusters over simpler denormalized tables with respect to performance and maintenance.

I’ve made a similar statement many times viz: "If your application doesn’t use any IOTs or clusters then it has probably not been designed properly."

I think Tom’s example of using them is this: every application implicitly uses them because the data dictionary if full of them; so if Oracle think it’s the best way of handling the data dictionary, maybe there’s something in it for other people.

Admittedly, I’ve seen IOTs much more frequently than I’ve seen clusters in use; there are two problems with clusters – the volume of data per cluster key needs to be very close to uniform for them to work well, and you can’t partition them. (Footnote: I’ve asked Oracle Corp about partitioning, and there’s a "chicken and egg" situation: they would probably code for partitioning if more people used them, but people are going to use them unless they allow partitioning.)

Regards
Jonathan Lewis

user8768362 wrote:
Thanks for your reply,

– by the way: congratulations for the great book on Cost-Based fundamentals! when are part 2 and 3 coming?? –
Thanks – but I don’t know. There are increasing numbers of people who publish correct information on the internet about how the optimizer works and how to use Oracle, so the need for further volumes keeps decreasing (and the size of the task keeps increasing, and people keep hiring me to do real work instead of having fun) so volumes 2 and 3 (and 4 and 5) may never happen. I get started from time to time, then stop for several weeks, and have to start all over again.

as you wrote, in our special situation the SQL will be filtered some way. But the Java guys – they are politically stronger than me… – have modelled it a way, super-duper normal form, that you have those three tables with identical join conditions in many queries. Those queries will additional join 3 or 4 more tables and filter the whole thing by a predicate of the 6th table… But the core query, repeated every time, are those three tables, which are joined in this PK->FK manner.
This isn’t an exact match for the earlier example – it sounds like you have cascading prefixes so:
t1 (k1, k2, k3, k4, f1, f2)
t2 (k1, k2, k3, k4, k5, h1, h2) — a very small number for each row in t1
t3 (k1, k2, k3, k4, k5, k6, x1, x1) — a very small number for each row in t2.

So you could cluster on: (k1, k2, k3, k4) and aim to get all related rows for a t1 key into a single block; and if the total amount of data for each combination of (k1, k2, k3, k4) is about the same (and doesn’t extend to multiple blocks) this could work well and eliminate lots of repetitions of the 4 columns.

However, you’d still have to have the primary keys on the tables – I don’t think the cluster key can also be a primary key, but I’d have to check a recent version of Oracle. The primary keys could also be created to save space by using multi-column compression, but you’d still have to maintain them on inserts, updates and deletes. But with the right types of query the cluster could work well.

I am still curous about the type of query, though. You say you have to join six tables, then filter on the sixth. This sounds as if it’s likely to be expensive whatever you do because of the volume of data you’re going to carry before filtering. Why is this apparently the natural join order, rather than driving of the sixth back into the three ? And if driving from the sixth is the right strategy would that invalidate the use of the cluster ?

In other news — if my sketch of the tables is correct, then they could be good candidates for IOTs with multi-column compression PK.

Regards
Jonathan Lewis

select * from dba_indexes where table_name like ‘%CLT%’

select * from dba_tables where table_name like ‘%CLTC%’

select * from dba_objects where object_name like ‘%CLUSTER%’

select * from dba_clusters

select * from DBA_CLU_COLUMNS where cluster_name like ‘%CR%’

select * from dba_segments where segment_name like ‘%CLT%’

Oracle Clusters – for the new oracle dba

Related

Categories