Datawarehousing – Things to Consider – Reference Tables

Summary: Reference Tables in a Datawarehouse

Reference Tables in a Datawarehouse  are very important. These are the tables which have at a minimum – 2 columns. CODE_TYPE and CODE_DESCRIPTION. The business codes and their descriptions are stored in these tables. There is normally more than 1 reference table in a datawarehouse.These CODE_TYPE values  are then used in the  child tables. The normal practice is to ensure that only those codes that exist in the reference tables are  used in the child tables. Child tables are other tables like Fact tables and other Dimension tables or any other tables which hold the core datawarehouse business data.

Should we Use Foreign keys between Reference Tables and the Child Tables?





If the Reference Table supports Historization then it will be difficult to implement foreign key relationships.Historization means the valid_from and valid_to columns present in the Reference Table.So you could have values like

CODE_TYPE    CODE_DESCRIPTION                    VALID_FROM   VALID_TO

EUR                     Euro Currency Description old      01-01-2015             31-03-2015

EUR                     Euro Currency Description new     01-04-2015            31-12-9999

So we see that the CODE_TYPE column cannot be the primary key since  the EUR value is repeated more than once .Hence it cannot be a foreign key in the child table. ETL Tools like Informatica hence use logic to use the reference table as a lookup table to ensure the referential integrity is implemented through the informatica workflows and not through foreign key relationships.

There is an Alternative solution to use numerical surrogate keys . In this case however we need to consider additional logic to apply on the valid_from and valid_to dates to ensure we get the correct code descriptions.

S.Key       CODE_TYPE    CODE_DESCRIPTION                    VALID_FROM   VALID_TO

1                 EUR                     Euro Currency Description old      01-01-2015             31-03-2015

2                 EUR                     Euro Currency Description new     01-04-2015            31-12-9999

 

Child Table

PRODUCT_NO   PRODUCT_QUANTITY PRODUCT_PRICE CURRENCY_CODE  VALID_FROM     VALID_TO

P100                               5                                       23.50                                                1                     01-01-2015        31-12-9999          

 

We see that the product table uses the  currency_code value 1 . This points to the old currency description.This column can be used as the foreign key since the values are incremented sequentially by using an oracle sequence for example. However when we run the report on April 9th  we should be able to see the new currency description.So our query needs additional logic to consider the date when the report was run and then displaying the description that is valid for that date.We can alternatively have a product table based on snapshot_date where every day the table is inserted with new rows and will contain values valid for that particular day.The reporting tools would then read this new product table so the valid currency descriptions are already filled in.

In our project the decision not to implement foreign keys at the database level was taken. This is because of the presence of historization columns and the complexity arising from the surrogate keys mentioned in the above option.The referential integrity in the datawarehouse would then be handled by the Informatica ETL Flows.

 

 

 

Author: admin