Summary: Reference Tables in a Datawarehouse
Reference Tables in a Datawarehouse are very important. These are the tables which have at a minimum – 2 columns. CODE_TYPE and CODE_DESCRIPTION. The business codes and their descriptions are stored in these tables. There is normally more than 1 reference table in a datawarehouse.These CODE_TYPE values are then used in the child tables. The normal practice is to ensure that only those codes that exist in the reference tables are used in the child tables. Child tables are other tables like Fact tables and other Dimension tables or any other tables which hold the core datawarehouse business data.
Should we Use Foreign keys between Reference Tables and the Child Tables?
If the Reference Table supports Historization then it will be difficult to implement foreign key relationships.Historization means the valid_from and valid_to columns present in the Reference Table.So you could have values like
CODE_TYPE CODE_DESCRIPTION VALID_FROM VALID_TO
EUR Euro Currency Description old 01-01-2015 31-03-2015
EUR Euro Currency Description new 01-04-2015 31-12-9999
So we see that the CODE_TYPE column cannot be the primary key since the EUR value is repeated more than once .Hence it cannot be a foreign key in the child table. ETL Tools like Informatica hence use logic to use the reference table as a lookup table to ensure the referential integrity is implemented through the informatica workflows and not through foreign key relationships.
There is an Alternative solution to use numerical surrogate keys . In this case however we need to consider additional logic to apply on the valid_from and valid_to dates to ensure we get the correct code descriptions.
S.Key CODE_TYPE CODE_DESCRIPTION VALID_FROM VALID_TO
1 EUR Euro Currency Description old 01-01-2015 31-03-2015
2 EUR Euro Currency Description new 01-04-2015 31-12-9999
Child Table
PRODUCT_NO PRODUCT_QUANTITY PRODUCT_PRICE CURRENCY_CODE VALID_FROM VALID_TO
P100 5 23.50 1 01-01-2015 31-12-9999
We see that the product table uses the currency_code value 1 . This points to the old currency description.This column can be used as the foreign key since the values are incremented sequentially by using an oracle sequence for example. However when we run the report on April 9th we should be able to see the new currency description.So our query needs additional logic to consider the date when the report was run and then displaying the description that is valid for that date.We can alternatively have a product table based on snapshot_date where every day the table is inserted with new rows and will contain values valid for that particular day.The reporting tools would then read this new product table so the valid currency descriptions are already filled in.
In our project the decision not to implement foreign keys at the database level was taken. This is because of the presence of historization columns and the complexity arising from the surrogate keys mentioned in the above option.The referential integrity in the datawarehouse would then be handled by the Informatica ETL Flows.