What are Slowly Changing Dimensions?
Below is a list of the Slowly Changing Dimensions in use today:
- Type 0 - The passive method
- Type 1 - Overwrites the old value, no changes are tracked
- Type 2 - Creates a new additional record to track changes
- Type 3 - Adds a new column to track changes
- Type 4 - Uses a separate historical table to track changes
- Type 6 - Combines Type 1,2,3 approaches (1+2+3=6)
In Type 1 Slowly Changing Dimension, the new information simply overwrites the original information. In other words, no history is kept.
In this example, in the Customer Dimension, we have the following record:
After Peter moved from Washington to Maine, the new information replaces the original. In this case, Washington is replaced with Maine, and after the data ingestion Customer Dimension looks like this:
Advantages:This approach is easy to implement, since there is no need to keep track of the old information.
Disadvantages:All history is lost. By applying this approach, it is not possible to trace back in history. For example, in this case, you do not have the ability to know that Peter lived in Washington before.
Usage:About 50% of the time.
When to use Type 1:Type 1 slowly changing dimension should be used when it is not necessary to keep track of historical changes in your Data Warehouse.
How to Ingest Data using SCD Type 1 Framework
- Download the Data Ingestion Framework for SCD Type 1.
- Follow the instructions to copy the scripts to your Hadoop Big Data environment.
- Change Directory to the location where you have copied the scripts
- Use the sample command line below to ingest data into Hadoop.
./Type1.sh Source_SQL="Source Data as SQL statement" Target_Table="Table where data should be loaded" Target_Columns="Target Columns to load data" Mapped_Columns="Source to Target Column Map" Data_Base_Name="Hadoop Database Name to load data" SQL_File_Location="SQL File Location" Log_File="Log File Name to log the script execution details"
|1||Target_Table||Table where data should be loaded|
|2||Join_Columns||It joins the source and target columns.|
|3||Mapped_Columns||source and target columns are mapped to load the data.|
|4||Source_SQL||Enter Source SQL query, which has new and updated records.|
|5||Target_Columns||It loads the comma separated list in target column.|
|6||Data_Base_Name||It names the Hadoop database where the target table is located.|
|7||SQL_File_Location||Enter the path from where the Type_1.sql file is downloaded.|
|8||Log_File||It shows the script execution details.|