One of the first ETL tasks that are commonly required is building a time or calendar dimension. The ETL is used on an ad-hoc basis for populating the target dimension. In this example, I’ll show how a single transformer is used to generate rows of day representing a single day. These resulting rows will then be inserted into the target.
create table calendar (
Now that the table is setup, it’s time to startup DataStage and create a new job. The job should be setup with the following objects as shown.
As you can see, there is a post update to the calendar dimension. The update is responsible for setting the values in the column: mo_work_day_remaining, which represents the number of workdays remaining in the month.
To ensure that the job is dynamic and can be re-used in the future, the job is defined with the following stage variables. For this job, our variable are:
StartDate – defines the first day to generate records
EndDate – sets the final day to generate records
The meat of the job is in the first transformer. It is in this transformer that records are generated for each interval (variable: Interval) from the first day (job variable: StartDate) to the last day (job variable: EndDate).
For this transformer, we’ve setup the following variables:
These variables are later defined as stage variables within the X_GenerateCalendar transformer.
The stage variables are now setup for the initial transformer. You’ll notice that we’ve used extensive use of the ICONV and OCONV functions within DataStage. These functions translate the dates into internal and external formats, respectively. By using these functions, we’re allowed to use the available D-Code values based on the internal format. For example, the D-Code “DM” for the numerical month and “DY” for the four-digit year.
The transformer output link is setup as follows:
Note the defined constraint for the output link. We ensure that the variables are defined before sending the records by the constraint: @OUTROWNUM > 0. We also ensure that we continue to generate rows until the StartingDate is equal to the EndingDate.
That’s it. Through the use of a single, initial transformer we were able to generate calendar rows with a date range based on the input parameters: start date and end date. Starting with the initial date provided, we generated several columns (mo, yr, qtr) that will be useful for dissecting and aggregating our fact tables are varying levels of granularity.
Hope it helps. I’ve skipped the remainder transformation for this particular job, as that is simply controlled by standard Output SQL from Tgt_Calendar.