Miscellaneous Ramblings on Hacking

September 16, 2006

Building a Calendar Dimension in DataStage

Filed under: DataStage, ETL — Zack Bethem @ 11:55 pm

Inaugural post!

One of the first ETL tasks that are commonly required is building a time or calendar dimension. The ETL is used on an ad-hoc basis for populating the target dimension. In this example, I’ll show how a single transformer is used to generate rows of day representing a single day. These resulting rows will then be inserted into the target.

Target definition:
create table calendar (
mo_date datetime,
mo_start_date,
mo_work_day,
mo_work_day_remaining,
mo,
qtr,
yr,
fiscal_mo,
fiscal_qtr,
fiscal_yr);

Now that the table is setup, it’s time to startup DataStage and create a new job. The job should be setup with the following objects as shown.

As you can see, there is a post update to the calendar dimension. The update is responsible for setting the values in the column: mo_work_day_remaining, which represents the number of workdays remaining in the month.

Variables

To ensure that the job is dynamic and can be re-used in the future, the job is defined with the following stage variables. For this job, our variable are:
StartDate – defines the first day to generate records
EndDate – sets the final day to generate records

Transformer: X_GenerateCalendar

The meat of the job is in the first transformer. It is in this transformer that records are generated for each interval (variable: Interval) from the first day (job variable: StartDate) to the last day (job variable: EndDate).

For this transformer, we’ve setup the following variables:

These variables are later defined as stage variables within the X_GenerateCalendar transformer.

The stage variables are now setup for the initial transformer. You’ll notice that we’ve used extensive use of the ICONV and OCONV functions within DataStage. These functions translate the dates into internal and external formats, respectively. By using these functions, we’re allowed to use the available D-Code values based on the internal format. For example, the D-Code “DM” for the numerical month and “DY[4]” for the four-digit year.

The transformer output link is setup as follows:

Note the defined constraint for the output link. We ensure that the variables are defined before sending the records by the constraint: @OUTROWNUM > 0. We also ensure that we continue to generate rows until the StartingDate is equal to the EndingDate.

Summary

That’s it. Through the use of a single, initial transformer we were able to generate calendar rows with a date range based on the input parameters: start date and end date. Starting with the initial date provided, we generated several columns (mo, yr, qtr) that will be useful for dissecting and aggregating our fact tables are varying levels of granularity.

Hope it helps. I’ve skipped the remainder transformation for this particular job, as that is simply controlled by standard Output SQL from Tgt_Calendar.

Advertisements

Leave a Comment »

No comments yet.

RSS feed for comments on this post. TrackBack URI

Leave a Reply

Please log in using one of these methods to post your comment:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s

Create a free website or blog at WordPress.com.

%d bloggers like this: