It’s come to that point where all of the SSIS packages for the data warehouse are complete. As other searches have pointed out, there are multiple ways to glue everything together. Coming from a DataStage or Informatica background, however, I prefer the creation of a “master package”. This package is strictly a control flow that utilizes several ‘Execute Package’ tasks.
Step 1: Define File Connection
For this implementation, all of our packages will exist as files. This requires that each package have a defined file connection within the master package. Under the Connection Manger, right click to add a new file connection for the following dialog prompt.
Click on Browse to add the package of choice in the following dialog.
After selecting the file, click on the File Name listbox. This will automatically highlight the filename. Copy the filename to the clipboard by pressing Ctrl-C. This will save time in the later steps.
Once the file connection is added, an expression will need to be defined for the ConnectionString property. In my installation, I defined a variable User::ExecDir_DWL that corresponds to the project folder location for all of my packages. This allows me to define the ConnectionString as follows (hint: use Ctrl-V to paste in the package name to save time and reduce errors). Once complete, this will ensure that the file connection is not tied to any specific location. In turn, this allows for the master package to be easily deployed to any folder structure.
Step 2: Add Execute Package Task
The final piece of the pie is to add the Execute Package task to the control flow. Here, I use the file name text copied to the clipboard earlier to save time.
- First, rename the execute package task to ‘EXEC <package_name>‘.
- Second, explicitly set the property PackageName using the expression builder. I’ve noticed that the package name isn’t set when using a package template. This should override that issue.
Straight from the MSDN archives, I found this little gem to count rows within a data pipe as they are processed. The key phrase is: “as they are processed”. You’d think the Row Count transformation would work if you wanted a sequential row number. Unfortunately, this transformation only updates the assigned variable at the end of processing for the given Data Flow task.
As a lover of code snippets for reuse, here’s what you have to do:
1. Add an Output Column. I called mine rowCount.
2. Add the following script lines:
Public Class ScriptMain
Dim iCount As Integer = 0
Public Overrides Sub Input0_ProcessInputRow(ByVal Row As Input0Buffer)
Row.rowCount = iCount
iCount = iCount + 1
Crazy, but I must admit the convenience. This came up after loading a table. I needed to generate an incremented value in the SQL Server database. Normally, you could do this by setting the column to be of type IDENTITY. I didn’t have the luxury and didn’t want to reload it anyway.
declare @intCounter int
set @intCounter = 234234 — some random number
–set the column and increment
set @intCounter = conference_id = @intCounter + 3
Thanks to Google, which landed me here.
Through the magic of the system tables in SQL Server: sysobjects, systables, and syscolumns, I’ve found a new favorite query.
ETL packages must account for trimming extra pads placed on data fields from the source. This is usually a problem from data sourced from flat files, especially fixed-width files. Some may argue that not addressing these issues results in data quality issues. I just prefer that things are perfect for future users. (and yes, data quality is on my mind too)
The following query will dynamically generate a query for every column in a table. The each query will confirm the max length of a field versus the trimmed max length. This results in a huge time savings from manually typing everything. Enjoy!
— SQL Server: generate column trimmings SQL
‘select ”’+so.name+‘.’+sc.name+”’,sum(case when datalength(‘+sc.name+’) > datalength(ltrim(rtrim(‘+sc.name+‘))) then 1 else 0 end) as err_count from ‘+so.name+‘;’
from sysobjects so, syscolumns sc, systypes st
sc.id = so.id
and sc.type = st.type
and so.name in (‘individual’)
order by sc.name
** Note: The query above refers to ‘individual’ as a specific table name. You’ll want to change that based on your validation.
Update: The SQL Server function DATALENGTH() should be used instead of LEN(). LEN() will automatically rtrim a column, thus giving you a false positive. This behavior may be limited to VARCHAR columns, but be warned.