Miscellaneous Ramblings on Hacking

October 30, 2006

Creating the Master Package

Filed under: ETL, SQL Server Integration Services, SSIS — Zack Bethem @ 9:37 pm

It’s come to that point where all of the SSIS packages for the data warehouse are complete. As other searches have pointed out, there are multiple ways to glue everything together. Coming from a DataStage or Informatica background, however, I prefer the creation of a “master package”. This package is strictly a control flow that utilizes several ‘Execute Package’ tasks.

Step 1: Define File Connection

For this implementation, all of our packages will exist as files. This requires that each package have a defined file connection within the master package. Under the Connection Manger, right click to add a new file connection for the following dialog prompt.

pngcroppercapture3.PNG

Click on Browse to add the package of choice in the following dialog.

pngcroppercapture4.PNG

After selecting the file, click on the File Name listbox. This will automatically highlight the filename. Copy the filename to the clipboard by pressing Ctrl-C. This will save time in the later steps.

Once the file connection is added, an expression will need to be defined for the ConnectionString property. In my installation, I defined a variable User::ExecDir_DWL that corresponds to the project folder location for all of my packages. This allows me to define the ConnectionString as follows (hint: use Ctrl-V to paste in the package name to save time and reduce errors). Once complete, this will ensure that the file connection is not tied to any specific location. In turn, this allows for the master package to be easily deployed to any folder structure.
pngcroppercapture5.PNG

Step 2: Add Execute Package Task

The final piece of the pie is to add the Execute Package task to the control flow. Here, I use the file name text copied to the clipboard earlier to save time.

  • First, rename the execute package task to ‘EXEC <package_name>‘.
  • Second, explicitly set the property PackageName using the expression builder. I’ve noticed that the package name isn’t set when using a package template. This should override that issue.

October 16, 2006

SSIS Script: RowCounter

Filed under: ETL, SQL Server Integration Services, SSIS — Zack Bethem @ 10:06 pm

Straight from the MSDN archives, I found this little gem to count rows within a data pipe as they are processed. The key phrase is: “as they are processed”. You’d think the Row Count transformation would work if you wanted a sequential row number. Unfortunately, this transformation only updates the assigned variable at the end of processing for the given Data Flow task.

As a lover of code snippets for reuse, here’s what you have to do:

1. Add an Output Column.  I called mine rowCount.

2. Add  the following script lines:

Public Class ScriptMain
  Inherits UserComponent
  Dim iCount As Integer = 0
  Public Overrides Sub Input0_ProcessInputRow(ByVal Row As Input0Buffer)
    Row.rowCount = iCount
    iCount = iCount + 1
  End Sub
End Class

October 14, 2006

Crazy SQL Server Syntax

Filed under: Uncategorized — Zack Bethem @ 10:38 pm

Crazy, but I must admit the convenience. This came up after loading a table. I needed to generate an incremented value in the SQL Server database. Normally, you could do this by setting the column to be of type IDENTITY. I didn’t have the luxury and didn’t want to reload it anyway.

SQL Command:
declare @intCounter int
set @intCounter = 234234  — some random number

–set the column and increment
update stage_conference
set @intCounter = conference_id = @intCounter + 3 

Thanks to Google, which landed me here.

October 3, 2006

SQL Server: Unit Testing Loads

Filed under: ETL, SQL Server Integration Services — Zack Bethem @ 10:48 pm

Through the magic of the system tables in SQL Server: sysobjects, systables, and syscolumns, I’ve found a new favorite query.

Scenario

ETL packages must account for trimming extra pads placed on data fields from the source. This is usually a problem from data sourced from flat files, especially fixed-width files. Some may argue that not addressing these issues results in data quality issues. I just prefer that things are perfect for future users. (and yes, data quality is on my mind too)

Solution

The following query will dynamically generate a query for every column in a table. The each query will confirm the max length of a field versus the trimmed max length. This results in a huge time savings from manually typing everything. Enjoy!

— SQL Server: generate column trimmings SQL
select
‘select ”’+so.name+‘.’+sc.name+”’,sum(case when datalength(‘+sc.name+’) > datalength(ltrim(rtrim(‘+sc.name+‘))) then 1 else 0 end) as err_count from ‘+so.name+‘;’
from sysobjects so, syscolumns sc, systypes st
where
sc.id = so.id
and sc.type = st.type
and sc.xtype=st.xusertype
and so.name in (‘individual’)
and so.xtype=’U’
order by sc.name

** Note: The query above refers to ‘individual’ as a specific table name. You’ll want to change that based on your validation.

Update: The SQL Server function DATALENGTH() should be used instead of LEN(). LEN() will automatically rtrim a column, thus giving you a false positive. This behavior may be limited to VARCHAR columns, but be warned.

Blog at WordPress.com.