Miscellaneous Ramblings on Hacking

November 29, 2006

Variables in Source SQL

Filed under: ETL, SQL Server Integration Services, SSIS — Zack Bethem @ 11:09 pm

When dealing with source database systems, it’s common to want to process only the “delta” of changed or new records. These systems typically carry created and last_updated datestamps on each record for audit purposes. This entry details how to do this in SSIS.

Last Run Date

In my scenario, I opted to maintain a transaction log table that stores the package execution history. The table is queried by a SQL Task in the control flow, configured to retrieve a single row and store the last_load_date into a variable. That variable is then used in the where clause of the source SQL in later data flows.

The creation script for the table is below.

CREATE TABLE dbo.load_history (
mapping_name nvarchar (50) NULL ,
last_load_date smalldatetime NULL ,
start_date smalldatetime NULL ,
end_date datetime NULL ,
status nvarchar (20) NULL
)
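
For context, here is a minimal sketch of how rows in this table might be managed around a load. The mapping name and status values are illustrative, not taken from the actual package:

-- before the load: open a pending run for this mapping
insert into dbo.load_history (mapping_name, last_load_date, start_date, status)
values ('loandata', '2006-11-28', getdate(), 'Pending');

-- after a successful load: close out the run
update dbo.load_history
set end_date = getdate(), status = 'Complete'
where mapping_name = 'loandata'
and end_date is null
and status = 'Pending';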

The setup of the SQL Task to return the last_load_date into a variable is done as follows:

  1. Define a variable LastRunDate with a data type of DateTime.
  2. Under the General tab, set the Result Set to 'Single Row'. The SQLStatement is set to:
    select last_load_date from load_history
    where mapping_name = <mapping_name>
    and end_date is null
    and status = 'Pending'
  3. Under the Result Set tab, map the Result Name 'last_load_date' to the defined variable. In my case, I named my variable LastRunDate.
The Hidden Magic of Variable Properties

Next, define a variable to hold the source SQL.

  1. Add a variable of data type String. In my case, I named the variable SQL.
  2. With the variable selected in the Variables window, hit F4 to open the variable's properties.
  3. Set the property EvaluateAsExpression to True.
  4. Set the Expression property using the variable defined earlier. Don't forget that the expression must evaluate to a string, so the literal text is wrapped in double quotes and the DateTime variable must be cast (e.g. with (DT_WSTR, 30)) before it can be concatenated. For example:
    "select col_1, col_2, … from table_1
    where
    last_update_date >= '" + (DT_WSTR, 30) @[User::LastRunDate] + "'
    or created_date >= '" + (DT_WSTR, 30) @[User::LastRunDate] + "';"

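With the expression in place, a data flow can read its source query from the variable. In an OLE DB Source, for instance, set the data access mode to 'SQL command from variable' and select User::SQL; the statement is then re-evaluated with the current LastRunDate each time the package runs.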

November 14, 2006

Source File Archiving: SSIS versus DataStage

Filed under: DataStage, ETL, SQL Server Integration Services, SSIS — Zack Bethem @ 4:42 pm

I thought it would be interesting to compare and contrast SSIS and IBM DataStage. It's easy to become married to one particular technology, so a side-by-side comparison can be instructive.

In this example, we're working with source files. After a load, I believe it's best practice to archive the files you've loaded for later reference. These files can then be absorbed by whatever corporate archive strategy is already in place. The goal of the process is as follows:

  1. Open File
  2. Process data in file and load to target
  3. Archive File

SSIS File Archive

In SSIS, steps 1 and 2 are handled by the Data Flow of the package. Step 3 is executed within the Control Flow. There are two methods available for archiving:

  1. Utilize two File System Tasks in sequential order: the first moves the file to the desired directory, and the second renames it with the appropriate datestamp. The downside: each File System Task requires that source and destination connection managers be defined.
  2. Utilize a Script Task that executes code to move and rename the file. This offers far more control and can do the move and rename in one step. The downside: code reuse requires either building a custom task or copying the script into each package.

Using this script as a reference, I went the scripting route as it seemed cleaner. The script relies on the following variables:

User::FileName – data file name, e.g. loandata.txt
User::FileDir_DATA – data directory on the staging server, e.g. 'c:\data\sources\'
User::FileDir_TYPE – data sub-directory, e.g. 'loandata\'

Here’s the code:

Imports System
Imports System.Data
Imports System.Math
Imports Microsoft.SqlServer.Dts.Runtime
Imports System.IO

Public Class ScriptMain

    ' Reference: http://blogs.conchango.com/jamiethomson/archive/2005/09/14/2149.aspx

    Public Sub Main()

        ' Check that the expected package variables exist
        If Dts.Variables.Contains("User::FileName") = True AndAlso _
           Dts.Variables.Contains("User::FileDir_DATA") = True AndAlso _
           Dts.Variables.Contains("User::FileDir_TYPE") = True Then

            Dim srcFile, tgtFile, tgtFileName As String

            ' Full path to the source data file
            srcFile = CStr(Dts.Variables("User::FileDir_DATA").Value) + _
                CStr(Dts.Variables("User::FileDir_TYPE").Value) + _
                CStr(Dts.Variables("User::FileName").Value)

            ' Insert a datestamp between the base file name and its extension,
            ' e.g. loandata.txt becomes loandata_2006112913.txt (yyyyMMddHH)
            tgtFileName = CStr(Dts.Variables("User::FileName").Value)
            tgtFileName = Left$(tgtFileName, InStrRev(tgtFileName, ".") - 1) + _
                "_" + CStr(Format(Now, "yyyyMMddHH")) + _
                Mid(tgtFileName, InStrRev(tgtFileName, "."))
            'System.Windows.Forms.MessageBox.Show(tgtFileName)

            ' Full path to the archived file
            tgtFile = CStr(Dts.Variables("User::FileDir_DATA").Value) + _
                CStr(Dts.Variables("User::FileDir_TYPE").Value) + _
                "Archive\" + tgtFileName
            'System.Windows.Forms.MessageBox.Show(tgtFile)

            Try
                File.Move(srcFile, tgtFile)
                Dts.Events.FireInformation(0, "", "File moved to: " + tgtFile, "", 0, True)
                Dts.TaskResult = Dts.Results.Success
            Catch ex As Exception
                Dts.Events.FireInformation(1, "", "File move failure. Tried to move to: " + tgtFile, "", 0, False)
                Dts.TaskResult = Dts.Results.Failure
            End Try

        Else
            ' Expected variables are missing: fail the task
            Dts.TaskResult = Dts.Results.Failure
        End If

    End Sub

End Class

After the Script Task is defined, the package can move and rename a source data file to the designated Archive subdirectory. This is great if you like to code. In my opinion, though, it's too much logic for a simple 'move' command.

DataStage File Archive

In DataStage, the process of archiving files can be controlled through the job's properties. (Note: a DataStage job is the equivalent of an SSIS package.) One only needs to go to the job's properties and set the 'After-job Subroutine'.


In the job properties, the 'After-job Subroutine' is set to 'ExecDos', which executes a simple DOS move command. It doesn't get any easier than DOS. The InputValue is set to the actual command syntax. For clarification, the command uses job parameters, which are as follows:

#$DrvPath# – drive letter, e.g. c:\

#$SrcPath# – source data directory, e.g. data\sources\

The command also uses %date% expansion to retrieve the parts of the date in yyyymmdd format.
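
The exact InputValue isn't reproduced here, but a hypothetical command, assuming a file named loandata.txt, an archive sub-directory, and a US-locale %date% such as 'Wed 11/29/2006', might look like:

move #$DrvPath##$SrcPath#loandata.txt #$DrvPath##$SrcPath#archive\loandata_%date:~10,4%%date:~4,2%%date:~7,2%.txt

The %date:~10,4% pieces are cmd.exe substring expansions, so the exact offsets depend on the machine's date format.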

Conclusion

If you love to code and revel in the idea of working with Visual Basic .NET, then of course SSIS is your preference. But what this post intends to point out is that one shouldn't have to write code for what is essentially one line at the DOS command prompt.

November 1, 2006

SSIS:: Test for Data File Existence

Filed under: ETL, SQL Server Integration Services, SSIS — Zack Bethem @ 6:15 pm

Flat files are a common data source for my packages. Unlike the scheduling of the master package, however, the flat files may not be staged every single day. Left as is, the packages responsible for loading the files would report a false failure. This leads to the following Script Task in the control flow. The task reads input variables (which the file connection manager should also be using) to test whether the file exists. If it does not, the subsequent data flow is skipped and the package still returns success.

Defined variables:
ExecDir
FileExists
FileName



Script Task

In the Control Flow, add a Script Task. The task is defined with ReadOnlyVariables of:
User::ExecDir
User::FileName

The ReadWriteVariables is set to:
User::FileExists

Add the following code to the script task:

Imports System
Imports System.Data
Imports System.Math
Imports System.IO
Imports Microsoft.SqlServer.Dts.Runtime

Public Class ScriptMain
    ' Created: Zack Bethem – AmberLeaf
    Public Sub Main()
        Dim fileLoc, fileName As String

        ' Check that the expected package variables exist
        If Dts.Variables.Contains("User::ExecDir") = True AndAlso _
           Dts.Variables.Contains("User::FileName") = True Then
            fileLoc = CStr(Dts.Variables("User::ExecDir").Value)
            fileName = CStr(Dts.Variables.Item("User::FileName").Value)
            'System.Windows.Forms.MessageBox.Show("FileDir:" + fileLoc + " FileName:" + fileName)

            ' Set User::FileExists based on whether the file is present
            If File.Exists(fileLoc + fileName) Then
                Dts.Variables.Item("User::FileExists").Value = True
                'System.Windows.Forms.MessageBox.Show("File exists")
            Else
                Dts.Variables.Item("User::FileExists").Value = False
                'System.Windows.Forms.MessageBox.Show("File not exists")
            End If

            Dts.TaskResult = Dts.Results.Success
        Else
            ' Expected variables are missing: fail the task
            Dts.TaskResult = Dts.Results.Failure
        End If

    End Sub
End Class

Now you're set. The Script Task reads in the variables User::ExecDir and User::FileName, uses System.IO to determine whether the file exists, and sets User::FileExists to True if it does, or False otherwise.

Conditions in Control Flow

The next step is to define a condition in the control flow. This is done by double-clicking the precedence constraint (the path line) from the Script Task to the Data Flow task, which opens the precedence constraint editor.

Here is where you add the conditional logic. Set the Evaluation operation to 'Expression'. Using our User::FileExists variable, the expression checks whether the value equals True. If it does, the data flow executes. Otherwise, we have our graceful exit and the package still reports success.
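
For reference, the expression itself is a single line of SSIS expression syntax:

@[User::FileExists] == True

Since FileExists is a Boolean variable, the bare @[User::FileExists] evaluates the same way.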

