Miscellaneous Ramblings on Hacking

November 14, 2006

Source File Archiving: SSIS versus DataStage

Filed under: DataStage, ETL, SQL Server Integration Services, SSIS — Zack @ 4:42 pm

I thought it would be interesting to compare and contrast the differences between SSIS and IBM DataStage. It is easy for some to be married to one particular technology, so I thought that this would be interesting for some.

In this example, we’re in the situation of working with source files. After a load, I believe it’s best practice to archive the files that you’ve loaded for later reference. These files can then be absorbed by the corporate archive strategy all ready in place. The goal of the process is as follows:

  1. Open File
  2. Process data in file and load to target
  3. Archive File

SSIS File Archive

In SSIS, steps 1 and 2 are handled by the Data Flow of the package. Step 3 is executed within the Control Flow. There are two methods available for archiving:

  1. Utilize two File System tasks in sequential order. The first to move the file to the desired directory. The second, rename the file with the appropriate date stamp. The downside, each File System task requires that a destination and source connection manager be defined.
  2. Utilize a Script Task that executes code to move and rename a file. This has lots more control and can do the move and rename in one step. The downside, code reuse will require one to either create their own transformer or copy the script into to each package.

Using this script as a reference, I went the scripting route as it seemed cleaner. The script was defined with the following variables:

User::FileName – Data file name, eg. loandata.txt
User::FileDir_DATA – Data directory on the staging server, eg. ‘c:\data\sources\’
User::FileDir_TYPE – Data sub-directory, eg. ‘loandata\’

Here’s the code:

Imports System Imports System.Data
Imports System.Math
Imports Microsoft.SqlServer.Dts.Runtime
Imports System.IO

Public Class ScriptMain

‘Reference: http://blogs.conchango.com/jamiethomson/archive/2005/09/14/2149.aspx

Public Sub Main()

‘check if vars exist
If Dts.Variables.Contains(“User::FileName”) = True AndAlso _
Dts.Variables.Contains(“User::FileDir_DATA”) = True AndAlso _
Dts.Variables.Contains(“User::FileDir_TYPE”) = True
Then

Dim srcFile, tgtFile, tgtFileName, searchStr As String

searchStr = “.”
srcFile = CStr(Dts.Variables(“User::FileDir_DATA”).Value) + _
CStr(Dts.Variables(“User::FileDir_TYPE”).Value) + _
CStr(Dts.Variables(“User::FileName”).Value)

tgtFileName = CStr(Dts.Variables(“User::FileName”).Value)

tgtFileName = Left$(tgtFileName, InStrRev(tgtFileName, “.”) – 1) + _
“_” + CStr(Format(Now, “yyyyMMddHH”)) + _
Mid(tgtFileName, InStrRev(tgtFileName, “.”))

‘System.Windows.Forms.MessageBox.Show(tgtFileName)

tgtFile = CStr(Dts.Variables(“User::FileDir_DATA”).Value) + _
CStr(Dts.Variables(“User::FileDir_TYPE”).Value) + _
“Archive\” + tgtFileName
‘System.Windows.Forms.MessageBox.Show(tgtFile)

Try
File.Move(srcFile, tgtFile)
Dts.Events.FireInformation(0, “”, “File moved to: “ + tgtFile, “”, 0, True)
Dts.TaskResult = Dts.Results.Success
Catch ex As Exception
Dts.Events.FireInformation(1, “”, “File move failure. Tried to move to: “ + tgtFile, “”, 0, False)
Dts.TaskResult = Dts.Results.Failure
End
Try

End If

End Sub

After the Script Task is defined, the package can now move and rename a source data file to the designated Archive subdirectory. This is great if you like to code. My personal opinion, it’s too much logic for a simple ‘move’ command.

DataStage File Archive

In DataStage, the process of archiving files can be controlled through the job’s properties. (Note: a DataStage job is the equivalent to an SSIS package). One only needs to go to the job’s properties and set the ‘After-job Subroutine’.

croppercapture1.Png

In the above screenshot, you’ll see that the ‘After-job Subroutine’ is set to ‘ExecDos’, which executes an easy DOS move command. It doesn’t get any easier than DOS. The InputValue is set to the actual command syntax. For clarification, the command is using variables, which are as follows:

#$DrvPath# – drive letter, eg. c:\

#$SrcPath# – source data directory, eg. data\sources\

The command is also using %date to retrieve parts of the date in yyyymmdd format.

Conclusion

If you love to code and you revel in the idea of working with VBA, then ofcourse SSIS is your preference. But what this post intends to point out is that one shouldn’t have to write code for what is essentially one line at the DOS command line.

Blog at WordPress.com.