Miscellaneous Ramblings on Hacking

November 14, 2006

Source File Archiving: SSIS versus DataStage

Filed under: DataStage, ETL, SQL Server Integration Services, SSIS — Zack Bethem @ 4:42 pm

I thought it would be interesting to compare and contrast the differences between SSIS and IBM DataStage. It is easy for some to be married to one particular technology, so I thought that this would be interesting for some.

In this example, we’re in the situation of working with source files. After a load, I believe it’s best practice to archive the files that you’ve loaded for later reference. These files can then be absorbed by the corporate archive strategy all ready in place. The goal of the process is as follows:

  1. Open File
  2. Process data in file and load to target
  3. Archive File

SSIS File Archive

In SSIS, steps 1 and 2 are handled by the Data Flow of the package. Step 3 is executed within the Control Flow. There are two methods available for archiving:

  1. Utilize two File System tasks in sequential order. The first to move the file to the desired directory. The second, rename the file with the appropriate date stamp. The downside, each File System task requires that a destination and source connection manager be defined.
  2. Utilize a Script Task that executes code to move and rename a file. This has lots more control and can do the move and rename in one step. The downside, code reuse will require one to either create their own transformer or copy the script into to each package.

Using this script as a reference, I went the scripting route as it seemed cleaner. The script was defined with the following variables:

User::FileName – Data file name, eg. loandata.txt
User::FileDir_DATA – Data directory on the staging server, eg. ‘c:\data\sources\’
User::FileDir_TYPE – Data sub-directory, eg. ‘loandata\’

Here’s the code:

Imports System Imports System.Data
Imports System.Math
Imports Microsoft.SqlServer.Dts.Runtime
Imports System.IO

Public Class ScriptMain

‘Reference: http://blogs.conchango.com/jamiethomson/archive/2005/09/14/2149.aspx

Public Sub Main()

‘check if vars exist
If Dts.Variables.Contains(“User::FileName”) = True AndAlso _
Dts.Variables.Contains(“User::FileDir_DATA”) = True AndAlso _
Dts.Variables.Contains(“User::FileDir_TYPE”) = True
Then

Dim srcFile, tgtFile, tgtFileName, searchStr As String

searchStr = “.”
srcFile = CStr(Dts.Variables(“User::FileDir_DATA”).Value) + _
CStr(Dts.Variables(“User::FileDir_TYPE”).Value) + _
CStr(Dts.Variables(“User::FileName”).Value)

tgtFileName = CStr(Dts.Variables(“User::FileName”).Value)

tgtFileName = Left$(tgtFileName, InStrRev(tgtFileName, “.”) – 1) + _
“_” + CStr(Format(Now, “yyyyMMddHH”)) + _
Mid(tgtFileName, InStrRev(tgtFileName, “.”))

‘System.Windows.Forms.MessageBox.Show(tgtFileName)

tgtFile = CStr(Dts.Variables(“User::FileDir_DATA”).Value) + _
CStr(Dts.Variables(“User::FileDir_TYPE”).Value) + _
“Archive\” + tgtFileName
‘System.Windows.Forms.MessageBox.Show(tgtFile)

Try
File.Move(srcFile, tgtFile)
Dts.Events.FireInformation(0, “”, “File moved to: “ + tgtFile, “”, 0, True)
Dts.TaskResult = Dts.Results.Success
Catch ex As Exception
Dts.Events.FireInformation(1, “”, “File move failure. Tried to move to: “ + tgtFile, “”, 0, False)
Dts.TaskResult = Dts.Results.Failure
End
Try

End If

End Sub

After the Script Task is defined, the package can now move and rename a source data file to the designated Archive subdirectory. This is great if you like to code. My personal opinion, it’s too much logic for a simple ‘move’ command.

DataStage File Archive

In DataStage, the process of archiving files can be controlled through the job’s properties. (Note: a DataStage job is the equivalent to an SSIS package). One only needs to go to the job’s properties and set the ‘After-job Subroutine’.

croppercapture1.Png

In the above screenshot, you’ll see that the ‘After-job Subroutine’ is set to ‘ExecDos’, which executes an easy DOS move command. It doesn’t get any easier than DOS. The InputValue is set to the actual command syntax. For clarification, the command is using variables, which are as follows:

#$DrvPath# – drive letter, eg. c:\

#$SrcPath# – source data directory, eg. data\sources\

The command is also using %date to retrieve parts of the date in yyyymmdd format.

Conclusion

If you love to code and you revel in the idea of working with VBA, then ofcourse SSIS is your preference. But what this post intends to point out is that one shouldn’t have to write code for what is essentially one line at the DOS command line.

Advertisements

1 Comment »

  1. Ok, I found another way to do this which I like much better. Here is what you need to do:

    1. Create a Package Level Variable (say ArchiveFileName) as type string.
    2. Go to the Properties of the Variable (I found this very hard to get to, so here is how I did it. You have to the properties window already up, then left click on the X icon next the variable name.)
    3. Inside the properties window Change EvaluateAsExpression to True.
    4. Enter the file path you want to archive to in the expression field.
    Example:
    “c:\\TestArchive” + (DT_STR, 4, 1252) YEAR( GETDATE() ) +
    RIGHT(“0” + (DT_STR, 2, 1252) MONTH( GETDATE() ), 2) +
    RIGHT(“0” + (DT_STR, 2, 1252) DAY( GETDATE() ), 2) + “.txt”

    5. Create a File System Task
    6. Set “IsDestinationPathVarible” to True inside the File System task
    7. Select the ArchiveFileName variable
    8. Select your Source File Name
    9. Run the task

    Comment by jfoudy — January 5, 2009 @ 7:18 pm


RSS feed for comments on this post. TrackBack URI

Leave a Reply

Please log in using one of these methods to post your comment:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s

Blog at WordPress.com.

%d bloggers like this: