Miscellaneous Ramblings on Data Warehousing

August 18, 2008

Reporting in Salesforce

Filed under: salesforce.com — Zack @ 10:19 pm

Oh how I loathe thee… so damn limited, I’m tempted just to build a secondary database to run a freakin’ SQL query. Real SQL not the SOQL stuff that can’t even do joins.

Give me the ability to aggregate on my terms

Give me the ability to specify joins on my terms

Give me the ability to specify OUTER joins on my terms

 

Oh how I loathe thee…. so damn limiting.

August 13, 2008

Salesforce.com APEX - new record trigger

Filed under: apex, salesforce.com — Zack @ 4:53 pm

Bouncing around a little bit now, I’ve found myself on a Salesforce.com project (dare I say “trapped”?). Anyway, I did have some fun coding with the Force.com IDE.

In this scenario, I had to write my very first trigger. The requirement was to create a case upon the creation of a new record. My code ended up looking like this for the trigger.

trigger createCaseFromBroker on BrokerShop__c (after insert) {

    List<Group> sfdcGroup = [select Id, name from Group where Name = 'Support Queue' limit 1];

        if(Trigger.new.size() == 1){
        for(BrokerShop__c b:Trigger.new){
          //for each broker, b, create a new DueDiligence case
              Case newCase = new Case(
                Subject = 'Perform Due Diligence',
                Broker_Shop__c = b.Id,
                Status = 'Open',
                Priority = 'Medium',
                Category__c = 'Pre-Approval',
                Subcategory__c = 'How to become a Broker',
                Origin = 'Web'  //is there an origin? required in UI
              );

              //if broker has a primary contact
              if(b.Primary_Contact__c != null){
                  newCase.ContactId = b.Primary_Contact__c;
              }
              if(!sfdcGroup.isEmpty()){
                  newCase.OwnerId = sfdcGroup[0].Id;
              }
              insert newCase;
        }
        }
}

A couple of notes on this:

  • This is the main trigger set to fire ‘after insert’
  • The trigger is based on a custom SFDC object

On to the unit testing. I will give credit to Salesforce on this one. Requiring unit tests with a minimum coverage of 75% is commendable. Annoying for those of us that just hack our way (me), but still commendable. On to the tests. I encapsulated all of the test cases into my own class file.

public class CreateCase {

  static testMethod void test_createCaseFromBroker(){

      //create contact record
      Contact contact = new Contact();
      contact.FirstName = 'Test 1 - contact';
      contact.LastName = 'Dummy';
      insert contact;
      String contactId = contact.Id;

    //create broker record #1
    BrokerShop__c broker1 = new BrokerShop__c();
    broker1.Name = 'Test 1 - broker';
    broker1.Street_Address__c = '123 Test St';
    broker1.City__c = 'Test';
    broker1.State__c = 'California';
    broker1.Zip__c = '00000';
    broker1.Primary_Contact__c = contactId;

    insert broker1;
    String broker1Id = broker1.Id;

    //validate case was created
    List<Case> bsCase = [
        SELECT Id,Broker_Shop__c
            FROM case
            WHERE Broker_Shop__c = :broker1Id
            LIMIT 1
        ];
    for( Case c:bsCase ){
        System.assertEquals(broker1Id,c.Broker_Shop__c);
    }

  }

}

So there you have it. A insert trigger within Salesforce to create a related case record.

July 24, 2007

Using WinZip within SSIS

Filed under: ETL, SSIS — Zack @ 11:30 pm

Windows comes with the ability to compress files. In practice, however, I found that the standard utility starts to choke when compressing larger flat files (greater than 2GB). To get around this problem within SSIS, I used the default WinZip program. There were some tricks to get it working, however.

Using a Process Task in the Control Flow of a package, I configured the following via Expressions in the Expression Editor:

  • set the WorkingDirectory to the expected location of the data file
  • set the Arguments to evaluate as: -min -a <fullpath>\<zipfile>.zip <fullpath>\<sourcefile>.txt

The Process Task Executable argument was set to the absolute path of the WINZIP32.EXE program (eg. C:\Program Files\WinZip\WINZIP32.EXE)

Automating FTP

Filed under: ETL, SQL Server Integration Services, SSIS — Zack @ 11:19 pm

Processing flat files are a common task when building a Data Warehouse. The need typically arises when integrating with 3rd-party providers or data outsourcers. A common integration technique is to use flat files with FTP as a transfer mechanism. One should use secure FTP, but SSIS out-of-the-box only supports the weaker, less-secure FTP.

When configuring the FTP connection, SSIS cannot store the login information. This is great if your final solution requires a human to push a button and enter the security information every time, but hardly ideal for us that like to automate things and sleep at night. To get around the problem, one must include a Script Task in the package’s Control Flow. The Script Task will set the connection properties of the FTP Connection automatically.

Here’s how you do it:

1. Add a Script Task within the Control Flow before the FTP

Sample Control Flow

2. Assign the following user defined variables as ReadOnlyVariables:

FtpServer - server connection

FtpPort - port number used for FTP connection

FTPuser - FTP user account

FTPpwd - FTP password for account

3. Add the following code to the Script Task to set the connection properties for the FTP Connection Manager:

Imports System

Imports System.Data

Imports System.Math

Imports Microsoft.SqlServer.Dts.Runtime

Public Class ScriptMain

Public Sub Main()

Dim ftpConnectionManager As ConnectionManager

ftpConnectionManager = Dts.Connections(“FTP Connection Manager”)

ftpConnectionManager.Properties(“ServerName”).SetValue(ftpConnectionManager, Dts.Variables(“FtpServer”).Value)

ftpConnectionManager.Properties(“ServerPort”).SetValue(ftpConnectionManager, Dts.Variables(“FtpPort”).Value)

ftpConnectionManager.Properties(“ServerUserName”).SetValue(ftpConnectionManager, Dts.Variables(“FTPuser”).Value)

ftpConnectionManager.Properties(“ServerPassword”).SetValue(ftpConnectionManager, Dts.Variables(“FTPpwd”).Value)

Dts.TaskResult = Dts.Results.Success

End Sub

End Class

July 14, 2007

Vista Upgrade - RDP to Windows 2003

Filed under: Uncategorized — Zack @ 5:35 pm

I was having difficulty after the Vista Business upgrade connecting to clients via RDP. I’m doing a lot of work these days from home. The setup is typical, using the RDP ActiveX control to connect to a remote machine.  This didn’t work, however, in Vista.  Instead, the page would slowly draw and eventually crash.

The solution was found! Thanks to this post.  I have no idea what these two little commands do, but it worked like a charm.

netsh interface tcp set global rss=disabled
netsh interface tcp set global autotuninglevel=disabled

November 29, 2006

Variables in Source SQL

Filed under: ETL, SQL Server Integration Services, SSIS — Zack @ 11:09 pm

When dealing with source database systems, it’s common to only want to process the “delta” of changed or new records. These systems typically have a created and last_updated datestamp on each record for audit purposes. This entry will detail how one can do this in SSIS.

Last Run Date

In my scenario, I opted to store a transaction log table. This table stores the package execution history. This table is then queried in a SQL Task in the control flow. The SQL Task is set to retrieve only one row and store the last_load_date into a variable. This variable will then be used later in the where clause of my source SQL in later data flows.

The creation script for the table is below.

CREATE TABLE dbo.load_history (
mapping_name nvarchar (50) NULL ,
last_load_date smalldatetime NULL ,
start_date smalldatetime NULL ,
end_date datetime NULL ,
status nvarchar (20) NULL
)

The setup of the SQL Task to return the last_load_date into a variable is done as follows:

  1. Defined a variable LastRunDate with a data type of DateTime.
  2. Under the General tab, set the Result Set to ‘Single Row’. The SQLStatement is set to:
    select last_load_date from load_history
    where mapping_name = <mapping_name>
    and end_date is null
    and status = ‘Pending’
  3. croppercapture26.Png

  4. Under the Result Set tab, set the Result Name: ‘last_load_date’ to the defined variable. In my case, I named my variable: LastRunDate.croppercapture27.Png
The Hidden Magic of Variable Properties

Next, is to define a variable for the source sql.

  1. Add a variable of data type string. In my case, I named the variable: SQL
  2. With the variable selected in the variable window, hit F4 to open the variable’s properties
  3. Set the property ‘EvaluateAsExpression’ to true
  4. Set the property expression using the variable defined earlier. Don’t forget that the expression is a string, so it must be wrapped in double quotes. For example:“select col_1, col_2, … from table_1
    where
    last_update_date >= ‘” + @[User::LastRunDate] + “‘
    or created_date >= ‘” + @[User::LastRunDate] + “‘;”

    croppercapture28.Png

November 14, 2006

Source File Archiving: SSIS versus DataStage

Filed under: DataStage, ETL, SQL Server Integration Services, SSIS — Zack @ 4:42 pm

I thought it would be interesting to compare and contrast the differences between SSIS and IBM DataStage. It is easy for some to be married to one particular technology, so I thought that this would be interesting for some.

In this example, we’re in the situation of working with source files. After a load, I believe it’s best practice to archive the files that you’ve loaded for later reference. These files can then be absorbed by the corporate archive strategy all ready in place. The goal of the process is as follows:

  1. Open File
  2. Process data in file and load to target
  3. Archive File

SSIS File Archive

In SSIS, steps 1 and 2 are handled by the Data Flow of the package. Step 3 is executed within the Control Flow. There are two methods available for archiving:

  1. Utilize two File System tasks in sequential order. The first to move the file to the desired directory. The second, rename the file with the appropriate date stamp. The downside, each File System task requires that a destination and source connection manager be defined.
  2. Utilize a Script Task that executes code to move and rename a file. This has lots more control and can do the move and rename in one step. The downside, code reuse will require one to either create their own transformer or copy the script into to each package.

Using this script as a reference, I went the scripting route as it seemed cleaner. The script was defined with the following variables:

User::FileName - Data file name, eg. loandata.txt
User::FileDir_DATA - Data directory on the staging server, eg. ‘c:\data\sources\’
User::FileDir_TYPE - Data sub-directory, eg. ‘loandata\’

Here’s the code:

Imports System Imports System.Data
Imports System.Math
Imports Microsoft.SqlServer.Dts.Runtime
Imports System.IO

Public Class ScriptMain

‘Reference: http://blogs.conchango.com/jamiethomson/archive/2005/09/14/2149.aspx

Public Sub Main()

‘check if vars exist
If Dts.Variables.Contains(“User::FileName”) = True AndAlso _
Dts.Variables.Contains(“User::FileDir_DATA”) = True AndAlso _
Dts.Variables.Contains(“User::FileDir_TYPE”) = True
Then

Dim srcFile, tgtFile, tgtFileName, searchStr As String

searchStr = “.”
srcFile = CStr(Dts.Variables(“User::FileDir_DATA”).Value) + _
CStr(Dts.Variables(“User::FileDir_TYPE”).Value) + _
CStr(Dts.Variables(“User::FileName”).Value)

tgtFileName = CStr(Dts.Variables(“User::FileName”).Value)

tgtFileName = Left$(tgtFileName, InStrRev(tgtFileName, “.”) - 1) + _
“_” + CStr(Format(Now, “yyyyMMddHH”)) + _
Mid(tgtFileName, InStrRev(tgtFileName, “.”))

‘System.Windows.Forms.MessageBox.Show(tgtFileName)

tgtFile = CStr(Dts.Variables(“User::FileDir_DATA”).Value) + _
CStr(Dts.Variables(“User::FileDir_TYPE”).Value) + _
“Archive\” + tgtFileName
‘System.Windows.Forms.MessageBox.Show(tgtFile)

Try
File.Move(srcFile, tgtFile)
Dts.Events.FireInformation(0, “”, “File moved to: “ + tgtFile, “”, 0, True)
Dts.TaskResult = Dts.Results.Success
Catch ex As Exception
Dts.Events.FireInformation(1, “”, “File move failure. Tried to move to: “ + tgtFile, “”, 0, False)
Dts.TaskResult = Dts.Results.Failure
End
Try

End If

End Sub

After the Script Task is defined, the package can now move and rename a source data file to the designated Archive subdirectory. This is great if you like to code. My personal opinion, it’s too much logic for a simple ‘move’ command.

DataStage File Archive

In DataStage, the process of archiving files can be controlled through the job’s properties. (Note: a DataStage job is the equivalent to an SSIS package). One only needs to go to the job’s properties and set the ‘After-job Subroutine’.

croppercapture1.Png

In the above screenshot, you’ll see that the ‘After-job Subroutine’ is set to ‘ExecDos’, which executes an easy DOS move command. It doesn’t get any easier than DOS. The InputValue is set to the actual command syntax. For clarification, the command is using variables, which are as follows:

#$DrvPath# - drive letter, eg. c:\

#$SrcPath# - source data directory, eg. data\sources\

The command is also using %date to retrieve parts of the date in yyyymmdd format.

Conclusion

If you love to code and you revel in the idea of working with VBA, then ofcourse SSIS is your preference. But what this post intends to point out is that one shouldn’t have to write code for what is essentially one line at the DOS command line.

November 1, 2006

SSIS:: Test for Data Files Existence

Filed under: ETL, SQL Server Integration Services, SSIS — Zack @ 6:15 pm

A common data source for my packages includes flat files. Unlike the scheduling of the master package, however, the flat files may not be staged every single day. Leaving the packages responsible for loading the files as is, however, will result in a false failure. This leads to the following script task in the control flow. The task will read input variables, which the file connection manager should also be using, to test whether the file exists or not. If it does not, it will not execute the subsequent data flow and return a success.

Defined variables:
ExecDir
FileExists
FileName

croppercapture15.Png

Defined Data Flow:

croppercapture13.Png

Script Task

In the Control Flow, add a Script Task. The task is defined with ReadOnlyVariables of:
User::ExecDir
User::FileName

The ReadWriteVariables is set to:
User::ExecDir

Add the following code to the script task:

Imports System
Imports System.Data
Imports System.Math
Imports System.IO
Imports Microsoft.SqlServer.Dts.Runtime

Public Class ScriptMain
‘ Created: Zack Bethem - AmberLeaf
Public Sub Main()
Dim fileLoc, fileName As String

If Dts.Variables.Contains(”User::ExecDir”) = True AndAlso _
Dts.Variables.Contains(”User::FileName”) = True Then
fileLoc = CStr(Dts.Variables(”User::ExecDir”).Value)
fileName = CStr(Dts.Variables.Item(”User::FileName”).Value)
‘System.Windows.Forms.MessageBox.Show(”FileDir:”+fileLoc+”FileName:” + fileName)

If File.Exists(fileLoc + fileName) Then
Dts.Variables.Item(”User::FileExists”).Value = True
‘System.Windows.Forms.MessageBox.Show(”File exists”)
Else
Dts.Variables.Item(”User::FileExists”).Value = False
‘System.Windows.Forms.MessageBox.Show(”File not exists”)
End If

Dts.TaskResult = Dts.Results.Success
Else
Dts.TaskResult = Dts.Results.Failure
End If

End Sub
End Class

Now your set. The script task will read in the variables User::ExecDir and User::FileName. Using those variables, it will use the System.IO to determine if the file exists or not. If it does, the variable User::FileExist variable is set to True. Otherwise, the variable is set to False.

Conditions in Control Flow

The next step is to define a condition in the control flow. This is done by double clicking on the path line from the Script task to the Data Flow task. This leads to the following dialog:

Here is where you add the conditional logic. The Evaluation Operation is set to ‘Expression’. Using our User::FileExists variable, a check is done to see if the value is equal to True. If it is, the data flow executes. Otherwise, we have our graceful exit and the package is successful.


October 30, 2006

Creating the Master Package

Filed under: ETL, SQL Server Integration Services, SSIS — Zack @ 9:37 pm

It’s come to that point where all of the SSIS packages for the data warehouse are complete. As other searches have pointed out, there are multiple ways to glue everything together. Coming from a DataStage or Informatica background, however, I prefer the creation of a “master package”. This package is strictly a control flow that utilizes several ‘Execute Package’ tasks.

Step 1: Define File Connection

For this implementation, all of our packages will exist as files. This requires that each package have a defined file connection within the master package. Under the Connection Manger, right click to add a new file connection for the following dialog prompt.

pngcroppercapture3.PNG

Click on Browse to add the package of choice in the following dialog.

pngcroppercapture4.PNG

After selecting the file, click on the File Name listbox. This will automatically highlight the filename. Copy the filename to the clipboard by pressing Ctrl-C. This will save time in the later steps.

Once the file connection is added, an expression will need to be defined for the ConnectionString property. In my installation, I defined a variable User::ExecDir_DWL that corresponds to the project folder location for all of my packages. This allows me to define the ConnectionString as follows (hint: use Ctrl-V to paste in the package name to save time and reduce errors). Once complete, this will ensure that the file connection is not tied to any specific location. In turn, this allows for the master package to be easily deployed to any folder structure.
pngcroppercapture5.PNG

Step 2: Add Execute Package Task

The final piece of the pie is to add the Execute Package task to the control flow. Here, I use the file name text copied to the clipboard earlier to save time.

  • First, rename the execute package task to ‘EXEC <package_name>‘.
  • Second, explicitly set the property PackageName using the expression builder. I’ve noticed that the package name isn’t set when using a package template. This should override that issue.

October 16, 2006

SSIS Script: RowCounter

Filed under: ETL, SQL Server Integration Services, SSIS — Zack @ 10:06 pm

Straight from the MSDN archives, I found this little gem to count rows within a data pipe as they are processed. The key phrase is: “as they are processed”. You’d think the Row Count transformation would work if you wanted a sequential row number. Unfortunately, this transformation only updates the assigned variable at the end of processing for the given Data Flow task.

As a lover of code snippets for reuse, here’s what you have to do:

1. Add an Output Column.  I called mine rowCount.

2. Add  the following script lines:

Public Class ScriptMain
  Inherits UserComponent
  Dim iCount As Integer = 0
  Public Overrides Sub Input0_ProcessInputRow(ByVal Row As Input0Buffer)
    Row.rowCount = iCount
    iCount = iCount + 1
  End Sub
End Class

Older Posts »

Blog at WordPress.com.