Handling different row types in the same file

Sometimes source systems like to send us text files that are a little out of the ordinary. More and more frequently we are seeing people being sent master and detail or parent and child rows in the same file. Handling this in DTS is painful. In SSIS that changes. In this article we'll take you through one file of this type and provide a solution which is more elegant than what we have today, and hopefully will give you ideas as to how you can handle your own versions of this type of file. As mentioned this is a simple solution but we could extend it to include more powerful processing techniques like performing lookups to see if we already have in our destination any of the rows coming through the pipeline etc. This is how our package should look when everything is finished.

PackageComplete

The graphic below shows us the text file sent to us by our source system. As you can see the last column is the type of row so a "M" indicates a master row and a "D" indicates a detail row. In this example there is one master row and three detail rows.

TheTextFile

We create a flat file connection manager that points to this file. We are not going to split the lines in the text file up into their respective columns yet because we have two types of rows and they have varying counts of columns. Instead we are going to read in the line as one string. This is how we set up the connection manager to do that.

ConnectionManager

Once we've done that we add a DataFlow task to our package and in the dataflow behind the task we add a Flat File Source Adapter which points to our earlier connection manager. The only thing we're going to change is the name of the ouput column, as shown here.

OutputColumn

After doing that we are going to flow the output into a Conditional Split Transform. We will use this transform to interpret the line type through the use of the final column in the text file and direct the row down the correct output. With the Conditional Split transform in our example we will have three possible outputs flowing downstream. The first two are based on whether we have decided the row is a Master row or a Detail row and the third is one that we get for free and this is where all other rows that do not meet our criteria will flow.

CondSplit

So we've split up our rows and assigned them to the correct outputs we now need to hook them up to a Script Transform to break the line apart and create our real columns from the one line. When you drag an output from the Conditional Split Transform to the Script Transform you will be asked which output you would like to attach. In the Script Transform the first thing we are going to do is add some columns to the Output. When the data has flowed through this component, downstream components will see these columns and it will be easier for the package designer to assign them to their correct destinations. Here's how we do that.

MasterOutput

We are only going to look at the script transform that handles the rows flowing down the master row output, but the code used can easily be adapted to also handle the rows flowing through the detail row output. We now have one column coming into this transform and we have a number of columns we have just created ready to take data from this column so let's write some script to do that.

MasterScript

All this code is doing is taking the column that comes in (TheWholeLineColumn) and splitting it up based on a comma delimiter. We then assign elements in the created array out to our ouput columns. All that is left now is to hook up our output from the script transform to a destination and for ease of use we have simply pointed the output to a Raw File Destination Adapter. We have added a dataviewer to the flow of data going into the Script transform so we can see the definition of the row going in. You can see this in the graphic below.

DataViewer

Finally, all that is left for us to do is run the package and watch the rows flow down the pipeline. What we should see is that one row flows to the master output and three rows flow down to the details output

Success