Handling different row types in the same file

by Allan Mitchell 26 May 2005 14:00

Sometimes source systems like to send us text files that are a little out of the ordinary. More and more frequently we are seeing people being sent master and detail or parent and child rows in the same file. Handling this in DTS is painful. In SSIS that changes. In this article we'll take you through one file of this type and provide a solution which is more elegant than what we have today, and hopefully will give you ideas as to how you can handle your own versions of this type of file. As mentioned this is a simple solution but we could extend it to include more powerful processing techniques like performing lookups to see if we already have in our destination any of the rows coming through the pipeline etc. This is how our package should look when everything is finished.

PackageComplete

The graphic below shows us the text file sent to us by our source system. As you can see the last column is the type of row so a "M" indicates a master row and a "D" indicates a detail row. In this example there is one master row and three detail rows.

TheTextFile

We create a flat file connection manager that points to this file. We are not going to split the lines in the text file up into their respective columns yet because we have two types of rows and they have varying counts of columns. Instead we are going to read in the line as one string. This is how we set up the connection manager to do that.

ConnectionManager

Once we've done that we add a DataFlow task to our package and in the dataflow behind the task we add a Flat File Source Adapter which points to our earlier connection manager. The only thing we're going to change is the name of the ouput column, as shown here.

OutputColumn

After doing that we are going to flow the output into a Conditional Split Transform. We will use this transform to interpret the line type through the use of the final column in the text file and direct the row down the correct output. With the Conditional Split transform in our example we will have three possible outputs flowing downstream. The first two are based on whether we have decided the row is a Master row or a Detail row and the third is one that we get for free and this is where all other rows that do not meet our criteria will flow.

CondSplit

So we've split up our rows and assigned them to the correct outputs we now need to hook them up to a Script Transform to break the line apart and create our real columns from the one line. When you drag an output from the Conditional Split Transform to the Script Transform you will be asked which output you would like to attach. In the Script Transform the first thing we are going to do is add some columns to the Output. When the data has flowed through this component, downstream components will see these columns and it will be easier for the package designer to assign them to their correct destinations. Here's how we do that.

MasterOutput

We are only going to look at the script transform that handles the rows flowing down the master row output, but the code used can easily be adapted to also handle the rows flowing through the detail row output. We now have one column coming into this transform and we have a number of columns we have just created ready to take data from this column so let's write some script to do that.

MasterScript

All this code is doing is taking the column that comes in (TheWholeLineColumn) and splitting it up based on a comma delimiter. We then assign elements in the created array out to our ouput columns. All that is left now is to hook up our output from the script transform to a destination and for ease of use we have simply pointed the output to a Raw File Destination Adapter. We have added a dataviewer to the flow of data going into the Script transform so we can see the definition of the row going in. You can see this in the graphic below.

DataViewer

Finally, all that is left for us to do is run the package and watch the rows flow down the pipeline. What we should see is that one row flows to the master output and three rows flow down to the details output

Success

Comments (6) -

3/3/2009 2:14:14 PM #

James

This is great! I'm working on something very similar, although the destination is a multi row (header/detail) flat file. I'm struggling.

James United States

5/14/2009 4:08:08 PM #

Andy

Would there be a way to force the flow to process those master rows first followed by the child rows?

The only way I can think of doing this would be to have some kind of additional staging tables.

Andy United Kingdom

5/16/2009 11:00:32 AM #

Darren Green

Andy, there is no way to do that inside a single Data Flow so staging it is really the best option, but you may consider raw files over tables. They are very efficient, although the real world results will depend on your architecture and overall process.

Darren Green United Kingdom

8/3/2009 8:26:20 PM #

Tania

that example is great i need to do all in revearse .create a file with invoice details 18 fiels and at the end of each invoice print total of only 4 fields,and all this should be in one text file.... thank you!if someone can give me advise.

Tania

8/20/2009 1:31:31 PM #

Hamed

Awesome Post, We were trying to write code in .NET for this scenario, but your post saved the time and unnecessary coding

Hamed

8/26/2009 8:52:03 PM #

Jennifer

This is a great post but with a source data like this how are you going to relate detail rows back to the header.  It seem like you'll end up with two useless lists of headers and lines.  I guess that's why it's a simple solution for us to build on for our own purposes.  :)

Jennifer United States

Add comment

  Country flag

biuquote
  • Comment
  • Preview
Loading

RecentComments

Comment RSS