Regular Expression Transformation

by Darren Green 5 Jun 2012 14:53

The regular expression transformation exposes the power of regular expression matching within the pipeline. One or more columns can be selected, and for each column an individual expression can be applied. The way multiple columns are handled can be set on the options page. The AND option means all columns must match, whilst the OR option means only one column has to match. If rows pass their tests then rows are passed down the successful match output. Rows that fail are directed down the alternate output.

This transformation is ideal for validating data through the use of regular expressions. You can enter any expression you like, or select a pre-configured expression within the editor. You can expand the list of pre-configured expressions yourself. These are stored in a Xml file, %ProgramFiles%\Microsoft SQL Server\nnn\DTS\PipelineComponents\RegExTransform.xml, where nnn represents the folder version, 90 for 2005, 100 for 2008 and 110 for 2012.

If you want to use regular expressions to manipulate data, rather than just validating it, try the RegexClean Transformation.

The component is provided as an MSI file, however for 2005/200 you will have to add the transformation to the Visual Studio toolbox by hand. This process has been described in detail in the related FAQ entry for How do I install a task or transform component?, just select Regular Expression Transformation in the Choose Toolbox Items window.

Downloads

The Regular Expression Transformation is available for SQL Server 2005, SQL Server 2008 (includes R2) and SQL Server 2012. Please choose the version to match your SQL Server version, or you can install multiple versions and use them side by side if you have more than one version of SQL Server installed.

Regular Expression Transformation for SQL Server 2005

Regular Expression Transformation for SQL Server 2008

Regular Expression Transformation for SQL Server 2012

Version History

SQL Server 2012

Version 2.0.0.87 - SQL Server 2012 release. Includes upgrade support for both 2005 and 2008 packages to 2012.
(5 Jun 2012)

SQL Server 2008

Version 2.0.0.87 - Release for SQL Server 2008 Integration Services.
(10 Oct 2008)

SQL Server 2005

Version 1.1.0.93 - Added option for you to choose AND or OR logic when multiple columns have been selected. Previously behaviour was OR only.
(31 Jul 2008)

Version 1.0.0.76 - Installer update and improved exception handling.
(28 Jan 2008)

Version 1.0.0.41 - Update for user interface stability fixes.
(2 Aug 2006)

Version 1.0.0.24 - SQL Server 2005 RTM Refresh. SP1 Compatibility Testing.
(12 Jun 2006)

Version 1.0.0.9 - Public Release for SQL Server 2005 IDW 15 June CTP
(29 Aug 2005)

Screenshots

Regular Expression Editor dialog 1 Regular Expression Editor dialog 2

 

Comments (11) -

2/20/2009 10:03:47 PM #

Garth H

Great component, why this doesn't ship out of the box is beyond me, but I'm glad I found it. Pickup a regular expression cookbook, populate your xml file and this component saves SOoo much time it isn't even funny.

Thanks!

Garth H United States

3/4/2010 4:08:53 PM #

Rishi

How can i use the same instance of the component to give me 3 output.

1)MatchedRows
2)Non-Matched Rows
3) ErrorOutput.

Rishi United States

3/19/2010 10:05:20 AM #

Sandy

Great component. Will definitely use it.

Sandy United States

4/2/2010 8:15:08 AM #

Vcloud

Rishi - Non matched rows are directed to the alternate output. Do you want those to display in the same output along with the matched rows.

Excellent component. It really saves lot of time.

Vcloud United States

11/3/2010 4:42:03 AM #

Mark


Hey Rishi,

Have connected the Regular Expression Transform in SSIS 2008. SSIS executes without errors but I cant get any matches from the Transform? Whatever I put for the match expression it only offers me the original column name when I connect the next block?
When I right click and look at the advanced Editor there are no Outputs available under Matched Rows Output

Am sure I'm missing something simple. Please help...

Mark United States

2/16/2011 3:47:57 PM #

Chris

This is extremely useful; thank you so much for providing it. If you are looking for feature requests, it would be very nice to have an extra column with the failed rows that is the reason why they failed.

Thanks again!

Chris United States

2/21/2011 9:51:29 AM #

Darren

Chris, I'm not sure how this extra column would work. The regular expression evaluator just gives a Boolean result which is used to direct rows to the matched or non-matched output. There is nothing available that says why; they just didn't match the expression.

Mark, this transform doesn't change any data, so your comment about only seeing the original column seems a bit confusing. By next block I assume you mean a downstream component. This transform offers two outputs, rows are directed based tow one or the other depending if they match the expressions. This transform behaves is a bit like a simpler conditional split but using regular expressions instead of the SSIS expression syntax.

Darren United Kingdom

3/29/2011 9:26:35 AM #

Vandecan

This function is what I need. However the pattern is stored on a variable....
How can I pass this variable to your tools?

Tks

Vandecan Belgium

3/29/2011 9:37:59 AM #

Vandecan

Hello again,
I have tried to put the patterns directly as a first step.

Pattern is *|???|*

Here the error message :
-------------------------
[Regular Expression Transform [1271]] Error: System.ArgumentException: parsing "*|???|*" - Quantifier {x,y} following nothing.
   at System.Text.RegularExpressions.RegexParser.ScanRegex()
   at System.Text.RegularExpressions.RegexParser.Parse(String re, RegexOptions op)
   at System.Text.RegularExpressions.Regex..ctor(String pattern, RegexOptions options, Boolean useCache)
   at System.Text.RegularExpressions.Regex..ctor(String pattern, RegexOptions options)
   at Konesans.Dts.Pipeline.RegExTransform.RegExTransform.PreExecute()
   at Microsoft.SqlServer.Dts.Pipeline.ManagedComponentHost.HostPreExecute(IDTSManagedComponentWrapper100 wrapper)
[SSIS.Pipeline] Error: component "Regular Expression Transform" (1271) failed the pre-execute phase and returned error code 0x80070057.

Vandecan Belgium

4/20/2011 3:01:25 PM #

Eric

This component is really useful and helpful.  
I have written a related SSIS component called RegExtractor that extracts matches from a regex into new columns in a data flow.  You can specify a regular expression to operate on any text column and the captured matches of that regex get output to new columns.  The SSIS component is free and open source.  You can get it at:

http://regextractor.codeplex.com

Hope this helps (I realize I'm late in my response)

Eric

Eric United States

3/26/2012 1:07:40 PM #

Erik

I have been using this component for a while now and it works fine. The only thing is that when I change the name property of the Matched Rows Output into something in Dutch the package fails. When I change it back to the default name the package runs fine again.

It seems like that somewhere in the components code these two names "Matched Rows Output" and Non-Match Rows Output" are hardcoded in (even case sensitive!)...

Cheers,
Erik

Erik Netherlands

Pingbacks and trackbacks (1)+

Add comment

  Country flag

biuquote
  • Comment
  • Preview
Loading

RecentComments

Comment RSS