Downloading a file over HTTP the SSIS way

by Darren Green 2 Jul 2009 18:14

This post shows you how to download files from a web site whilst really making the most of the SSIS objects that are available. There is no task to do this, so we have to use the Script Task and some simple VB.NET or C# (if you have SQL Server 2008) code. Very often I see suggestions about how to use the .NET class System.Net.WebClient and of course this works, you can code pretty much anything you like in .NET. Here I’d just like to raise the profile of an alternative.

This approach uses the HTTP Connection Manager, one of the stock connection managers, so you can use configurations and property expressions in the same way you would for all other connections. Settings like the security details that you would want to make configurable already are, but if you take the .NET route you have to write quite a lot of code to manage those values via package variables. Using the connection manager we get all of that flexibility for free. The screenshot below illustrate some of the options we have.

HTTPConnectionManagerEditor

Using the HttpClientConnection class makes for much simpler code as well. I have demonstrated two methods, DownloadFile which just downloads a file to disk, and DownloadData which downloads the file and retains it in memory. In each case we show a message box to note the completion of the download. You can download a sample package below, but first the code:

Imports System
Imports System.IO
Imports System.Text
Imports System.Windows.Forms
Imports Microsoft.SqlServer.Dts.Runtime

Public Class ScriptMain

    Public Sub Main()

        ' Get the unmanaged connection object, from the connection manager called "HTTP Connection Manager"
        Dim nativeObject As Object = Dts.Connections("HTTP Connection Manager").AcquireConnection(Nothing)

        ' Create a new HTTP client connection
        Dim connection As New HttpClientConnection(nativeObject)


        ' Download the file #1
        ' Save the file from the connection manager to the local path specified
        Dim filename As String = "C:\Temp\Sample.txt"
        connection.DownloadFile(filename, True)

        ' Confirm file is there
        If File.Exists(filename) Then
            MessageBox.Show(String.Format("File {0} has been downloaded.", filename))
        End If


        ' Download the file #2
        ' Read the text file straight into memory
        Dim buffer As Byte() = connection.DownloadData()
        Dim data As String = Encoding.ASCII.GetString(buffer)

        ' Display the file contents
        MessageBox.Show(data)


        Dts.TaskResult = Dts.Results.Success
    End Sub

End Class

Sample Package HTTPDownload.dtsx (74KB)

Comments (31) -

7/16/2009 2:42:06 AM #

Joel Watson

Works like a dream. Thanks for making the dtsx and sample files available so I could try it out straight away!

Joel Watson United States

7/16/2009 9:21:29 PM #

PKSpence

Just out of curosity, why are you doing this:

        Dim filename As String = "C:\Temp\Sample.txt"
        connection.DownloadFile("C:\Temp\Sample.txt", True)

Seems to me that the variable containing the name of the file should be specifed as a parameter to the .DownloadFile() method:

        Dim filename As String = "C:\Temp\Sample.txt"
        connection.DownloadFile(filename, True)

Just my .02 worth

PKSpence United States

7/17/2009 2:17:02 PM #

Darren Green

PKSpence, you're quite right I meant to use the variable consistently, code updated. Thanks.

Darren Green United Kingdom

7/19/2009 3:02:03 PM #

CozyRoc

You can check an alternative based on WebClient class here: www.cozyroc.com/script/http-upload-download-task

CozyRoc United States

9/16/2009 8:35:21 PM #

Kyle

Thank you very much for the clear article, worked like a champ (except I had to change MessageBox to MsgBox).

Kyle United States

9/16/2009 10:29:58 PM #

Darren Green

Kyle, did you forget to add Imports System.Windows.Forms? MsgBox only works in VB.NET, and as far as I recall is a hangover from old VB, really just a shortcut left in the VB.NET language. If you prefer it then that's fine too.

Darren Green United Kingdom

9/22/2009 9:42:08 PM #

Casey

This worked wonderfully and was exactly what we needed.  We were able to accomplish something much faster than I thought possible using the information you supplied. Thank you.

Casey United States

9/29/2009 9:55:59 PM #

Kanthi

Hi Darren,

Your Script Works Like a  Charm.

But i Have One Question Though - How to DownLoad the Dinamically Changing File Name.
I want to Automate the Process of Downloading.
The File Name Changes Everyday. e.g with New Date.

I trying to use expressions buyt doesn't Works.

Thanks in Advance
Kanthi

Kanthi United States

11/20/2009 7:04:32 PM #

SeanO

I am gettng a couple errors. It looks like it has to do with using this package on a different machine... Since no one else has mentioned it, I bet it is because I am not familiar with dtsx... Anyone have an idea of what's going on?

Error: 2009-11-20 10:59:49.17
   Code: 0xC0016016
   Source:
   Description: Failed to decrypt protected XML node "DTS:Property" with error 0
x8009000B "Key not valid for use in specified state.". You may not be authorized
to access this information. This error occurs when there is a cryptographic err
or. Verify that the correct key is available.
End Error


and

Error: 2009-11-20 10:59:49.35
   Code: 0x00000002
   Source: Script Task
   Description: The script threw an exception: The system cannot find the path s
pecified. (Exception from HRESULT: 0x80070003)
End Error


Anyone? Help a n00b!

SeanO

11/21/2009 12:21:37 AM #

Joe

Would you be so kind to translate these to line to C#:
===============
        ' Get the unmanaged connection object, from the connection manager called "HTTP Connection Manager"
        Dim nativeObject As Object = Dts.Connections("HTTP Connection Manager").AcquireConnection(Nothing)

        ' Create a new HTTP client connection
        Dim connection As New HttpClientConnection(nativeObject)
===============
Thanks

Joe

11/26/2009 2:38:04 PM #

Rizwan

Cant we get whole directory, i mean use wild cards.

Rizwan India

12/1/2009 10:38:35 PM #

dass

Hi Darren,

Can you light me how to download gzip file from http site using authentication.
cheers
dass

dass New Zealand

12/3/2009 6:41:32 PM #

dhiva

Issue: couldn't download the file Using HTTPS:// with Certicates.

I try to use this above script and download the file file from HTTPS:// using the CERTIFICATES. It connects to the server. But It doesn't download the file. It creates the file with the Ceriticate name on it.

The text file shows "Virtual user wll0001f logged in." The wll0001f is the certificate name.

I have created the HTTP Connection manager in my SSIS with CERTIFICATE property.

Could I get any help?

dhiva United States

1/11/2010 7:18:20 PM #

HollyK

Thanks very much, works like a charm.  

HollyK United States

7/28/2010 4:01:42 PM #

apal

I am in SSIS. Where do I add this code? I have to copy the internet file to a local folder. I created the HTTP conn manager. But now whihc SSIS task will handle this code? Please help.

apal United States

7/29/2010 7:58:01 AM #

Darren

apal, the Script Task can be used to host code.

Darren United Kingdom

8/9/2010 8:33:14 PM #

Naresh Jagtiani

Need some help here,  would this script work with https and certificates.  I am getting a proxy error.
When I select Test Connection in connection manager it succeeds however it gives Proxy authentication failed.
Error: The script threw an exception: Proxy authentication failed. This error occurs when login credentials are not provided, or the credentials are incorrect.

Thanks in advance.

Naresh Jagtiani United States

8/10/2010 10:33:24 PM #

Darren

Naresh, This work fine with SSL.

Your error indicates a problem with the proxy server authentication, so I would talk to you network team.

There is a limitation with the HTTP Connection manager, it doesn't have any option to pass through the default Windows credentials which is a simple way to supply valid Windows credentials to a proxy server. If that is issue use the System.Net.WebClient approach and pass in the default credentials like this -

webClient.Credentials = System.Net.CredentialCache.DefaultCredentials

Darren United Kingdom

8/11/2010 7:35:49 PM #

Naresh Jagtiani

Hello Darren,  Thanks for your reply.  This is what I put in the script task which worked for me.

Imports System.IO
Imports System.Net
Imports System.Text
Imports System.Web
Imports System.Security.Cryptography.X509Certificates


Public Class WebRetrieve
    Public Shared Sub Main()
        Dim mycert As New System.Security.Cryptography.X509Certificates.X509Certificate _
(System.Security.Cryptography.X509Certificates.X509Certificate.CreateFromCertFile _
("C:\Certifcates\cert.cer"))
  
       Dim wr As HttpWebRequest = CType(WebRequest.Create("https://reports/reports.zip"), HttpWebRequest)
        wr.ClientCertificates.Add(mycert)

        Dim ws As HttpWebResponse = CType(wr.GetResponse(), HttpWebResponse)
        Dim str As Stream = ws.GetResponseStream()
        Dim inBuf(100000000) As Byte
        Dim bytesToRead As Integer = CInt(inBuf.Length)
        Dim bytesRead As Integer = 0
        While bytesToRead > 0
            Dim n As Integer = str.Read(inBuf, bytesRead, bytesToRead)
            If n = 0 Then
                Exit While
            End If
            bytesRead += n
            bytesToRead -= n
        End While
        Dim fstr As New FileStream("c:\New\reports.zip", FileMode.OpenOrCreate, FileAccess.Write)
        fstr.Write(inBuf, 0, bytesRead)
        str.Close()
        fstr.Close()
    End Sub
End Class

Naresh Jagtiani United States

9/23/2010 5:11:33 PM #

Ahalya

Hi Darren, Thanks for the script.  New at SSIS - this was my first script task - worked great!

Ahalya Canada

10/27/2010 6:48:06 PM #

ashok

Hi Darren,

Your Script Works Like a  Charm.

But i Have One Question Though - How to DownLoad the Dinamically Changing File Name.
I want to Automate the Process of Downloading.
The File Name Changes Everyday. e.g with New Date.

I trying to use expressions buyt doesn't Works.

Thanks in Advance
Ashok

ashok India

10/28/2010 1:53:58 PM #

Darren Green

Ashok, The file to download is controlled by the ServerURL property of the HTTP Connection Manager. You can set an expression on that property.

Darren Green United Kingdom

11/2/2010 8:07:34 PM #

ram

Need some help here,  would this script work with https with login credientials.  Thanks in advance.

ram United States

11/9/2010 1:49:09 PM #

Darren Green

Ram why not try it? The HTTP Connection Manager allows you to specify the protocol as https, and also allows you to specify credentials.

Darren Green United Kingdom

11/4/2011 4:02:29 PM #

Dinesh

I am trying to download the Excel file located in HTTP. It is downloading but when opening the file, am getting this error "excel cannot open the file becuase the file format or file extension is not valid. verify that the file has not been corrupted and that the file extension matches the format of the file"

Dinesh India

12/13/2011 8:46:48 AM #

Dinesh

Hi Darren, when am downloading the file it is only downloading 3kb out 1700kb file. I am not able to figure it out why it is downloading only 3kb file. Please help me.

Dinesh India

1/9/2012 8:48:36 PM #

Marzipan

We are trying to download documents from a sharepoint list using the HTTP connection.

Any idea why we get a 401 error when the file is a infopath document.  A xml document created with a text editor is ok, .xlsx, .docx, .png files all can be downloaded except the infopath document.

All are stored in the same "Shared Documents" library.

Marzipan United States

4/9/2012 4:33:41 PM #

Marzipan

Found the answer.  You need to add "?NoRedirect=true"  (w/o quotes) to the URL.

This gives you raw XML.  

Now onto processing XML, we are studying up on XSLT :)

Note: if you need the schema for a infopath form template (xsn):
1.  add  ".cab" to the filename (w/o quotes)
2.  right click filename and select explore
3.  copy the MySchema.xsd to another location, give it a more descriptive name.

Marzipan United States

4/12/2012 8:58:46 AM #

Swetha

When i place a package with httpconenction manager with useproxy set to false(in fact no changes done in proxy tab for this connection manager) in job and execute it, i get the error:

Connection manager "HTTPConnectionManager" This error can occur when the server does not exist, or the proxy settings are incorrect.  

--Swetha.

Swetha India

4/12/2012 10:49:38 AM #

sambit

when i am pasting this code in the script task,then it shows that dts is not defined can you just say me why this error comes and what is the solution for that error.and the script task will be used in the control flow or the data flow.

sambit United States

7/5/2012 10:52:25 PM #

Dinesh

my http link having xml data...when I download the file as xml, am getting code in xml file. Any idea how to handle this, pls...

Dinesh India

Add comment

  Country flag

biuquote
  • Comment
  • Preview
Loading