Import most recent CSV file to SQL Server in SSIS


Question

I have a folder in which I receive a timestamped .csv file every half hour. I need to take the latest of the available files and import it into SQL Server.

For example, in my source folder I have:

test_01112012_120122.csv
test_01112012_123022.csv
test_01112012_123555.csv


Now I need to fetch the latest file and import it into SQL Server with the help of SSIS.

Thanks,
Satish

Recommended answer

The code from @garry Vass, or something like it, is going to be needed even if you're using SSIS as your import tool.

Within SSIS, you will need to update the connection string of your Flat File Connection Manager to point to the new file. Ergo, you need to determine which file is the most recent.

Whether you do it by file attributes (Garry's code) or by slicing and dicing file names depends on your business rules. Is it always the most recently modified file (attribute), or does the choice need to be based on the file name interpreted as a sequence? This matters if test_01112012_120122.csv had a mistake in it and its contents are updated: the modified date will change but the file name will not, so a name-based selection would not pick the file up again and those changes wouldn't get ported back into the database.
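If the business rule turns out to be the file-name sequence rather than the modified date, the selection might look roughly like the sketch below. It assumes the names end in a `ddMMyyyy_HHmmss` stamp (the question does not say whether the date part is day-first or month-first, so adjust the format string if needed). `FileNamePicker`, `TryGetStamp` and `GetLatestByName` are hypothetical names made up for illustration; they are not part of Garry's code or of SSIS.

```csharp
using System;
using System.Globalization;
using System.IO;

public static class FileNamePicker
{
    // Assumed layout: <prefix>_ddMMyyyy_HHmmss.csv, e.g. test_01112012_123555.csv
    private const string StampFormat = "ddMMyyyy_HHmmss";

    // Returns false when the name does not end in a parsable stamp.
    public static bool TryGetStamp(string fileName, out DateTime stamp)
    {
        stamp = DateTime.MinValue;
        string name = Path.GetFileNameWithoutExtension(fileName);

        // The stamp is the last 15 characters: 8 date digits, '_', 6 time digits.
        if (name == null || name.Length < StampFormat.Length)
        {
            return false;
        }

        string tail = name.Substring(name.Length - StampFormat.Length);
        return DateTime.TryParseExact(tail, StampFormat, CultureInfo.InvariantCulture,
            DateTimeStyles.None, out stamp);
    }

    // Returns the name carrying the latest stamp, or null if none can be parsed.
    public static string GetLatestByName(string[] fileNames)
    {
        string latest = null;
        DateTime latestStamp = DateTime.MinValue;

        foreach (string fileName in fileNames)
        {
            DateTime stamp;
            if (TryGetStamp(fileName, out stamp) && (latest == null || stamp > latestStamp))
            {
                latest = fileName;
                latestStamp = stamp;
            }
        }

        return latest;
    }
}
```

Inside a Script Task, the input could come from `System.IO.Directory.GetFiles(rootFolder, fileMask)`, with the result assigned to User::CurrentFile.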

I would suggest you create two variables of type String, scoped to the package, named RootFolder and CurrentFile. Optionally, you can create one called FileMask if you are restricting to a particular type like *.csv. RootFolder would be the base folder you expect to find files in, e.g. C:\ssisdata\MyProject. CurrentFile will be assigned, from a script, the fully qualified path to the most recently modified file. I find it helpful at this point to assign a design-time value to CurrentFile, usually the oldest file in the collection.

Drag a Script Task onto the Control Flow and set User::RootFolder (and optionally User::FileMask) as its ReadOnlyVariables. Its ReadWriteVariables would be User::CurrentFile.

This script goes inside the braces of the public partial class ScriptMain: ... declaration.

    /// <summary>
    /// This verbose script identifies the most recently modified file of type fileMask
    /// living in RootFolder and assigns that to a DTS level variable.
    /// </summary>
    public void Main()
    {
        string fileMask = "*.csv";
        string mostRecentFile = string.Empty;
        string rootFolder = string.Empty;

        // Assign values from the DTS variables collection.
        // This is case sensitive. User:: is not required
        // but you must convert it from the Object type to a strong type
        rootFolder = Dts.Variables["User::RootFolder"].Value.ToString();

        // Repeat the above pattern to assign a value to fileMask if you wish
        // to make it a more flexible approach

        // Determine the most recent file, this could be null
        System.IO.FileInfo candidate = ScriptMain.GetLatestFile(rootFolder, fileMask);

        if (candidate != null)
        {
            mostRecentFile = candidate.FullName;
        }

        // Push the results back onto the variable
        Dts.Variables["CurrentFile"].Value = mostRecentFile;

        Dts.TaskResult = (int)ScriptResults.Success;
    }

    /// <summary>
    /// Find the most recent file matching a pattern
    /// </summary>
    /// <param name="directoryName">Folder to begin searching in</param>
    /// <param name="fileExtension">Extension to search, e.g. *.csv</param>
    /// <returns></returns>
    private static System.IO.FileInfo GetLatestFile(string directoryName, string fileExtension)
    {
        System.IO.DirectoryInfo directoryInfo = new System.IO.DirectoryInfo(directoryName);

        System.IO.FileInfo mostRecent = null;

        // Change the SearchOption to AllDirectories if you need to search subfolders
        System.IO.FileInfo[] legacyArray = directoryInfo.GetFiles(fileExtension, System.IO.SearchOption.TopDirectoryOnly);
        foreach (System.IO.FileInfo current in legacyArray)
        {
            if (mostRecent == null)
            {
                mostRecent = current;
            }

            if (current.LastWriteTimeUtc >= mostRecent.LastWriteTimeUtc)
            {
                mostRecent = current;
            }
        }

        return mostRecent;

        // Alternative using LINQ. It needs "using System.Linq;" and a project
        // TargetFramework of 3.5 or later; OrderByDescending does not exist
        // in the 2.0 framework.
        //return directoryInfo.GetFiles(fileExtension)
        //     .OrderByDescending(q => q.LastWriteTimeUtc)
        //     .FirstOrDefault();
    }

    #region ScriptResults declaration
    /// <summary>
    /// This enum provides a convenient shorthand within the scope of this class for setting the
    /// result of the script.
    /// 
    /// This code was generated automatically.
    /// </summary>
    enum ScriptResults
    {
        Success = Microsoft.SqlServer.Dts.Runtime.DTSExecResult.Success,
        Failure = Microsoft.SqlServer.Dts.Runtime.DTSExecResult.Failure
    };
    #endregion

}
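For a script project that targets .NET 3.5 or later, the commented-out LINQ variant could be written as a standalone helper along these lines. This is only a sketch: `LatestFileFinder` is a hypothetical class name, and the check for a missing folder is an addition that the original script does not make.

```csharp
using System.IO;
using System.Linq;

public static class LatestFileFinder
{
    // Returns the most recently written file matching searchPattern,
    // or null when the folder is missing or holds no matching file.
    public static FileInfo GetLatestFile(string directoryName, string searchPattern)
    {
        DirectoryInfo directoryInfo = new DirectoryInfo(directoryName);
        if (!directoryInfo.Exists)
        {
            // Assumption: a missing folder is treated like "no file found".
            return null;
        }

        return directoryInfo
            .GetFiles(searchPattern, SearchOption.TopDirectoryOnly)
            .OrderByDescending(f => f.LastWriteTimeUtc)
            .FirstOrDefault();
    }
}
```

Returning null in both empty cases keeps the caller's `candidate != null` check in Main unchanged.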

Update the Connection Manager

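At this point, our script has assigned a value to the CurrentFile variable. The next step is to tell SSIS that it needs to use that file. In the Connection Manager for your CSV, set an Expression on the ConnectionString property (press F4, or right-click and select Properties). The value to assign is our CurrentFile variable, which is written as @[User::CurrentFile].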

Finally, these screen shots are based on the upcoming release of SQL Server 2012, so the icons may appear different, but the functionality remains the same.
