Import the most recent CSV file into SQL Server with SSIS
Problem description
I have a folder in which I receive a timestamped .csv file every half hour. I need to take the latest of the available files and import it into SQL Server.
For example, my source folder contains:
test_01112012_120122.csv
test_01112012_123022.csv
test_01112012_123555.csv
I need to fetch the latest file and import it into SQL Server using SSIS.
Thanks,
Satish
Recommended answer
The code from @garry Vass, or something like it, is going to be needed even if you're using SSIS as your import tool.
Within SSIS, you will need to update the connection string of your Flat File Connection Manager to point to the new file. Ergo, you need to determine which file is the most recent.
Whether you do it by file attributes (Garry's code) or by slicing and dicing file names depends on your business rules. Is it always the most recently modified file (attribute), or does it need to be based on the file name interpreted as a sequence? This matters if test_01112012_120122.csv had a mistake in it and its contents were later updated: the modified date will change but the file name will not, so a selection based on file names would never port those changes back into the database.
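If the business rule turns out to be "the file name is the sequence", a minimal sketch of name-based selection could look like the following. Everything here is illustrative: the class and method names (LatestByName, ParseStamp, PickLatest) are hypothetical, and the code assumes the name always ends in a 15-character stamp laid out as ddMMyyyy_HHmmss. The sample names do not reveal whether the date part is day-first or month-first, so that layout is a guess to confirm against real data.

```csharp
using System;
using System.Globalization;
using System.IO;

public static class LatestByName
{
    // Assumption: the name ends in ddMMyyyy_HHmmss, e.g. test_01112012_123555.csv.
    // Use "MMddyyyy_HHmmss" instead if the date part is month-first.
    private const string StampFormat = "ddMMyyyy_HHmmss";

    // Returns the timestamp embedded in the file name, or null if none parses.
    public static DateTime? ParseStamp(string fileName)
    {
        string stem = Path.GetFileNameWithoutExtension(fileName);
        if (stem == null || stem.Length < StampFormat.Length)
        {
            return null;
        }
        // The stamp is taken from the end of the name, so the prefix may vary.
        string stamp = stem.Substring(stem.Length - StampFormat.Length);
        DateTime parsed;
        bool ok = DateTime.TryParseExact(stamp, StampFormat,
            CultureInfo.InvariantCulture, DateTimeStyles.None, out parsed);
        return ok ? parsed : (DateTime?)null;
    }

    // Returns the entry with the newest embedded timestamp, or null if no name parses.
    public static string PickLatest(string[] fileNames)
    {
        string best = null;
        DateTime bestStamp = DateTime.MinValue;
        foreach (string name in fileNames)
        {
            DateTime? stamp = ParseStamp(name);
            if (stamp.HasValue && (best == null || stamp.Value > bestStamp))
            {
                best = name;
                bestStamp = stamp.Value;
            }
        }
        return best;
    }
}
```

Inside a Script Task, PickLatest could be fed the result of System.IO.Directory.GetFiles(rootFolder, fileMask), which returns full paths; the returned entry would then be assigned to User::CurrentFile. Names that do not carry a parseable stamp are skipped rather than treated as errors, which may or may not suit your rules.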
I would suggest you create two variables of type String, scoped to the package, named RootFolder and CurrentFile. Optionally, you can create one called FileMask if you are restricting to a particular type like *.csv. RootFolder is the base folder you expect to find files in, for example C:\ssisdata\MyProject. CurrentFile will be assigned a value by a script: the fully qualified path to the most recently modified file. I find it helpful at this point to assign a design-time value to CurrentFile, usually the oldest file in the collection.
Drag a Script Task onto the Control Flow and set User::RootFolder (and optionally User::FileMask) as its ReadOnlyVariables. Its ReadWriteVariables would be User::CurrentFile.
This script goes inside the braces of the public partial class ScriptMain: ... declaration.
    /// <summary>
    /// This verbose script identifies the most recently modified file of type fileMask
    /// living in RootFolder and assigns that to a DTS level variable.
    /// </summary>
    public void Main()
    {
        string fileMask = "*.csv";
        string mostRecentFile = string.Empty;
        string rootFolder = string.Empty;

        // Assign values from the DTS variables collection.
        // This is case sensitive. User:: is not required
        // but you must convert it from the Object type to a strong type
        rootFolder = Dts.Variables["User::RootFolder"].Value.ToString();

        // Repeat the above pattern to assign a value to fileMask if you wish
        // to make it a more flexible approach

        // Determine the most recent file, this could be null
        System.IO.FileInfo candidate = ScriptMain.GetLatestFile(rootFolder, fileMask);
        if (candidate != null)
        {
            mostRecentFile = candidate.FullName;
        }

        // Push the results back onto the variable
        Dts.Variables["CurrentFile"].Value = mostRecentFile;
        Dts.TaskResult = (int)ScriptResults.Success;
    }

    /// <summary>
    /// Find the most recent file matching a pattern
    /// </summary>
    /// <param name="directoryName">Folder to begin searching in</param>
    /// <param name="fileExtension">Extension to search, e.g. *.csv</param>
    /// <returns></returns>
    private static System.IO.FileInfo GetLatestFile(string directoryName, string fileExtension)
    {
        System.IO.DirectoryInfo directoryInfo = new System.IO.DirectoryInfo(directoryName);
        System.IO.FileInfo mostRecent = null;

        // Change the SearchOption to AllDirectories if you need to search subfolders
        System.IO.FileInfo[] legacyArray = directoryInfo.GetFiles(fileExtension, System.IO.SearchOption.TopDirectoryOnly);
        foreach (System.IO.FileInfo current in legacyArray)
        {
            if (mostRecent == null)
            {
                mostRecent = current;
            }

            if (current.LastWriteTimeUtc >= mostRecent.LastWriteTimeUtc)
            {
                mostRecent = current;
            }
        }

        return mostRecent;

        // To make the below code work, you'd need to edit the properties of the project
        // change the TargetFramework to probably 3.5 or 4. Not sure
        // Current error is the OrderByDescending doesn't exist for 2.0 framework
        //return directoryInfo.GetFiles(fileExtension)
        //    .OrderByDescending(q => q.LastWriteTimeUtc)
        //    .FirstOrDefault();
    }

    #region ScriptResults declaration
    /// <summary>
    /// This enum provides a convenient shorthand within the scope of this class for setting the
    /// result of the script.
    ///
    /// This code was generated automatically.
    /// </summary>
    enum ScriptResults
    {
        Success = Microsoft.SqlServer.Dts.Runtime.DTSExecResult.Success,
        Failure = Microsoft.SqlServer.Dts.Runtime.DTSExecResult.Failure
    };
    #endregion
}
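To try the selection rule outside of SSIS, a small standalone check along these lines may help. It is only a sketch: LatestFileCheck and RunCheck are hypothetical names, the write times are made up, and the GetLatestFile here is a condensed copy of the method above so that the example compiles without the Dts runtime. The middle file is deliberately given the newest write time to mimic a file whose contents were corrected after later files had arrived.

```csharp
using System;
using System.IO;

public static class LatestFileCheck
{
    // Same rule as GetLatestFile above: the newest LastWriteTimeUtc wins.
    public static FileInfo GetLatestFile(string directoryName, string fileMask)
    {
        FileInfo mostRecent = null;
        DirectoryInfo directoryInfo = new DirectoryInfo(directoryName);
        foreach (FileInfo current in directoryInfo.GetFiles(fileMask, SearchOption.TopDirectoryOnly))
        {
            if (mostRecent == null || current.LastWriteTimeUtc >= mostRecent.LastWriteTimeUtc)
            {
                mostRecent = current;
            }
        }
        return mostRecent;
    }

    // Builds a throwaway folder with three files and returns the name that gets picked.
    public static string RunCheck()
    {
        string folder = Path.Combine(Path.GetTempPath(), Guid.NewGuid().ToString("N"));
        Directory.CreateDirectory(folder);
        try
        {
            string[] names = new string[] {
                "test_01112012_120122.csv",
                "test_01112012_123022.csv",
                "test_01112012_123555.csv"
            };
            // Arbitrary base time, chosen only for the demonstration.
            DateTime baseTime = new DateTime(2012, 11, 1, 12, 0, 0, DateTimeKind.Utc);
            for (int i = 0; i < names.Length; i++)
            {
                string path = Path.Combine(folder, names[i]);
                File.WriteAllText(path, "id,value\n");
                // Offsets: +0, +90 and +60 minutes, so the middle file is the newest.
                File.SetLastWriteTimeUtc(path, baseTime.AddMinutes(i == 1 ? 90 : i * 30));
            }
            FileInfo latest = GetLatestFile(folder, "*.csv");
            return latest == null ? null : latest.Name;
        }
        finally
        {
            Directory.Delete(folder, true);
        }
    }
}
```

Under these made-up write times RunCheck returns test_01112012_123022.csv, even though test_01112012_123555.csv carries the latest stamp in its name. That is the attribute-versus-name difference discussed earlier, so it is worth deciding which behaviour you want before wiring the script into the package.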
Update the Connection Manager
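At this point, our script has assigned a value to the CurrentFile variable. The next step is to tell SSIS to use that file. In the Connection Manager for your CSV, set an Expression (press F4, or right-click and select Properties) on the ConnectionString property. The value you want to assign is our CurrentFile variable, which is expressed as @[User::CurrentFile]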
Finally, these screenshots are based on the upcoming release of SQL Server 2012, so the icons may look different, but the functionality remains the same.