BULK INSERT with two row terminators


Question

I am trying to import a text file, so the result would be just words in separate rows of one column. For example a text:

'Hello Mom,

we meet again'

should give 5 records:

'Hello' 
'Mom,'
'we' 
'meet' 
'again'

I tried to accomplish this with BULK INSERT with ROWTERMINATOR = ' ', but there is a problem: the new line is not treated as a terminator too, and I get 'Mom,we' in one of the results.
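The attempt described above would have looked something like this (table and file names are placeholders, not given in the original post):

```sql
-- Hypothetical sketch of the space-only attempt.
-- ROWTERMINATOR = ' ' splits on spaces only; the embedded newline stays
-- inside the data, which is how 'Mom,' and 'we' end up fused in one row.
BULK INSERT dbo.TableName
FROM 'C:\path\to\file.txt'
WITH (ROWTERMINATOR = ' ');
```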

From what I know, there is no way to add a second ROWTERMINATOR to BULK INSERT (true?). What is the best way you know to achieve the result as specified above?

The file cannot be preprocessed outside of SQL Server, and the method should be useful for hundreds of files with thousands of lines of words, imported at different times, not just once.

Answer

Given:

The file cannot be preprocessed outside of SQL Server

Option 1

Why not use OPENROWSET(BULK...)? This would allow you to import/insert (which takes care of the row terminator) while at the same time splitting (which takes care of the field terminator). Depending on whether or not you can create a Format File, it should look something like one of the following:

Format File = split each row

INSERT INTO dbo.TableName (ColumnName)
  SELECT split.SplitVal
  FROM   OPENROWSET(BULK 'path\to\file.txt',
                    FORMATFILE='Path\to\FormatFile.XML') data(eachrow)
  CROSS APPLY SQL#.String_Split(data.eachrow, N' ', 2) split;
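The referenced format file is not shown in the original answer; as an assumption of what Path\to\FormatFile.XML might contain, a single-field XML format file that splits the file on line breaks could look roughly like:

```xml
<?xml version="1.0"?>
<BCPFORMAT xmlns="http://schemas.microsoft.com/sqlserver/2004/bulkload/format"
           xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
  <RECORD>
    <!-- one field per line; TERMINATOR acts as the row terminator here -->
    <FIELD ID="1" xsi:type="CharTerm" TERMINATOR="\r\n" MAX_LENGTH="8000"/>
  </RECORD>
  <ROW>
    <!-- the single column exposed to the SELECT (renamed by data(eachrow)) -->
    <COLUMN SOURCE="1" NAME="eachrow" xsi:type="SQLVARYCHAR"/>
  </ROW>
</BCPFORMAT>
```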

No Format File = pull the entire file in as a single row

INSERT INTO dbo.TableName (ColumnName)
  SELECT split.SplitVal
  FROM   OPENROWSET(BULK 'path\to\file.txt', SINGLE_CLOB) data(allrows)
  CROSS APPLY SQL#.String_Split(
                                REPLACE(data.allrows, NCHAR(10), N' '),
                                N' ',
                                2 -- remove empty entries
                               ) split;

Notes:

  • For both methods you need to use a string splitter. SQLCLR-based splitters are the fastest, and in the examples above I used one from the SQL# library (which I created, but the String_Split function is available in the Free version). You can also write your own. If you do write your own and are not using a Format File, it might be a good idea to allow for multiple split characters so you can pass in both " " and "\n" and get rid of the REPLACE().

If you can write your own SQLCLR string splitter, then it might be even better to just write a SQLCLR stored procedure that accepts an input parameter for @FilePath, reads the file, does the splitting, and spits out the words as many rows of a single column:

INSERT INTO dbo.TableName(ColumnName)
  EXEC dbo.MySQLCLRproc(N'C:\path\to\file.txt');

  • If you are not using (or cannot use) a Format File, then be sure to use the proper "SINGLE_" option, as you can do either SINGLE_CLOB (returns VARCHAR(MAX) for a standard ASCII file) or SINGLE_NCLOB (returns NVARCHAR(MAX) for a Unicode file).

    Even if you can create a Format File, it might be more efficient to pull in the entire file as a single string, depending on the size of the files, as splitting a large string can be done rather quickly and would be a single function call, whereas a file of thousands of short lines would mean thousands of function calls that are also fast, but likely not 1000 times faster than the single call. But if the file is 1 MB or larger, then I would probably still opt for the Format File and processing as many short lines.

    If by "pre-processed" you mean altered, but that there is no restriction on simply reading the files and inserting the data from something external to SQL Server, you should write a small .NET app which reads the rows, splits the lines, and inserts the data by calling a Stored Procedure that accepts a Table-Valued Parameter (TVP). I have detailed this approach in another answer here on S.O.:

    How can I insert 10 million records in the shortest time possible?

    This could be compiled as a Console App and used in batch (i.e. .CMD / .BAT) scripts and even scheduled as a Windows Task.
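    The SQL Server side of that TVP approach might be sketched as follows (the type, procedure, and table names are illustrative assumptions, not from the linked answer):

```sql
-- Hypothetical user-defined table type carried by the TVP
CREATE TYPE dbo.WordList AS TABLE (Word NVARCHAR(4000) NOT NULL);
GO

-- Hypothetical procedure the .NET app would call, passing the parameter
-- from ADO.NET as SqlDbType.Structured
CREATE PROCEDURE dbo.ImportWords (@Words dbo.WordList READONLY)
AS
BEGIN
  SET NOCOUNT ON;

  INSERT INTO dbo.TableName (ColumnName)
    SELECT Word
    FROM   @Words;
END;
GO
```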
