SQL Server Bulk insert of CSV file with inconsistent quotes
Problem description
Is it possible to BULK INSERT (SQL Server) a CSV file in which the fields are only OCCASIONALLY surrounded by quotes? Specifically, quotes only surround those fields that contain a ",".
In other words, I have data that looks like this (the first row contains headers):
id, company, rep, employees
729216,INGRAM MICRO INC.,"Stuart, Becky",523
729235,"GREAT PLAINS ENERGY, INC.","Nelson, Beena",114
721177,GEORGE WESTON BAKERIES INC,"Hogan, Meg",253
Because the quotes aren't consistent, I can't use '","' as a delimiter, and I don't know how to create a format file that accounts for this.
I tried using ',' as a delimiter and loading it into a temporary table where every column is a varchar, then using some kludgy processing to strip out the quotes, but that doesn't work either, because the fields that contain ',' are split into multiple columns.
Unfortunately, I don't have the ability to manipulate the CSV file beforehand.
Is this hopeless?
Many thanks in advance for any advice.
By the way, I saw this post, SQL bulk import from csv, but in that case, EVERY field was consistently wrapped in quotes. So, in that case, he could use ',' as a delimiter, then strip out the quotes afterwards.
Solution
You are going to need to preprocess the file, period.
If you really really need to do this, here is the code. I wrote this because I absolutely had no choice. It is utility code and I'm not proud of it, but it works. The approach is not to get SQL to understand quoted fields, but instead manipulate the file to use an entirely different delimiter.
EDIT: Here is the code in a github repo. It's been improved and now comes with unit tests! https://github.com/chrisclark/Redelim-it
This function takes an input file and replaces all field-delimiting commas (NOT commas inside quoted-text fields, just the actual delimiting ones) with a new delimiter. You can then tell SQL Server to use the new field delimiter instead of a comma. In the version of the function here, the placeholder is <*TMP*> (I feel confident this will not appear in the original csv - if it does, brace for explosions).
Therefore after running this function you import in sql by doing something like:
BULK INSERT MyTable
FROM 'C:\FileCreatedFromThisFunction.csv'
WITH
(
    FIELDTERMINATOR = '<*TMP*>',
    ROWTERMINATOR = '\n'
)
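For comparison, the re-delimiting idea described above can be sketched in a few lines of Python. This is not the function from this answer: it swaps in Python's standard `csv` module to do the quote handling, it drops the surrounding quotes instead of keeping them, and the names `redelimit`, `PLACEHOLDER` and `has_headers` are made up for the illustration. It also assumes no field contains an embedded line break.

```python
import csv
import io

PLACEHOLDER = "<*TMP*>"  # same placeholder as above; any string absent from the data works


def redelimit(src, dst, placeholder=PLACEHOLDER, has_headers=True):
    """Copy CSV rows from src to dst, joining the fields with placeholder.

    Returns the number of data rows written. Quotes around fields are dropped.
    """
    reader = csv.reader(src)  # understands optional quoting and doubled quotes
    if has_headers:
        next(reader, None)  # skip the header row
    count = 0
    for row in reader:
        if not row:
            continue  # skip blank lines
        for field in row:
            if placeholder in field:
                # The "brace for explosions" case: fail loudly instead
                raise ValueError(f"placeholder {placeholder!r} occurs in the data")
        dst.write(placeholder.join(row) + "\n")
        count += 1
    return count


sample = (
    'id,company,rep,employees\n'
    '729216,INGRAM MICRO INC.,"Stuart, Becky",523\n'
    '729235,"GREAT PLAINS ENERGY, INC.","Nelson, Beena",114\n'
)
out = io.StringIO()
rows_written = redelimit(io.StringIO(sample), out)
print(out.getvalue(), end="")
```

With real files you would pass file objects opened with `newline=""` (as the `csv` module documentation recommends) instead of the `StringIO` objects used here.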
And without further ado, the terrible, awful function that I apologize in advance for inflicting on you (edit - I've posted a working program that does this instead of just the function on my blog here):
Imports System.IO

'UpdateStatus, SaveTextToFile and hasHeaders are not shown in this post;
'they are assumed to be members of the class that contains this function.
Private Function CsvToOtherDelimiter(ByVal InputFile As String, ByVal OutputFile As String) As Integer

    Dim PH1 As String = "<*TMP*>"
    Dim objReader As StreamReader = Nothing
    Dim count As Integer = 0 'This will also serve as a primary key
    Dim sb As New System.Text.StringBuilder

    Try
        objReader = New StreamReader(File.OpenRead(InputFile), System.Text.Encoding.Default)
    Catch ex As Exception
        UpdateStatus(ex.Message)
    End Try

    If objReader Is Nothing Then
        UpdateStatus("Invalid file: " & InputFile)
        Return -1
    End If

    'Grab the first line
    Dim line = objReader.ReadLine()
    'and advance to the next line b/c the first line is column headings
    If hasHeaders Then
        line = Trim(objReader.ReadLine)
    End If

    While Not String.IsNullOrEmpty(line) 'loop through each line
        count += 1

        'Replace every comma with our custom-made delimiter
        line = line.Replace(",", PH1)

        'Find a quoted part of the line, which could legitimately contain commas.
        'In that case we need to identify the quoted section and swap commas back in for our custom placeholder.
        Dim starti = line.IndexOf(PH1 & """", 0)
        If line.IndexOf("""", 0) = 0 Then starti = 0

        While starti > -1 'loop through quoted fields
            Dim FieldTerminatorFound As Boolean = False

            'Find end quote token (originally a ",)
            Dim endi As Integer = line.IndexOf("""" & PH1, starti)
            If endi < 0 Then
                'No end token: the quoted field runs to the end of the line
                FieldTerminatorFound = True
                endi = line.Length - 1
            End If

            While Not FieldTerminatorFound
                'Count the consecutive quotes that end at endi
                Dim quoteCount = 1
                While endi - quoteCount >= 0 AndAlso line.Chars(endi - quoteCount) = """"c
                    quoteCount += 1
                End While

                If quoteCount Mod 2 = 1 Then
                    'Odd number of quotes: real field terminator
                    FieldTerminatorFound = True
                Else
                    'Even number of quotes: keep looking
                    endi = line.IndexOf("""" & PH1, endi + 1)
                    If endi < 0 Then
                        FieldTerminatorFound = True
                        endi = line.Length - 1
                    End If
                End If
            End While

            'Grab the quoted field from the line, now that we have the start and ending indices.
            'Skip the leading placeholder unless the quoted field starts the line.
            Dim fieldStart = If(line.Chars(starti) = """"c, starti, starti + PH1.Length)
            Dim source = line.Substring(fieldStart, endi - fieldStart + 1)

            'And swap the commas back in
            line = line.Replace(source, source.Replace(PH1, ","))

            'Find the next quoted field
            starti = line.IndexOf(PH1 & """", starti + PH1.Length)
        End While

        'Add the converted line to the output buffer
        sb.AppendLine(line)
        line = objReader.ReadLine
    End While

    objReader.Close()
    SaveTextToFile(sb.ToString, OutputFile)
    Return count
End Function
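The least obvious part of the function is the odd/even quote count, so here is a small Python sketch of just that rule. CSV escapes a literal quote by doubling it, so a run of quotes in front of the placeholder closes the field only if the run has an odd length. The helper name `closes_field` and the `open_index` bound (which stops the count at the field's opening quote) are my own additions for the illustration, not part of the function above.

```python
def closes_field(line, open_index, quote_index):
    """Return True if the quote at quote_index really ends the quoted field.

    open_index is the position of the field's opening quote. The function counts
    the unbroken run of double quotes ending at quote_index, without walking back
    into the opening quote. An even run is only escaped quotes; an odd run means
    the last quote is the closing one.
    """
    run = 0
    i = quote_index
    while i > open_index and line[i] == '"':
        run += 1
        i -= 1
    return run % 2 == 1


token = '"<*TMP*>'
# Originally: "5"", pipe",12  (after every comma was replaced by the placeholder)
line = '"5""<*TMP*> pipe"<*TMP*>12'

first = line.find(token)              # quote that belongs to the escaped pair ""
second = line.find(token, first + 1)  # quote that actually closes the field

print(closes_field(line, 0, first))
print(closes_field(line, 0, second))
```

The first candidate sits at the end of a two-quote run, so it is an escaped quote and the search has to continue; the second sits at the end of a one-quote run and is the real end of the field.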