Importing CSV quoting error is driving me nuts
Problem description
I've been having an unbelievable time trying to import a CSV file in ruby-1.9.2.
The file I am trying to parse has:
- commas within columns
- quotes within columns
- uses an '@' as the :col_sep
csv.txt (representative input, real one is 101k lines):
㔾@㔾@jié@"seal" radical in Chinese characters, (Kangxi radical 26)
My code:
require 'csv'
CSV.foreach("/Users/adam/Desktop/csvtest.txt", {:col_sep => "@"}) do |row|
  puts row.to_s
end
My desired output:
["㔾", "㔾", "jié", "\"seal\" radical in Chinese characters, (Kangxi radical 26)"]
What I get for output:
CSV::MalformedCSVError: Unclosed quoted field on line 1.
from /Users/adam/.rvm/rubies/ruby-1.9.2-p290/lib/ruby/1.9.1/CSV.rb:1910:in `block in shift'
from /Users/adam/.rvm/rubies/ruby-1.9.2-p290/lib/ruby/1.9.1/CSV.rb:1825:in `loop'
from /Users/adam/.rvm/rubies/ruby-1.9.2-p290/lib/ruby/1.9.1/CSV.rb:1825:in `shift'
from /Users/adam/.rvm/rubies/ruby-1.9.2-p290/lib/ruby/1.9.1/CSV.rb:1767:in `each'
from /Users/adam/.rvm/rubies/ruby-1.9.2-p290/lib/ruby/1.9.1/CSV.rb:1202:in `block in foreach'
from /Users/adam/.rvm/rubies/ruby-1.9.2-p290/lib/ruby/1.9.1/CSV.rb:1340:in `open'
from /Users/adam/.rvm/rubies/ruby-1.9.2-p290/lib/ruby/1.9.1/CSV.rb:1201:in `foreach'
from (irb):31
from /Users/adam/.rvm/rubies/ruby-1.9.2-p290/bin/irb:16:in `<main>'
It says there is an unclosed quoted field, but I can see that the quotes open and close.
Escaping the quotes does nothing. I get the same error (...@""seal"" r...).
Changing them to single quotes makes it work (...@'seal' r...).
The problem is I NEED them to be in double quotes.
Any ideas?
I think the problem is that CSV is trying to interpret "seal" as a quoted column, but the field doesn't appear as @"seal"@, so the parser gets confused because quotes are supposed to surround the whole column. I don't see any option to tell CSV that the columns aren't quoted, but you can kludge around it by setting :quote_char to something that will never occur. If you're using UTF-8 then you can safely use a zero byte as your "quote character that will never occur":
CSV.foreach(filename, :col_sep => "@", :quote_char => "\x00") do |row|
  #...
end
This should work as long as none of your columns are quoted.
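To check the workaround without touching the real file, here is a minimal sketch that runs the sample line from the question through CSV.parse_line. The helper name parse_unquoted_line is made up for illustration, and the sketch assumes a zero byte never appears in the data:

```ruby
require 'csv'

# Hypothetical helper: parse one "@"-separated line whose fields are never quoted.
def parse_unquoted_line(line)
  # Assumption: "\x00" never occurs in the data, so the parser never sees
  # a quote character and treats the double quotes as ordinary text.
  CSV.parse_line(line, col_sep: "@", quote_char: "\x00")
end

# Sample line from the question.
line = %q{㔾@㔾@jié@"seal" radical in Chinese characters, (Kangxi radical 26)}

# With the default quote character the parser rejects the line.
begin
  CSV.parse_line(line, col_sep: "@")
rescue CSV::MalformedCSVError => e
  puts "default parsing failed: #{e.message}"
end

# With the zero-byte quote character the quotes survive inside the last field.
p parse_unquoted_line(line)
```

The same two options can be passed to CSV.foreach once the single-line case looks right.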