Importing CSV quoting error is driving me nuts


Problem description




I've been having an unbelievable time trying to import a CSV file in ruby-1.9.2.

The file I am trying to parse has:

  • commas within columns
  • quotes within columns
  • an '@' as the :col_sep

csv.txt (representative input, real one is 101k lines):

㔾@㔾@jié@"seal" radical in Chinese characters, (Kangxi radical 26)

My code:

require 'csv'

CSV.foreach("/Users/adam/Desktop/csvtest.txt", {:col_sep => "@"}) do |row|
    puts row.to_s 
end

My desired output:

["㔾", "㔾", "jié", "\"seal\" radical in Chinese characters, (Kangxi radical 26)"]

What I get for output:

CSV::MalformedCSVError: Unclosed quoted field on line 1.
from /Users/adam/.rvm/rubies/ruby-1.9.2-p290/lib/ruby/1.9.1/CSV.rb:1910:in `block in shift'
from /Users/adam/.rvm/rubies/ruby-1.9.2-p290/lib/ruby/1.9.1/CSV.rb:1825:in `loop'
from /Users/adam/.rvm/rubies/ruby-1.9.2-p290/lib/ruby/1.9.1/CSV.rb:1825:in `shift'
from /Users/adam/.rvm/rubies/ruby-1.9.2-p290/lib/ruby/1.9.1/CSV.rb:1767:in `each'
from /Users/adam/.rvm/rubies/ruby-1.9.2-p290/lib/ruby/1.9.1/CSV.rb:1202:in `block in foreach'
from /Users/adam/.rvm/rubies/ruby-1.9.2-p290/lib/ruby/1.9.1/CSV.rb:1340:in `open'
from /Users/adam/.rvm/rubies/ruby-1.9.2-p290/lib/ruby/1.9.1/CSV.rb:1201:in `foreach'
from (irb):31
from /Users/adam/.rvm/rubies/ruby-1.9.2-p290/bin/irb:16:in `<main>'

It says there are unclosed quoted fields, but I can see that the quotes open and close.

Escaping the quotes does nothing. I get the same error (...@""seal"" r...). Changing them to single quotes makes it work (...@'seal' r...). The problem is I NEED them to be in double quotes.

Any ideas?

Solution

I think the problem is that CSV tries to interpret "seal" as a quoted column, but the quotes don't wrap the whole column (it doesn't appear as @"seal"@), so the parser gets confused: quotes are supposed to surround an entire column. I don't see any option to tell CSV that the columns aren't quoted, but you can kludge around it by setting :quote_char to something that will never occur in your data. If you're using UTF-8, then you can safely use a zero byte as your "quote character that will never occur":

CSV.foreach(filename, :col_sep => "@", :quote_char => "\x00") do |row|
    #...
end

This should work as long as none of your columns are quoted.
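
To try this without the 101k-line file, here is a minimal sketch that runs the representative line from the question through CSV.parse_line instead of CSV.foreach. The constant SAMPLE_LINE and the helper parse_unquoted are hypothetical names made up for this illustration, and it assumes the data never contains a zero byte. The exact wording of the error message may differ between Ruby versions, so the sketch only reports the exception class.

```ruby
require 'csv'

# Representative line from the question: '@' separates the columns, and the
# last column contains literal double quotes and commas.
SAMPLE_LINE = '㔾@㔾@jié@"seal" radical in Chinese characters, (Kangxi radical 26)'

# Hypothetical helper: parse one line while treating double quotes as plain
# text, by pointing :quote_char at a zero byte that is assumed never to
# appear in the data.
def parse_unquoted(line)
  CSV.parse_line(line, :col_sep => "@", :quote_char => "\x00")
end

# With the default quote character the parser rejects the line, because the
# quotes do not wrap the whole column.
begin
  CSV.parse_line(SAMPLE_LINE, :col_sep => "@")
rescue CSV::MalformedCSVError => e
  puts "default :quote_char raised #{e.class}"
end

# With the zero-byte quote character the line splits on '@' only, and the
# quotes and commas stay inside the last column.
p parse_unquoted(SAMPLE_LINE)
```

The last call should give the four-element array the question asks for. If some columns in the real file are genuinely quoted, this approach will leave the surrounding quotes in the values.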
