Getting csv.Sniffer to work with quoted values

Question

I'm trying to use Python's CSV sniffer tool, as suggested in many StackOverflow answers, to guess whether a given CSV file is delimited by ; or ,.

It works fine with basic files, but when a value contains the delimiter, the value is surrounded by double quotes (as the standard requires), and the sniffer throws _csv.Error: Could not determine delimiter.

Has anyone run into this before?

Here is a minimal failing CSV file:

column1,column2
0,"a, b"

And a proof of concept:

Python 3.5.1 (default, Dec  7 2015, 12:58:09) 
[GCC 5.2.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import csv
>>> f = open("example.csv", "r")
>>> f.seek(0);
0
>>> csv.Sniffer().sniff(f.read(), delimiters=';,')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/lib/python3.5/csv.py", line 186, in sniff
    raise Error("Could not determine delimiter")
_csv.Error: Could not determine delimiter

I have total control over the generation of the input CSV file, but sometimes it is modified by a third party using MS Office and the delimiter is replaced by semicolons, so I have to use this guessing approach. I know I could stop using commas in the input file, but I would like to know first whether I'm doing something wrong.

Recommended answer

You are giving the sniffer too much input. Your sample file does work if you run:

csv.Sniffer().sniff(f.readline())

which uses only the header row to determine the delimiter character. If you want to understand why the Sniffer heuristics fail for more data, there is no substitute for reading the csv.py library source code.
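
As a minimal sketch of the header-only approach, the helper below reads just the first line, restricts the sniffer to the two candidate delimiters from the question, and then rewinds the stream so the whole file can be parsed with the detected delimiter. The function name sniff_header_delimiter and the in-memory sample are illustrative assumptions, not part of the original answer. It also assumes the header row has at least two columns and contains no quoted field with an embedded delimiter; otherwise sniff() may still raise csv.Error.

```python
import csv
import io


def sniff_header_delimiter(stream, candidates=";,"):
    """Guess the delimiter from the first line of an open text stream.

    Hypothetical helper: only the header row is passed to the sniffer,
    so quoted values in later rows cannot confuse the heuristics.
    """
    header = stream.readline()
    dialect = csv.Sniffer().sniff(header, delimiters=candidates)
    stream.seek(0)  # rewind so the caller can parse from the first row
    return dialect.delimiter


# In-memory stand-in for the example.csv file from the question.
sample = io.StringIO('column1,column2\n0,"a, b"\n')
delimiter = sniff_header_delimiter(sample)
rows = list(csv.reader(sample, delimiter=delimiter))
print(delimiter, rows)
```

With a real file, the same helper could be called on the object returned by open("example.csv", newline=""), since it only needs readline() and seek().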
