CSVs in Python with newline in quotes

Problem description

I get the impression that this is a common problem: I have a CSV file with newlines within the fields. I am looking for a fix within Python, and within the csv module if possible.

Here is a sample file I created:

$ more test_csv.csv
a,"b",c,d,"e
e
e",f
a,bb,c,d,ee ,"f
f"
a,b,"c
c",d,e,f

Not all fields are wrapped in quotes (my quoting in this example is arbitrary, but the actual file should match quoting=csv.QUOTE_MINIMAL).

The output should look like:

[[a,b,c,d,"e\ne\ne",f],[a,bb,c,d,ee,"f\nf"],[a,b,"c\nc",d,e,f]]

Instead, I get:

[[['a', 'b', 'c', 'd', 'e\n']], [['e']], [['e"', 'f']], [['a', 'bb', 'c', 'd', 'ee ', 'f\n']], [['f"']], [['a', 'b', 'c\n']], [['c"', 'd', 'e', 'f']]]

Please note the number of rows and columns. Another concern is that in the third row, a quote character was included when it should not have been.

Here is my code so far:

import csv

file = open('test_csv.csv', 'r')
rows = []
for line in file:
  fields = []  
  mycsv = csv.reader([line], dialect='excel', \
    quotechar='"', quoting=csv.QUOTE_MINIMAL)
  for field in mycsv:
    fields.append(field)
  rows.append(fields)

Thanks.

Recommended answer

Instead of splitting the lines yourself, let csv.reader do it:

>>> from io import StringIO  # Python 3; on Python 2 this was "from StringIO import StringIO"
>>> import csv
>>> file = StringIO("""a,"b",c,d,"e
e
e",f
a,bb,c,d,ee ,"f
f"
a,b,"c
c",d,e,f""")
>>> for line in csv.reader(file):
...     print(line)
...

['a', 'b', 'c', 'd', 'e\ne\ne', 'f']
['a', 'bb', 'c', 'd', 'ee ', 'f\nf']
['a', 'b', 'c\nc', 'd', 'e', 'f']
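
The session above parses an in-memory string. As a minimal sketch of the same idea for a file on disk, assuming Python 3, the snippet below passes the open file object straight to csv.reader. The helper name read_rows and the temporary file are illustrative and not part of the original answer. Opening the file with newline='' follows the csv module's documented recommendation, so that line endings inside quoted fields reach the reader unmodified.

```python
import csv
import os
import tempfile

# Same content as the question's test_csv.csv (written here so the sketch is self-contained).
SAMPLE = 'a,"b",c,d,"e\ne\ne",f\na,bb,c,d,ee ,"f\nf"\na,b,"c\nc",d,e,f\n'


def read_rows(path):
    """Parse the whole file as one CSV document (hypothetical helper)."""
    # newline='' leaves line endings untouched so the csv module handles them itself.
    with open(path, newline='') as handle:
        return list(csv.reader(handle))


with tempfile.TemporaryDirectory() as tmp_dir:
    sample_path = os.path.join(tmp_dir, 'test_csv.csv')
    with open(sample_path, 'w', newline='') as handle:
        handle.write(SAMPLE)
    rows = read_rows(sample_path)

for row in rows:
    print(row)
```

The key point is that there is a single reader for the whole file, so it can carry the "inside a quoted field" state from one physical line to the next.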

Further explanation: by looping over the lines yourself and creating a reader for each line, you are logically treating the file as if each line were a separate and complete CSV file. Instead, you want to treat the whole file as one CSV document. You can do this either by passing the file object into csv.reader, since iterating over a file object iterates over the lines of the file, or by reading the file yourself, splitting it into lines, and then passing the list of all the split lines into a single csv.reader.
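
To make the difference concrete, here is a small sketch (Python 3 assumed) that feeds the same list of lines to one reader per line and then to a single reader. The variable names are illustrative. keepends=True is used on the assumption that the reader needs each line's trailing newline in order to reproduce line breaks inside quoted fields.

```python
import csv

# Same content as the question's sample file.
text = 'a,"b",c,d,"e\ne\ne",f\na,bb,c,d,ee ,"f\nf"\na,b,"c\nc",d,e,f\n'

# Split into physical lines, keeping each line's trailing '\n'.
lines = text.splitlines(keepends=True)

# One reader per physical line: every line is parsed as its own CSV document,
# so a quoted field can never continue onto the next line.
per_line_rows = [row for line in lines for row in csv.reader([line])]

# One reader for all the lines: the reader remembers that it is inside a
# quoted field when it moves on to the next line.
whole_rows = list(csv.reader(lines))

print(len(per_line_rows), 'rows with one reader per line')
print(len(whole_rows), 'rows with a single reader')
for row in whole_rows:
    print(row)
```

The per-line variant produces one row per physical line, which is the symptom shown in the question, while the single reader yields the three logical records.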
