Python CSV error: line contains NULL byte


Problem description

I'm working with some CSV files, using the following code:

reader = csv.reader(open(filepath, "rU"))
try:
    for row in reader:
        print 'Row read successfully!', row
except csv.Error, e:
    sys.exit('file %s, line %d: %s' % (filename, reader.line_num, e))

One file is throwing this error:

file my.csv, line 1: line contains NULL byte

What can I do? Google seems to suggest that it may be an Excel file that was improperly saved as a .csv. Is there any way I can get around this problem in Python?

== Update ==

Following @JohnMachin's comment below, I tried adding these lines to my script:

print repr(open(filepath, 'rb').read(200)) # dump 1st 200 bytes of file
data = open(filepath, 'rb').read()
print data.find('\x00')
print data.count('\x00')

This is the output I got:

'\xd0\xcf\x11\xe0\xa1\xb1\x1a\xe1\x00\x00\x00\x00\x00\x00\x00\x00 .... <snip>
8
13834

So the file does indeed contain NUL bytes.
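
The dump above starts with the eight bytes d0 cf 11 e0 a1 b1 1a e1, which is the signature of an OLE2 (Compound File Binary) container, the format used by legacy .xls workbooks. That fits the Excel theory, and it also explains why the first NUL sits at offset 8, right after the signature. A minimal sketch of that check, assuming Python 3 (the thread's own code is Python 2); looks_like_ole2 is a hypothetical helper name, not part of any library:

```python
# Signature that opens every OLE2 / Compound File Binary container,
# the format used by legacy .xls workbooks.
OLE2_MAGIC = b"\xd0\xcf\x11\xe0\xa1\xb1\x1a\xe1"


def looks_like_ole2(path):
    """Return True if the file at `path` starts with the OLE2 signature."""
    with open(path, "rb") as f:
        return f.read(len(OLE2_MAGIC)) == OLE2_MAGIC
```

If this returns True, the file is probably a real spreadsheet with a .csv extension, and stripping NUL bytes will not turn it into usable CSV; re-exporting it from the spreadsheet application is likely the better route.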

Recommended answer

As @S.Lott says, you should be opening your files in 'rb' mode, not 'rU' mode. However, that may NOT be causing your current problem. As far as I know, using 'rU' mode would mess you up if there are embedded \r characters in the data, but would not cause any other dramas. I also note that you have several files (all opened with 'rU'?) but only one is causing a problem.
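
For readers on Python 3 (an assumption; the code in this thread is Python 2), the counterpart of choosing 'rb' over 'rU' is to open the file in text mode with newline='', which is what the csv module documentation recommends. The sketch below shows that pattern; read_rows and the utf-8 default are illustrative choices, not something the original answer specifies:

```python
import csv


def read_rows(path, encoding="utf-8"):
    """Parse a CSV file, leaving line-ending handling to the csv module."""
    # newline="" turns off newline translation, so a \r inside a quoted
    # field reaches the csv reader unchanged.
    with open(path, newline="", encoding=encoding) as f:
        return list(csv.reader(f))
```

As with 'rb' in the original answer, this only protects embedded \r characters; it does nothing about NUL bytes in the data.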

If the csv module says that you have a "NULL" (silly message, should be "NUL") byte in your file, then you need to check what is in your file. I would suggest that you do this even if using 'rb' makes the problem go away.

repr() is (or wants to be) your debugging friend. It will show unambiguously what you've got, in a platform-independent fashion (which is helpful to helpers who are unaware of what od is or does). Do this:

print repr(open('my.csv', 'rb').read(200)) # dump 1st 200 bytes of file

and carefully copy/paste (don't retype) the result into an edit of your question (not into a comment).

Also note that if the file is really dodgy, e.g. no \r or \n within a reasonable distance from the start of the file, the line number reported by reader.line_num will be (unhelpfully) 1. Find where the first \x00 is (if any) by doing

data = open('my.csv', 'rb').read()
print data.find('\x00')

and make sure that you dump at least that many bytes with repr or od.
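
The two steps above (locate the first NUL, then dump at least that far) can be combined. This is a rough sketch assuming Python 3, where data read in 'rb' mode is a bytes object and the needle has to be b"\x00"; dump_through_first_nul and its extra parameter are hypothetical names introduced for illustration:

```python
def dump_through_first_nul(path, extra=32):
    """Return (offset, repr of the bytes up to and just past the first NUL).

    offset is -1, and the dump covers the first 200 bytes, if no NUL is found.
    """
    with open(path, "rb") as f:
        data = f.read()
    offset = data.find(b"\x00")
    # Dump through the NUL plus a little trailing context.
    end = 200 if offset < 0 else offset + 1 + extra
    return offset, repr(data[:end])
```

Printing the returned repr gives output that can be pasted into a question without ambiguity, in the same spirit as the repr() advice above.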

What does data.count('\x00') tell you? If there are many, you may want to do something like

for i, c in enumerate(data):
    if c == '\x00':
        # max(0, ...) keeps the slice start from going negative near the start of the file
        print i, repr(data[max(0, i-30):i]) + ' *NUL* ' + repr(data[i+1:i+31])

so that you can see the NUL bytes in context.
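
On Python 3 (an assumption; the loop above is Python 2), iterating over bytes yields integers, so comparing each element with a one-character string never matches. A sketch of the same idea using bytes.find follows; nul_contexts, width and limit are hypothetical names, and the limit is there only because a file like the asker's, with 13834 NULs, would otherwise flood the terminal:

```python
def nul_contexts(data, width=30, limit=10):
    """Yield (offset, before, after) for up to `limit` NUL bytes in `data`."""
    start = 0
    for _ in range(limit):
        i = data.find(b"\x00", start)
        if i < 0:
            break
        # Clamp the left edge so a NUL near the start still shows its prefix.
        yield i, data[max(0, i - width):i], data[i + 1:i + 1 + width]
        start = i + 1
```

A typical use would be a loop such as `for i, before, after in nul_contexts(data): print(i, repr(before), "*NUL*", repr(after))`.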

If you can see \x00 in the output (or \0 in your od -c output), then you definitely have NUL byte(s) in the file, and you will need to do something like this:

fi = open('my.csv', 'rb')
data = fi.read()
fi.close()
fo = open('mynew.csv', 'wb')
fo.write(data.replace('\x00', ''))
fo.close()
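
The same clean-up can be done in memory and handed straight to the csv module. This is a sketch under a few assumptions: Python 3, a file that is genuinely text with stray NUL bytes (not a binary spreadsheet), and utf-8 as the encoding; read_csv_without_nuls is a hypothetical helper name:

```python
import csv
import io


def read_csv_without_nuls(path, encoding="utf-8"):
    """Read `path`, drop NUL bytes, and return the parsed rows as lists."""
    with open(path, "rb") as f:
        data = f.read().replace(b"\x00", b"")
    # newline="" lets the csv module handle line endings itself.
    text = io.StringIO(data.decode(encoding), newline="")
    return list(csv.reader(text))
```

This avoids writing a second file, at the cost of holding the whole file in memory, just as the read-replace-write version above does.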

By the way, have you looked at the file (including the last few lines) with a text editor? Does it actually look like a reasonable CSV file, like the other files that don't raise the "NULL byte" exception?
