Python 3: CSV files and Unicode Error


Question

I have a CSV (TSV) file with this header and first row:

"Message Name"  "Field" "Base Label"    "Base Label Update Date"    "Translated Label"  "Translated Label Update Date"  "Language"
"Message"   "subject_template"  "New Task: Assess Distribution Outcomes for ""${docNameNoLink}"", ""${docNumber}""" "8/10/16 4:17:43 PM"    "Nouvelle tâche : évaluez le résultat de la distribution de « ${docNameNoLink} »."  "2/17/14 5:09:10 AM"    "fr"

When I try to read the file with this code:

import csv
import sys  # needed for sys.exit() below

with open(fileName, 'r', encoding='utf-8', errors='replace') as fdata:
    csv.register_dialect('tsv', delimiter='\t', quoting=csv.QUOTE_NONE)
    reader = csv.reader(fdata, dialect='tsv')
    try:
        for row in reader:
            print(row)
    except csv.Error as e:
        sys.exit('file {}, line {}: {}'.format(fileName, reader.line_num, e))

I get this error message: file NameFile, line 1: line contains NULL byte

However, if I run the same code without the errors='replace' (or errors='ignore') argument:

with open(fileName, 'r',  encoding='utf-8') as fdata:
    csv.register_dialect('tsv', delimiter='\t', quoting=csv.QUOTE_NONE)
    reader=csv.reader(fdata, dialect='tsv')
    try:
        for row in reader:
            print (row)
    except csv.Error as e:
        sys.exit('file {}, line {}: {}'.format(fileName, reader.line_num, e))

I get the following error:

  File "csvFiles.py", line 76, in <module>
    for row in reader:
  File "c:\Python35\lib\codecs.py", line 321, in decode
    (result, consumed) = self._buffer_decode(data, self.errors, final)
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xff in position 0: invalid start byte

What could be causing this error, and how can I correct it so that the script works?

Recommended answer

Your data is not encoded in 'utf-8' but in 'utf-16-le' or something similar. 'utf-16-le' is just a guess, but when I encode your data with 'utf-16-le', exactly the same errors are produced. Check the encoding of your data file. On Linux you can do that with an editor such as Emacs or with the 'file' utility.
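The mismatch can be reproduced without the original file. The following is a minimal sketch, assuming the file was saved as UTF-16 little-endian with a byte-order mark; the sample text is a hypothetical stand-in for the real header line.

```python
import codecs

# Hypothetical sample: a short TSV header saved as UTF-16 LE with a BOM.
sample_text = '"Message Name"\t"Field"\n'
raw = codecs.BOM_UTF16_LE + sample_text.encode('utf-16-le')

# Strict UTF-8 decoding fails on the very first byte (0xff, part of the BOM).
try:
    raw.decode('utf-8')
    strict_error = None
except UnicodeDecodeError as exc:
    strict_error = exc

# With errors='replace' the decode goes through, but every ASCII character
# is followed by a NUL character left over from its UTF-16 high byte.
replaced = raw.decode('utf-8', errors='replace')
has_nul = '\x00' in replaced

# Decoding as UTF-16 consumes the BOM and recovers the original text.
recovered = raw.decode('utf-16')

print(strict_error)
print(has_nul)
print(recovered == sample_text)
```

The leftover NUL characters are what the csv module on Python 3.5 reports as "line contains NULL byte" when errors='replace' is used.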

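The error message itself tells us that the first byte of your file is 0xff. This is potentially part of a byte-order mark (BOM): the UTF-16 little-endian BOM is the byte pair 0xFF 0xFE. The NULL bytes reported in the first error point the same way, because ASCII characters encoded in UTF-16 each carry a 0x00 byte.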
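One way to act on this is to look at the first bytes of the file and pick a codec from the BOM before handing the file to csv. The sketch below is an illustration under assumptions, not a general encoding detector: guess_encoding and read_tsv are hypothetical helper names, files without a BOM simply fall back to a default, and the demo file is generated on the fly. It also uses the csv module's default quoting rather than QUOTE_NONE, on the assumption that the doubled quotes ("") in the sample data are meant as escaped quotes.

```python
import codecs
import csv
import os
import tempfile

# BOM -> codec name. UTF-32 is checked before UTF-16 because the
# UTF-32 LE BOM starts with the same two bytes as the UTF-16 LE BOM.
_BOMS = [
    (codecs.BOM_UTF32_LE, 'utf-32'),
    (codecs.BOM_UTF32_BE, 'utf-32'),
    (codecs.BOM_UTF8, 'utf-8-sig'),
    (codecs.BOM_UTF16_LE, 'utf-16'),
    (codecs.BOM_UTF16_BE, 'utf-16'),
]


def guess_encoding(path, default='utf-8'):
    """Return a codec name based on the file's BOM, or `default` if none."""
    with open(path, 'rb') as f:
        head = f.read(4)
    for bom, name in _BOMS:
        if head.startswith(bom):
            return name
    return default


def read_tsv(path):
    """Read a tab-separated file, choosing the codec from its BOM."""
    encoding = guess_encoding(path)
    # newline='' is what the csv documentation recommends for file objects.
    with open(path, 'r', encoding=encoding, newline='') as f:
        return list(csv.reader(f, delimiter='\t'))


# Demonstration with a throwaway UTF-16 file (the 'utf-16' codec writes a BOM).
with tempfile.TemporaryDirectory() as tmp:
    demo_path = os.path.join(tmp, 'demo.tsv')
    with open(demo_path, 'w', encoding='utf-16', newline='') as f:
        f.write('"Message Name"\t"Language"\n"Message"\t"fr"\n')
    detected = guess_encoding(demo_path)
    rows = read_tsv(demo_path)

print(detected)
print(rows)
```

If the encoding is already known, passing encoding='utf-16' directly to open() in the original script is the simpler change.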
