'utf-8' codec can't decode byte 0x89
Problem description
I want to read a CSV file and process some columns, but I keep running into problems. I'm stuck with the following error:
```
Traceback (most recent call last):
  File "C:\Users\Sven\Desktop\Python\read csv.py", line 5, in <module>
    for row in reader:
  File "C:\Python34\lib\codecs.py", line 313, in decode
    (result, consumed) = self._buffer_decode(data, self.errors, final)
UnicodeDecodeError: 'utf-8' codec can't decode byte 0x89 in position 446: invalid start byte
```
My code:
```python
import csv
with open("c:\\Users\\Sven\\Desktop\\relaties 24112014.csv",newline='', encoding="utf8") as f:
    reader = csv.reader(f,delimiter=';',quotechar='|')
    #print(sum(1 for row in reader))
    for row in reader:
        print(row)
        if row:
            value = row[6]
            value = value.replace('(', '')
            value = value.replace(')', '')
            value = value.replace(' ', '')
            value = value.replace('.', '')
            value = value.replace('0032', '0')
            if len(value) > 0:
                print(value + ' Length: ' + str(len(value)))
```
I'm a beginner with Python and tried googling, but it's hard to find the right solution.
Can anyone help me out?
Recommended answer
This is the most important clue: `invalid start byte`.
`\x89` is not, as suggested in the comments, an invalid UTF-8 byte. It is a completely valid continuation byte. That means that if it follows a suitable lead byte, the pair decodes correctly as UTF-8, for example:
http://hexutf8.com/?q=0xc90x89
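To see the difference between a valid and an invalid position for `0x89`, here is a small sketch assuming Python 3 (as in the question). It decodes the same pair as the link above, `0xC9 0x89`, and then the same continuation byte sitting where a character should start. `UnicodeDecodeError.start` and `.reason` are standard attributes of the exception; the byte strings are made-up sample data.

```python
# 0xC9 (binary 110xxxxx) is a lead byte that announces a 2-byte sequence;
# 0x89 (binary 10xxxxxx) is a continuation byte. Together they are valid UTF-8.
valid = b"\xc9\x89".decode("utf-8")
print(hex(ord(valid)))  # 0x249

start, reason = None, None
try:
    # Here 0x89 follows plain ASCII, so the decoder expects a start byte
    # at that position and rejects it.
    b"abc\x89def".decode("utf-8")
except UnicodeDecodeError as exc:
    start, reason = exc.start, exc.reason
print(start, reason)  # 3 invalid start byte
```

The error text is the same as in the question's traceback; only the position differs.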
So either you (1) do not have UTF-8 data as you expect, or (2) you have some malformed UTF-8 data. The Python codec is simply letting you know that it encountered `\x89` in the wrong place in the sequence, where a start byte was expected.
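A minimal sketch for telling case (1) from case (2), assuming Python 3: read the file in binary mode, decode it strictly, and look at the bytes around the first failure. `find_first_bad_byte` is a hypothetical helper, not a standard-library function. Decoding the whole file at once gives an offset from the start of the file, whereas the position 446 in the traceback may be relative to the chunk the text layer was decoding at that moment. The cp1252 line is only an illustration of one possible legacy encoding, not a claim about this particular file.

```python
def find_first_bad_byte(data: bytes, context: int = 10):
    """Return (offset, nearby_bytes) for the first byte that breaks strict
    UTF-8 decoding, or None if the whole buffer is valid UTF-8."""
    try:
        data.decode("utf-8")
    except UnicodeDecodeError as exc:
        lo = max(exc.start - context, 0)
        return exc.start, data[lo:exc.end + context]
    return None

# Usage on the file from the question (read as raw bytes, no decoding):
# with open(r"c:\Users\Sven\Desktop\relaties 24112014.csv", "rb") as f:
#     print(find_first_bad_byte(f.read()))

# If the nearby bytes are plain ASCII with an isolated high byte, case (1) is
# likely: the file may be in a single-byte encoding. cp1252 is an assumption
# here; in that encoding 0x89 maps to U+2030 PER MILLE SIGN.
per_mille = b"\x89".decode("cp1252")
print(per_mille == "\u2030")  # True
```

If the context does point to a legacy encoding, reopening the file with that encoding passed to `open()` is the usual fix. If it instead shows broken multi-byte sequences, the data itself is malformed, and passing `errors="replace"` to `open()` at least lets the rest of the file be read.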
(More on continuation bytes here: http://en.wikipedia.org/wiki/UTF-8#Codepage_layout)
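The code page layout on that page can be summarised as a byte classifier. This is a sketch assuming Python 3; `classify_utf8_byte` is a hypothetical name and the labels are illustrative. It only looks at one byte at a time, so it says what role a byte *could* play, not whether a whole sequence is well formed.

```python
def classify_utf8_byte(b: int) -> str:
    """Classify a single byte value by its role in UTF-8, following the
    code page layout (ASCII, continuation, lead bytes, never-valid bytes)."""
    if not 0 <= b <= 0xFF:
        raise ValueError("expected a byte value in range 0..255")
    if b <= 0x7F:
        return "ASCII"            # 0xxxxxxx: a complete one-byte character
    if b <= 0xBF:
        return "continuation"     # 10xxxxxx: only valid after a lead byte
    if b in (0xC0, 0xC1) or b >= 0xF5:
        return "invalid"          # can never appear in well-formed UTF-8
    if b <= 0xDF:
        return "lead (2-byte)"    # 110xxxxx
    if b <= 0xEF:
        return "lead (3-byte)"    # 1110xxxx
    return "lead (4-byte)"        # 11110xxx, limited to 0xF0..0xF4

for byte in b"\xc9\x89":
    print(hex(byte), classify_utf8_byte(byte))
# 0xc9 lead (2-byte)
# 0x89 continuation
```

A continuation byte such as `0x89` is exactly the kind of byte that triggers `invalid start byte` when it appears without a lead byte in front of it.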