'utf-8' codec can't decode byte 0x89


Problem description

I want to read a csv file and process some columns but I keep getting issues. Stuck with the following error:

Traceback (most recent call last):
  File "C:\Users\Sven\Desktop\Python\read csv.py", line 5, in <module>
    for row in reader:
  File "C:\Python34\lib\codecs.py", line 313, in decode
    (result, consumed) = self._buffer_decode(data, self.errors, final)
UnicodeDecodeError: 'utf-8' codec can't decode byte 0x89 in position 446: invalid start byte
>>> 

My code:

import csv
with open("c:\\Users\\Sven\\Desktop\\relaties 24112014.csv",newline='', encoding="utf8") as f:
    reader = csv.reader(f,delimiter=';',quotechar='|')
    #print(sum(1 for row in reader))
    for row in reader:
        print(row)
        if row:
            value = row[6]
            value = value.replace('(', '')
            value = value.replace(')', '')
            value = value.replace(' ', '')
            value = value.replace('.', '')
            value = value.replace('0032', '0')
            if len(value) > 0:
                print(value + ' Length: ' + str(len(value)))




I'm a beginner with Python, tried googling, but hard to find the right solution.

Can anyone help me out?

Recommended answer

This is the most important clue:

invalid start byte

\x89 is not, as suggested in the comments, an invalid UTF-8 byte. It is a completely valid continuation byte. Meaning if it follows the correct byte value, it codes UTF-8 correctly:

http://hexutf8.com/?q=0xc90x89
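The point about continuation bytes can be checked directly in Python. The following is a minimal sketch, not part of the original answer; the helper name `is_continuation_byte` is made up for illustration. It shows that a lone 0x89 is rejected, while the same byte decodes cleanly when it follows the lead byte 0xC9.

```python
# A lone 0x89 cannot start a UTF-8 sequence, so decoding fails.
try:
    b"\x89".decode("utf-8")
    lone_byte_ok = True
except UnicodeDecodeError as exc:
    lone_byte_ok = False
    lone_byte_reason = exc.reason  # the codec's explanation of the failure

# After the lead byte 0xC9, the same 0x89 is a valid continuation byte.
pair = b"\xc9\x89".decode("utf-8")  # a single character, U+0249


def is_continuation_byte(byte: int) -> bool:
    """Hypothetical helper: True if the byte has the bit pattern 10xxxxxx."""
    return (byte & 0b1100_0000) == 0b1000_0000


print(lone_byte_ok, lone_byte_reason)
print(repr(pair), is_continuation_byte(0x89), is_continuation_byte(0xC9))
```

Continuation bytes always have the top two bits set to `10`, which is why 0x89 (binary 10001001) is only legal after a lead byte.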

So either you (1) do not have UTF-8 data as you expect, or (2) you have some malformed UTF-8 data. The Python codec is simply letting you know that it encountered \x89 in the wrong order in the sequence.

(More on continuation bytes here: http://en.wikipedia.org/wiki/UTF-8#Codepage_layout)
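To tell cases (1) and (2) apart, it can help to look at the raw bytes around the failing position and then try decoding with another encoding. The sketch below is only an illustration: the function names and the sample data are invented, and `cp1252` is a guess at a common Windows encoding, not something the question confirms. The delimiter and quote character are taken from the code in the question.

```python
import csv
import io


def first_invalid_utf8(data: bytes, context: int = 20):
    """Return (position, nearby bytes) for the first undecodable byte, or None."""
    try:
        data.decode("utf-8")
    except UnicodeDecodeError as exc:
        start = max(exc.start - context, 0)
        return exc.start, data[start:exc.start + context]
    return None


def read_rows(data: bytes, encoding: str):
    """Decode the bytes with the given encoding and parse them as CSV."""
    text = data.decode(encoding)
    reader = csv.reader(io.StringIO(text, newline=""), delimiter=";", quotechar="|")
    return list(reader)


# Invented sample data containing a stray 0x89 byte.
sample = b"name;discount\nSven;5\x89\n"

print(first_invalid_utf8(sample))   # where UTF-8 decoding breaks, with context
print(read_rows(sample, "cp1252"))  # the same bytes read as Windows-1252
```

For a real file, the bytes could come from `open(path, "rb").read()`. If the text around the reported position looks sensible under another encoding, passing that encoding to `open()` instead of `"utf8"` is the likely fix. If no encoding makes sense, the file probably contains malformed data.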
