Python 3 CSV file giving UnicodeDecodeError: 'utf-8' codec can't decode byte error when I print


Question

I have the following code in Python 3, which is meant to print out each line in a csv file.

import csv
with open('my_file.csv', 'r', newline='') as csvfile:
    lines = csv.reader(csvfile, delimiter = ',', quotechar = '|')
    for line in lines:
        print(' '.join(line))

But when I run it, it gives me this error:

UnicodeDecodeError: 'utf-8' codec can't decode byte 0x96 in position 7386: invalid start byte

I looked through the csv file, and it turns out that if I take out a single ñ (lowercase n with a tilde on top), every line prints out fine.

My problem is that I've looked through a bunch of different solutions to similar problems, but I still have no idea how to fix this or what to decode or encode. Simply taking the ñ character out of the data is NOT an option.

Recommended answer

We know the file contains the byte b'\x96' since it is mentioned in the error message:

UnicodeDecodeError: 'utf-8' codec can't decode byte 0x96 in position 7386: invalid start byte
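Before guessing at an encoding, it can help to look at the raw bytes around the offending byte. The following is a minimal sketch, not part of the original answer: `find_byte` is a hypothetical helper that opens the file in binary mode (so nothing is decoded) and reports every occurrence of a given byte together with a little surrounding context. It scans for the byte value rather than seeking to the reported position, because that position may be relative to a read buffer rather than to the start of the file.

```python
def find_byte(path, byte_value, context=10):
    """Yield (offset, surrounding_bytes) for each occurrence of byte_value.

    Hypothetical helper for illustration; reads the whole file into memory,
    which is fine for small CSV files.
    """
    with open(path, 'rb') as f:  # binary mode: no decoding is attempted
        data = f.read()
    target = bytes([byte_value])
    offset = data.find(target)
    while offset != -1:
        start = max(0, offset - context)
        yield offset, data[start:offset + context + 1]
        offset = data.find(target, offset + 1)

# Example use (assumes my_file.csv exists in the working directory):
#     for offset, snippet in find_byte('my_file.csv', 0x96):
#         print(offset, snippet)
```

If the context printed around each hit is the word that should contain ñ, that supports the assumption that 0x96 is how this file encodes that character.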

Now we can write a little script to find out if there are any encodings where b'\x96' decodes to ñ:

import pkgutil
import encodings
import os

def all_encodings():
    modnames = set([modname for importer, modname, ispkg in pkgutil.walk_packages(
        path=[os.path.dirname(encodings.__file__)], prefix='')])
    aliases = set(encodings.aliases.aliases.values())
    return modnames.union(aliases)

text = b'\x96'
for enc in all_encodings():
    try:
        msg = text.decode(enc)
    except Exception:
        continue
    if msg == 'ñ':
        print('Decoding {t} with {enc} is {m}'.format(t=text, enc=enc, m=msg))

yields

Decoding b'\x96' with mac_roman is ñ
Decoding b'\x96' with mac_farsi is ñ
Decoding b'\x96' with mac_croatian is ñ
Decoding b'\x96' with mac_arabic is ñ
Decoding b'\x96' with mac_romanian is ñ
Decoding b'\x96' with mac_iceland is ñ
Decoding b'\x96' with mac_turkish is ñ

Therefore, try changing

with open('my_file.csv', 'r', newline='') as csvfile:

to one of those encodings, such as:

with open('my_file.csv', 'r', encoding='mac_roman', newline='') as csvfile:
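Putting the pieces together, here is a self-contained sketch of the question's loop with the encoding applied. The sample bytes, the temporary file, and the `read_rows` helper are assumptions made up for the demonstration; with real data you would pass the path to your own file, and mac_roman is only correct if the file really was saved in that encoding.

```python
import csv
import os
import tempfile

def read_rows(path, encoding='mac_roman'):
    """Read a CSV file with an explicit encoding and return its rows."""
    with open(path, 'r', encoding=encoding, newline='') as csvfile:
        reader = csv.reader(csvfile, delimiter=',', quotechar='|')
        return list(reader)

# Made-up sample data in which 0x96 stands for the ñ character.
sample = b'name,country\nPe\x96a,Espa\x96a\n'

# Write the sample to a temporary file so the example runs anywhere.
with tempfile.NamedTemporaryFile(suffix='.csv', delete=False) as tmp:
    tmp.write(sample)

rows = read_rows(tmp.name)
os.remove(tmp.name)

for row in rows:
    print(' '.join(row))
```

Opening the same bytes without the `encoding` argument on a system whose default is UTF-8 would raise the UnicodeDecodeError from the question, since 0x96 is not a valid UTF-8 start byte.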
