'utf-8' codec can't decode byte reading a file in Python3.4 but not in Python2.7


Question

I was trying to read a file in Python 2.7, and it was read perfectly. The problem is that when I run the same program in Python 3.4, this error appears:

'utf-8' codec can't decode byte 0xf2 in position 424: invalid continuation byte

Also, when I run the program on Windows (with Python 3.4), the error doesn't appear. The first line of the document is: Codi;Codi_lloc_anonim;Nom

The code of my program is:

def lectdict(filename,colkey,colvalue):
    f = open(filename,'r')
    D = dict()

    for line in f:
        if line == '\n': continue
        D[line.split(';')[colkey]] = D.get(line.split(';')[colkey],[]) + [line.split(';')[colvalue]]

    f.close()
    return D

Traduccio = lectdict('Noms_departaments_centres.txt',1,2)

Recommended answer

In Python2,

f = open(filename,'r')
for line in f:

reads lines from the file as bytes.

In Python3, the same code reads lines from the file as strings. Python3 strings are what Python2 calls unicode objects. These are bytes decoded according to some encoding. When open() is given no encoding argument, Python3 uses the platform's preferred encoding (locale.getpreferredencoding(False)), which is commonly utf-8 on Linux but often a legacy code page such as cp1252 on Windows. That would explain why the error does not appear on Windows.
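A minimal sketch of the difference between bytes and decoded strings, using the byte 0xf2 from the error message. The sample bytes and the choice of latin-1 as the alternative codec are assumptions made for illustration; the question does not say which encoding the file really uses.

```python
import locale

# Encoding that open() falls back to when no encoding argument is given.
# It depends on the platform and locale settings.
default_encoding = locale.getpreferredencoding(False)
print("default for open():", default_encoding)

# Hypothetical sample: bytes as they might sit on disk.
raw = b"Matar\xf2"

# Decoding with a codec that does not match the data fails ...
try:
    raw.decode("utf-8")
    utf8_ok = True
except UnicodeDecodeError as exc:
    utf8_ok = False
    print("utf-8 failed:", exc.reason)

# ... while a single-byte codec such as latin-1 maps every byte to a character.
text = raw.decode("latin-1")
print(text)
```

Note that latin-1 never raises on any byte sequence, so a successful decode does not by itself prove it is the right encoding.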

The error message

'utf-8' codec can't decode byte 0xf2 in position 424: invalid continuation byte

shows that Python3 is trying to decode the bytes as utf-8. Since there is an error, the file apparently does not contain utf-8 encoded bytes.
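To see what the offending byte looks like in context, one option is to open the file in binary mode, which involves no codec at all. The helper below is a hypothetical sketch, not part of the original answer. It assumes the position reported in the error is an offset from the start of the file, which should hold when the failure happens in the first chunk that Python reads.

```python
def bytes_around(filename, position, width=10):
    """Return the raw bytes surrounding a byte offset, without decoding."""
    with open(filename, "rb") as f:  # binary mode: bytes are returned as-is
        start = max(0, position - width)
        f.seek(start)
        # read from `start` up to `width` bytes past `position`
        return f.read(position - start + width)
```

For the file in the question, a call such as bytes_around('Noms_departaments_centres.txt', 424) would show the bytes next to 0xf2, which can hint at the language and therefore at a likely encoding.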

To fix the problem, you need to specify the correct encoding of the file:

with open(filename, encoding=enc) as f:
    for line in f:
        ...  # process each decoded line
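Applied to the function from the question, this could look like the sketch below. The encoding parameter and its default of latin-1 are assumptions for illustration only; the correct value depends on how the file was actually written. Unlike the original, this version also strips the trailing newline so that the last column does not end in '\n'.

```python
def lectdict(filename, colkey, colvalue, encoding="latin-1"):
    """Map each value in column colkey to the list of colvalue entries seen with it."""
    result = {}
    # The with statement closes the file even if an exception is raised.
    with open(filename, encoding=encoding) as f:
        for line in f:
            if line == "\n":  # skip empty lines
                continue
            # strip the newline (a change from the original code), then split once
            fields = line.rstrip("\n").split(";")
            result.setdefault(fields[colkey], []).append(fields[colvalue])
    return result
```

Splitting each line once and using setdefault avoids the three repeated split calls and the list concatenation on every row.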

If you do not know the correct encoding, you could run this program to simply try all the encodings known to Python. If you are lucky, there will be an encoding which turns the bytes into recognizable characters. Sometimes more than one encoding may appear to work, in which case you'll need to check and compare the results carefully.

# Python3
import pkgutil
import os
import encodings

def all_encodings():
    modnames = set(
        [modname for importer, modname, ispkg in pkgutil.walk_packages(
            path=[os.path.dirname(encodings.__file__)], prefix='')])
    aliases = set(encodings.aliases.aliases.values())
    return modnames.union(aliases)

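filename = '/tmp/test'
# iterate directly so the encodings module is not shadowed by a variable
for enc in sorted(all_encodings()):
    try:
        with open(filename, encoding=enc) as f:
            # print the encoding and the first 500 characters
            print(enc, f.read(500))
    except Exception:
        # this codec cannot decode the file (or is not a text encoding); skip it
        pass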
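When several encodings appear to work, it can help to group them by the text they produce, so that only the distinct results need to be compared by eye. The function name, the sample bytes, and the short candidate list below are hypothetical and chosen only for illustration.

```python
from collections import defaultdict

def group_by_decoding(raw, candidates):
    """Group candidate encodings by the text they produce for the same bytes."""
    groups = defaultdict(list)
    for enc in candidates:
        try:
            text = raw.decode(enc)
        except (UnicodeDecodeError, LookupError):
            # the bytes are invalid for this codec, or it is not a text encoding
            continue
        groups[text].append(enc)
    return dict(groups)

# Hypothetical sample containing the byte from the error message.
groups = group_by_decoding(b"Matar\xf2", ["utf-8", "latin-1", "cp1252", "cp437"])
for text, encs in groups.items():
    print(repr(text), encs)
```

Encodings that land in the same group are indistinguishable for that sample, so the choice only matters between groups.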
