Encoding Error in pandas read_csv

Question

I'm attempting to read a CSV file into a DataFrame in pandas. When I try to do that, I get the following error:

UnicodeDecodeError: 'utf-8' codec can't decode byte 0x96 in position 55: invalid start byte

This comes from the code:

import pandas as pd

location = r"C:\Users\khtad\Documents\test.csv"

df = pd.read_csv(location, header=0, quotechar='"')

This is on a Windows 7 Enterprise Service Pack 1 machine and it seems to apply to every CSV file I create. In this particular case the binary from location 55 is 00101001 and location 54 is 01110011, if that matters.

Saving the file as UTF-8 with a text editor doesn't seem to help. Similarly, adding the parameter encoding='utf-8' doesn't work; it returns the same error.

What is the most likely cause of this error and are there any workarounds other than abandoning the DataFrame construct for the moment and using the csv module to read in the CSV line-by-line?

Recommended answer

Try calling read_csv with encoding='latin1', encoding='iso-8859-1', or encoding='cp1252'. These are among the encodings commonly found on Windows; in Python, latin1 and iso-8859-1 are aliases for the same codec.
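As a rough illustration of the suggestion above, the sketch below tries a list of encodings in order and reports which one worked. The helper name read_csv_with_fallback, the order of the encodings, and the demo file contents are all assumptions made for this example; none of them come from the question. It relies on two facts: byte 0x96 is the en dash in cp1252, and it is not a valid start byte in UTF-8. Note that latin1 maps every byte to a character and so never raises, which is why it goes last.

```python
import os
import tempfile

import pandas as pd


def read_csv_with_fallback(path, encodings=("utf-8", "cp1252", "latin1"), **kwargs):
    """Hypothetical helper: try each encoding in turn.

    Returns (DataFrame, encoding_used). Extra keyword arguments are
    passed through to pd.read_csv unchanged.
    """
    last_error = None
    for enc in encodings:
        try:
            return pd.read_csv(path, encoding=enc, **kwargs), enc
        except UnicodeDecodeError as err:
            # Remember the failure and move on to the next candidate.
            last_error = err
    raise last_error


# Demo data (an assumption): a small CSV containing the single byte 0x96,
# the same byte reported in the error message.
with tempfile.NamedTemporaryFile(suffix=".csv", delete=False) as tmp:
    tmp.write(b"item,range\nwidget,10\x9620\n")
    demo_path = tmp.name

try:
    df, used = read_csv_with_fallback(demo_path, header=0, quotechar='"')
finally:
    os.remove(demo_path)

print(used)
print(df)
```

A successful decode does not prove the encoding is the right one, so it is worth checking a few non-ASCII values in the resulting DataFrame by eye before trusting the result.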
