Importing csv using pd.read_csv - invalid start byte error

Problem description

I'm trying to import a csv file using:

import pandas as pd

data = pd.read_csv("filename.csv")

I get the following error: "UnicodeDecodeError: 'utf-8' codec can't decode byte 0xb0 in position 2: invalid start byte".

The answer to this question: UnicodeDecodeError: 'utf8' codec can't decode byte 0x9c might work, but I am not sure how to implement it (I can't comment on the answer because I don't have enough reputation yet).

Any help would be appreciated.

Edit: The issue seems to be linked to the fact that I have a degree symbol. It would be fine for me if these characters were simply skipped during import.
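The degree-sign hunch can be checked without pandas. In Latin-1 and Windows-1252 the degree sign is the single byte 0xb0, while in UTF-8 that byte (binary 10110000) may only appear as a continuation byte inside a multi-byte sequence, so a UTF-8 decoder that meets it where a character should begin reports an "invalid start byte". A minimal sketch, assuming the hypothetical sample bytes `b"25\xb0C"` stand in for one field of the file:

```python
# Hypothetical sample: "25°C" saved in a single-byte encoding such as
# Latin-1 or Windows-1252, where the degree sign is the single byte 0xb0
raw = b"25\xb0C"

try:
    raw.decode("utf-8")  # what pd.read_csv does by default
except UnicodeDecodeError as exc:
    message = str(exc)
    print(message)  # 'utf-8' codec can't decode byte 0xb0 in position 2: invalid start byte

# Decoding with the encoding the bytes were actually written in works
print(raw.decode("latin-1"))  # 25°C
print(raw.decode("cp1252"))   # 25°C

# In UTF-8 the degree sign takes two bytes, so 0xb0 never starts a character
print("°".encode("utf-8"))    # b'\xc2\xb0'
```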

Solution

If you get an encoding error because your file is not in the encoding pd.read_csv() uses by default (UTF-8, as mentioned in the pd.read_csv() docs), you can detect the file's encoding by first installing chardet and then running the code below:

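import chardet

# chardet works on raw bytes, so open the file in binary mode
with open('D:\\path\\file.csv', 'rb') as f:
    rawdata = f.read()

result = chardet.detect(rawdata)  # dict with 'encoding', 'confidence' and 'language'
charenc = result['encoding']
print(charenc)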

This prints chardet's best guess at the file's encoding.
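chardet's answer is a statistical guess (the `confidence` key in the result says how sure it is), and `result['encoding']` can be `None` when it cannot decide. If installing chardet is not an option, a cruder alternative is to try a short list of candidate encodings in order and keep the first one that decodes the whole file. This is a different technique from chardet, and the helper name `guess_encoding` and its candidate list are assumptions:

```python
def guess_encoding(raw: bytes, candidates=("utf-8", "cp1252", "latin-1")) -> str:
    """Return the first candidate encoding that decodes `raw` without error.

    Hypothetical helper. Order matters: UTF-8 is strict, so a clean decode is
    strong evidence; cp1252 leaves a few byte values undefined; latin-1 maps
    every byte to some character, so it always succeeds and acts as a last resort.
    """
    for enc in candidates:
        try:
            raw.decode(enc)
        except UnicodeDecodeError:
            continue  # not valid in this encoding, try the next one
        return enc
    raise ValueError("none of the candidate encodings could decode the data")


print(guess_encoding("25°C".encode("utf-8")))  # utf-8
print(guess_encoding(b"25\xb0C"))              # cp1252
print(guess_encoding(b"\x81"))                 # latin-1 (0x81 is undefined in cp1252)
```

For a real file, the bytes would come from `open(path, 'rb').read()` and the result would be passed as `encoding=` to `pd.read_csv`. A clean decode only proves the bytes are valid in that encoding, not that the text is right, so garbled characters are still possible with the single-byte fallbacks.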

Once you have the encoding, you can read the file with:

pd.read_csv('D:\\path\\file.csv', encoding='encoding you found')

or, using a raw string so the backslashes need no escaping:

pd.read_csv(r'D:\path\file.csv', encoding='encoding you found')
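If the correct encoding cannot be determined and, as the question allows, losing the problematic characters is acceptable, Python's decoders take an `errors` argument: `'ignore'` drops undecodable bytes and `'replace'` substitutes U+FFFD. pandas 1.3 and later exposes this as `pd.read_csv(..., encoding_errors='ignore')`; with older versions, passing `read_csv` a file object opened with `errors='ignore'` should behave similarly. The sketch below uses only the standard library (`csv` in place of pandas) and a hypothetical file `readings.csv` in a temporary directory:

```python
import csv
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    # Hypothetical file: a header plus one row with a cp1252/Latin-1 degree sign
    path = os.path.join(tmp, "readings.csv")
    with open(path, "wb") as f:
        f.write(b"sensor,temp\nA,25\xb0C\n")

    # errors="ignore" silently drops bytes that are not valid UTF-8
    with open(path, encoding="utf-8", errors="ignore", newline="") as f:
        ignored = list(csv.reader(f))

    # errors="replace" leaves a visible U+FFFD marker where the byte was
    with open(path, encoding="utf-8", errors="replace", newline="") as f:
        replaced = list(csv.reader(f))

print(ignored)   # [['sensor', 'temp'], ['A', '25C']]
print(replaced)  # the degree sign becomes U+FFFD
```

Note that `'ignore'` turns `25°C` into `25C` without any warning, so reading with the right encoding is preferable whenever it can be found.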

You can find the list of all supported encodings here.
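The name chardet reports (for example `Windows-1252` or `ISO-8859-1`) is usually one of the aliases Python's codec registry understands, and pandas needs a name that registry recognizes. `codecs.lookup` can confirm a name is known and return its canonical form; it raises `LookupError` otherwise. A small sketch; the helper name `canonical_encoding` is hypothetical:

```python
import codecs


def canonical_encoding(name: str) -> str:
    """Return Python's canonical codec name for `name` (hypothetical helper).

    codecs.lookup normalizes case and punctuation such as '-' versus '_',
    and raises LookupError for names Python does not know.
    """
    return codecs.lookup(name).name


print(canonical_encoding("Windows-1252"))  # cp1252
print(canonical_encoding("ISO-8859-1"))    # iso8859-1
print(canonical_encoding("UTF8"))          # utf-8

try:
    canonical_encoding("encoding you found")  # the placeholder is not a real codec
except LookupError as exc:
    print(exc)
```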

Hope you find this useful.
