Read a UTF-8 CSV file into a DataFrame


Question

I have been trying to figure out how to load a UTF-8 CSV file that I downloaded into a pandas DataFrame. So far I have tried

import pandas as pd
df = pd.read_csv('myfile.csv', encoding='utf8')

and it gives me garbage. I am having success reading it in with

import csv
with open('some.csv', newline='', encoding='utf-8') as f:
    reader = csv.reader(f)
    for row in reader:
        print(row)

as suggested in this post:

Reading a UTF8 CSV file with Python

but that just prints out this gigantic file, and I cannot get it into a DataFrame.

I'm using Python 3. Thanks for helping!

My specific error output is

UnicodeDecodeError: 'utf-8' codec can't decode byte 0xa0 in position 3: invalid start byte

The file I am trying to work with is one of the YEARLY CSV files downloaded from this link (not WEEKLY; I am not sure whether the weekly files use a different format)

https://exporter.nih.gov/ExPORTER_Catalog.aspx?sid=2&index=0

Recommended answer

I fixed it thanks to the post at this question:

'utf-8' codec can't decode byte 0x92 in position 18: invalid start byte

I thought I would try the fix that they suggested

df = pd.read_csv('myfile.csv', encoding='cp1252')

and it worked! The file is encoded in Windows code page 1252, not UTF-8.
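If you are not sure which encoding a downloaded file uses, one option is to test a few candidates before handing the file to pandas. The following is only a sketch: the helper name `first_working_encoding` and the candidate list (UTF-8 first, then cp1252) are assumptions made for illustration, not part of pandas or of the original fix. It uses only the standard library and decodes the file in chunks, so a very large file does not have to fit in memory.

```python
import codecs


def first_working_encoding(path, candidates=("utf-8", "cp1252"), chunk_size=1 << 20):
    """Return the first candidate encoding that decodes the whole file.

    Hypothetical helper: the name and the default candidates are assumptions.
    """
    for name in candidates:
        # An incremental decoder handles multi-byte characters that
        # happen to be split across two chunks.
        decoder = codecs.getincrementaldecoder(name)(errors="strict")
        try:
            with open(path, "rb") as f:
                for chunk in iter(lambda: f.read(chunk_size), b""):
                    decoder.decode(chunk)
                # Flush the decoder; raises if the file ends mid-character.
                decoder.decode(b"", final=True)
        except UnicodeDecodeError:
            continue  # this candidate failed, try the next one
        return name
    raise ValueError(f"none of {candidates!r} could decode {path!r}")


# Why the original call failed: 0xa0 cannot start a UTF-8 sequence,
# but in cp1252 it is a valid character (a non-breaking space).
sample = b"ID,\xa0Name"
try:
    sample.decode("utf-8")
except UnicodeDecodeError as exc:
    print(exc)
print(repr(sample.decode("cp1252")))
```

The result can then be passed on, for example `pd.read_csv('myfile.csv', encoding=first_working_encoding('myfile.csv'))`. Keep in mind that a successful decode only shows the bytes are valid in that encoding, not that it is the encoding the file was written in, so it is worth checking a few non-ASCII values by eye afterwards.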
