Encoding errors with StringIO and read_csv pandas


Question

I am using an API to get some data. The data returned is a Unicode string (not a dictionary / JSON object).

import requests

data = []
for url in api_call_list:
    # one response object per API call
    data.append(requests.get(url))

The data looks like this:

>>> data[0].text
u'Country;Celebrity;Song Volume;CPP;Index\r\nus;Taylor Swift;33100;0.83;0.20\r\n'

>>> data[1].text
u'Country;Celebrity;Song Volume;CPP;Index\r\nus;Rihanna;28100;0.76;0.33\r\n'

I use this code to convert this to a dataframe:

from io import StringIO     
import pandas as pd

pd.concat([pd.read_csv(StringIO(d.text), sep = ";") for d in data])

This works fine except when the results contain non-English characters, especially Korean, Chinese or Japanese, which come out completely garbled. I tried adding the encoding argument to read_csv with utf_8, cp1252 and iso-8859-1 as values. None of these worked.

How should I read this data correctly?
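One common way this kind of garbling arises is that UTF-8 bytes get decoded with a single-byte codec such as ISO-8859-1. Whether that is what happened with this particular API is an assumption; the sketch below only reproduces the symptom with a made-up payload (the sample row is hypothetical, not taken from the API).

```python
# Hypothetical payload standing in for the raw bytes of an API response.
raw = "Country;Celebrity\r\nkr;아이유\r\n".encode("utf-8")

# Decoding with the wrong codec never raises for ISO-8859-1:
# every byte simply becomes one Latin-1 character, so the text is garbled.
garbled = raw.decode("iso-8859-1")

# Decoding with the codec the bytes were actually written in.
correct = raw.decode("utf-8")

print(repr(garbled))
print(repr(correct))

# Because ISO-8859-1 maps bytes to characters one-to-one, the damage
# can be undone by reversing the wrong step and decoding again.
repaired = garbled.encode("iso-8859-1").decode("utf-8")
```

If the text was garbled this way, passing a different encoding to read_csv afterwards would not be expected to help, because the string handed to StringIO has already been decoded.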

Recommended answer

After some analysis and research, I was able to identify the problem. The text returned by the API had been decoded with the wrong encoding, but the encoding can be set explicitly. So I added a line to set the encoding on each response from requests:

data = []
for url in api_call_list:
    r = requests.get(url)
    r.encoding = 'utf-8'  # set before reading r.text so the body is decoded as UTF-8
    data.append(r)

and then added the encoding argument to read_csv:

pd.concat([pd.read_csv(StringIO(d.text), sep = ";", encoding='utf-8') for d in data])

That set it right. The documentation is here: http://docs.python-requests.org/en/master/user/quickstart/
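For readers who want to try this without calling an API, here is a minimal sketch of two equivalent ways to get correctly decoded text into pandas. The `payloads` list is a hypothetical stand-in for the raw bytes of each response (what requests exposes as `r.content`), and the sample rows are invented for illustration. As far as I can tell, the encoding argument of read_csv only comes into play when pandas is given bytes; a StringIO already holds decoded text.

```python
from io import BytesIO, StringIO

import pandas as pd

header = "Country;Celebrity;Song Volume;CPP;Index\r\n"

# Hypothetical UTF-8 payloads standing in for the raw bytes of each response.
payloads = [
    (header + "kr;아이유;33100;0.83;0.20\r\n").encode("utf-8"),
    (header + "jp;宇多田ヒカル;28100;0.76;0.33\r\n").encode("utf-8"),
]

# Option A: decode the bytes yourself, then parse the resulting text.
frames_a = [pd.read_csv(StringIO(p.decode("utf-8")), sep=";") for p in payloads]

# Option B: hand pandas the bytes and let it decode them via `encoding`.
frames_b = [pd.read_csv(BytesIO(p), sep=";", encoding="utf-8") for p in payloads]

df = pd.concat(frames_a, ignore_index=True)
print(df)
```

Both options decode the bytes exactly once with the right codec, which is also what setting `r.encoding = 'utf-8'` before reading `r.text` achieves.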
