Two errors fetching data from the web with Pandas (IncompleteRead & urlopen error)


Problem description

I have tried to fetch data from the web (a csv file) using Pandas in Jupyter Notebook:

import pandas as pd
df1 = pd.read_csv("https://www.crowdflower.com/wp-content/uploads/2016/03/gender-classifier-DFE-791531.csv")

The first time I get the following error:

IncompleteRead: IncompleteRead(5738795 bytes read, 2437944 more expected)

I try it again in a different cell in Jupyter Notebook and get another error:

URLError:

I try a third time, and Jupyter Notebook hangs for ages.

Any idea what these two errors mean (what is Pandas trying to tell me, and what happened), and how to fix them?

Recommended answer

If you use curl to download the file, or hit it with a web browser that shows the text, you'll see that the file is not UTF-8 encoded, which is what Pandas assumes it is. I cannot tell you what the encoding should be for this dataset, but you can cheat and use ISO-8859-1 to at least get it loaded and simulate the naive (and totally untrue) assumption that 1 byte == 1 char until you can get a handle on what the encoding should be.

import pandas as pd
url = "https://www.crowdflower.com/wp-content/uploads/2016/03/gender-classifier-DFE-791531.csv"
df1 = pd.read_csv(url, encoding="iso-8859-1")
print(df1)
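
To see why the ISO-8859-1 trick always "works", here is a minimal sketch using only the standard library. The sample bytes are hypothetical (they are not taken from the dataset above); they only illustrate that a byte sequence can be invalid as UTF-8 while still decoding under ISO-8859-1, which maps every byte value to exactly one character.

```python
# Hypothetical sample bytes: 0xE9 is "é" in ISO-8859-1, but it is not a
# valid UTF-8 sequence when followed by an ASCII comma.
raw = b"name,city\nRen\xe9,Paris\n"


def decodes_as(data: bytes, encoding: str) -> bool:
    """Report whether `data` is a valid byte sequence in `encoding`."""
    try:
        data.decode(encoding)
        return True
    except UnicodeDecodeError:
        return False


utf8_ok = decodes_as(raw, "utf-8")
latin1_ok = decodes_as(raw, "iso-8859-1")

# ISO-8859-1 assigns one character to each of the 256 byte values, so
# decoding cannot fail and the character count equals the byte count.
text = raw.decode("iso-8859-1")
same_length = len(text) == len(raw)

print(utf8_ok, latin1_ok, same_length)
```

Note that a successful ISO-8859-1 decode says nothing about whether the resulting characters are the ones the file's author intended; it only guarantees that loading does not raise a decode error.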

Then, read up on this. It's an oldie, but a goodie: Joel Spolsky's "The Absolute Minimum Every Software Developer Absolutely, Positively Must Know About Unicode and Character Sets (No Excuses!)". Like he says, "No excuses!"
