Read ZipFile from URL into StringIO and parse with pandas.read_csv
Question
I'm trying to read ZipFile data from a URL and, via StringIO, parse the data inside the ZipFile as CSV using pandas.read_csv:
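import requests as req  # assumption: "req" is the requests library
import pandas as pd
from zipfile import ZipFile
from StringIO import StringIO  # Python 2 module
r = req.get("http://seanlahman.com/files/database/lahman-csv_2014-02-14.zip").content
file = ZipFile(StringIO(r))
salaries_csv = file.open("Salaries.csv")
salaries = pd.read_csv(salaries_csv)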
The last line gave me an error:
CParserError: Error tokenizing data. C error: Calling read(nbytes) on source failed. Try engine='python'.
However, if I try using
salaries = pd.read_csv(file.open("Salaries.csv"))
it works.
So I was wondering what I am missing here.
file.open should return a ZipExtFile object, and since read_csv takes only a string or a file handle / StringIO as input, why does the last line work then?
Recommended answer
I think something is wrong with the way you read the data; it works for me using urllib2:
from zipfile import ZipFile
from StringIO import StringIO  # Python 2 module
import urllib2  # Python 2 module
import pandas as pd
r = urllib2.urlopen("http://seanlahman.com/files/database/lahman-csv_2014-02-14.zip").read()
file = ZipFile(StringIO(r))
salaries_csv = file.open("Salaries.csv")
salaries = pd.read_csv(salaries_csv)
   yearID  teamID  lgID   playerID   salary
0    1985     BAL    AL  murraed02  1472819
1    1985     BAL    AL   lynnfr01  1090000
2    1985     BAL    AL  ripkeca01   800000
3    1985     BAL    AL   lacyle01   725000
4    1985     BAL    AL  flanami01   641667
5    1985     BAL    AL  boddimi01   625000
6    1985     BAL    AL  stewasa01   581250
7    1985     BAL    AL  martide01   560000
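The code above is Python 2. In Python 3 the StringIO and urllib2 modules no longer exist, and because a zip archive is binary data, io.BytesIO takes the place of StringIO. A minimal sketch of the same approach might look like the following. The function names open_zip_member and load_csv_from_zip_url are hypothetical helpers made up for this illustration, not part of pandas or zipfile.

```python
import io
import urllib.request
from zipfile import ZipFile


def open_zip_member(zip_bytes, member_name):
    """Return a binary file-like object for one member of an in-memory zip.

    Hypothetical helper for illustration only.
    """
    # Zip archives are binary, so Python 3 needs BytesIO here, not StringIO.
    archive = ZipFile(io.BytesIO(zip_bytes))
    return archive.open(member_name)


def load_csv_from_zip_url(url, member_name):
    """Download a zip archive and parse one CSV member with pandas.

    Hypothetical helper for illustration only.
    """
    # Imported here so the helper above stays usable without pandas installed.
    import pandas as pd

    with urllib.request.urlopen(url) as response:
        zip_bytes = response.read()
    with open_zip_member(zip_bytes, member_name) as member:
        return pd.read_csv(member)
```

With the URL from the question, usage would be salaries = load_csv_from_zip_url("http://seanlahman.com/files/database/lahman-csv_2014-02-14.zip", "Salaries.csv"), assuming the archive is still served at that address. On the question of why a ZipExtFile is accepted at all: read_csv takes any file-like object that provides a read() method, and the object returned by ZipFile.open is one.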