Using pandas read_csv with zip compression


Question


I'm trying to use read_csv in pandas to read a zipped file from an FTP server. The zip file contains just one file, as is required.

Here's my code:

pd.read_csv('ftp://ftp.fec.gov/FEC/2016/cn16.zip', compression='zip')

I get this error:

AttributeError: addinfourl instance has no attribute 'seek'


I get this error in both pandas 18.1 and 19.0. Am I missing something, or could this be a bug?

Answer


Although I'm not completely sure why you get the error, you can get around it by opening the URL with urllib2 and writing the data to an in-memory binary stream. In addition, this file is pipe-delimited, so we also have to pass the correct separator (sep='|'), or pandas raises another error.

import io
import urllib2 as urllib  # note: urllib2 is Python 2-only

import pandas as pd

# Download the full archive into memory so pandas gets a seekable stream.
r = urllib.urlopen('ftp://ftp.fec.gov/FEC/2016/cn16.zip')
df = pd.read_csv(io.BytesIO(r.read()), compression='zip', sep='|', header=None)


As for the error itself, I think pandas tries to call seek on the "zip file" before the URL's contents have been downloaded (so it isn't really a zip file yet); the FTP response object has no seek method, which produces that AttributeError.
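The workaround succeeds because reading a zip archive requires random access: the central directory sits at the end of the file, so the reader must seek backwards to find it, and an in-memory BytesIO supports that while a raw network response does not. A minimal sketch of the seekable behavior (the payload bytes are just a stand-in, not real zip data):

```python
import io

# zipfile needs random access: the central directory is stored at the end
# of the archive, so the reader must be able to seek backwards to find it.
payload = b"bytes standing in for a downloaded zip archive"
buf = io.BytesIO(payload)

print(buf.seekable())     # True: in-memory streams support seek
buf.seek(0, io.SEEK_END)  # jump to the end, as zipfile's reader does
print(buf.tell() == len(payload))  # True: positioned at the last byte
```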
