Read a zipped file as a pandas DataFrame
Question
I'm trying to unzip a csv file and pass it into pandas so I can work on the file.
The code I have tried so far is:
import requests, zipfile, StringIO
r = requests.get('http://data.octo.dc.gov/feeds/crime_incidents/archive/crime_incidents_2013_CSV.zip')
z = zipfile.ZipFile(StringIO.StringIO(r.content))
crime2013 = pandas.read_csv(z.read('crime_incidents_2013_CSV.csv'))
After the last line, although Python is able to fetch the file, I get a "does not exist" at the end of the error.
Can someone tell me what I'm doing incorrectly?
Recommended answer
If you want to read a zipped or tar.gz file into a pandas DataFrame, the read_csv
method handles the decompression itself.
import pandas as pd
df = pd.read_csv('filename.zip')
Or the long form:
df = pd.read_csv('filename.zip', compression='zip', header=0, sep=',', quotechar='"')
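To try both calls without downloading anything, here is a minimal sketch that first writes a small ZIP archive to a temporary directory and then reads it back. The file names and sample rows are made up for illustration; only `pd.read_csv` and its `compression` argument come from the answer.

```python
import os
import tempfile
import zipfile

import pandas as pd

# Build a small ZIP archive holding exactly one CSV file.
# The file names and rows below are invented sample data.
tmp_dir = tempfile.mkdtemp()
zip_path = os.path.join(tmp_dir, "sample.zip")
with zipfile.ZipFile(zip_path, "w") as archive:
    archive.writestr("sample.csv", "id,offense\n1,THEFT\n2,ROBBERY\n")

# Short form: compression is inferred from the ".zip" extension.
df_inferred = pd.read_csv(zip_path)

# Long form: the same read with the options spelled out.
df_explicit = pd.read_csv(
    zip_path, compression="zip", header=0, sep=",", quotechar='"'
)

print(df_inferred)
```

Both calls should produce the same DataFrame, since the explicit arguments match what pandas would otherwise infer or use by default.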
Description of the compression argument from the docs:
compression : {‘infer’, ‘gzip’, ‘bz2’, ‘zip’, ‘xz’, None}, default ‘infer’ For on-the-fly decompression of on-disk data. If ‘infer’ and filepath_or_buffer is path-like, then detect compression from the following extensions: ‘.gz’, ‘.bz2’, ‘.zip’, or ‘.xz’ (otherwise no decompression). If using ‘zip’, the ZIP file must contain only one data file to be read in. Set to None for no decompression.
New in version 0.18.1: support for ‘zip’ and ‘xz’ compression.
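When the archive is already in memory, as with `r.content` in the question, or when it holds more than one file, the `zipfile` module can be combined with `read_csv` instead. In the question's code, `z.read(...)` returns the member's contents as bytes rather than a file-like object, so `read_csv` most likely treats that value as a path, which would explain the "does not exist" message. The sketch below uses `z.open(...)`, which returns a file-like object. It assumes Python 3 (hence `io.BytesIO` rather than `StringIO`), and `make_zip_bytes` is a hypothetical stand-in for the downloaded bytes so the example runs offline.

```python
import io
import zipfile

import pandas as pd


def make_zip_bytes():
    """Hypothetical stand-in for r.content: a ZIP archive held in memory."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as archive:
        # Invented sample rows; the member name mirrors the one in the question.
        archive.writestr(
            "crime_incidents_2013_CSV.csv", "id,offense\n1,THEFT\n2,ROBBERY\n"
        )
        # A second member, to show that one file can be picked out by name.
        archive.writestr("readme.txt", "not a data file")
    return buffer.getvalue()


content = make_zip_bytes()

# Wrap the raw bytes in BytesIO so ZipFile can seek within them.
with zipfile.ZipFile(io.BytesIO(content)) as z:
    # z.open() returns a file-like object, which read_csv accepts;
    # z.read() would return the raw bytes instead.
    with z.open("crime_incidents_2013_CSV.csv") as member:
        crime2013 = pd.read_csv(member)

print(crime2013.shape)
```

With a real download, `content` would be replaced by `r.content` from the `requests.get(...)` call in the question.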