pandas.read_csv from string or package data


Question

I have some CSV text data in a package which I want to read using read_csv. I was doing this with

from pkgutil import get_data
from StringIO import StringIO  # Python 2 only

from pandas import read_csv

data = read_csv(StringIO(get_data('package.subpackage', 'path/to/data.csv')))

However, StringIO.StringIO is gone in Python 3, and io.StringIO only accepts Unicode. Is there a simple way to do this?

Edit: the following doesn't seem to work

import pandas as pd

import pkgutil
from io import StringIO

def get_data_file(pkg, path):
    f = StringIO()
    contents = unicode(pkgutil.get_data('pymc.examples', 'data/wells.dat'))
    f.write(contents)
    return f

wells = get_data_file('pymc.examples', 'data/wells.dat')

data = pd.read_csv(wells, delimiter=' ', index_col='id',
                   dtype={'switch': np.int8})

It fails with

  File "/usr/local/lib/python2.7/dist-packages/pandas/io/parsers.py", line 401, in parser_f
    return _read(filepath_or_buffer, kwds)
  File "/usr/local/lib/python2.7/dist-packages/pandas/io/parsers.py", line 209, in _read
    parser = TextFileReader(filepath_or_buffer, **kwds)
  File "/usr/local/lib/python2.7/dist-packages/pandas/io/parsers.py", line 509, in __init__
    self._make_engine(self.engine)
  File "/usr/local/lib/python2.7/dist-packages/pandas/io/parsers.py", line 611, in _make_engine
    self._engine = CParserWrapper(self.f, **self.options)
  File "/usr/local/lib/python2.7/dist-packages/pandas/io/parsers.py", line 893, in __init__
    self._reader = _parser.TextReader(src, **kwds)
  File "parser.pyx", line 441, in pandas._parser.TextReader.__cinit__ (pandas/src/parser.c:3940)
  File "parser.pyx", line 551, in pandas._parser.TextReader._get_header (pandas/src/parser.c:5096)
pandas._parser.CParserError: Passed header=0 but only 0 lines in file

Recommended answer

The following worked for me in Python 3.3:

>>> import numpy as np, pandas as pd
>>> import io, pkgutil
>>> wells = pkgutil.get_data('pymc.examples', 'data/wells.dat')
>>> type(wells)
<class 'bytes'>
>>> df = pd.read_csv(io.BytesIO(wells), encoding='utf8', sep=" ", index_col="id", dtype={"switch": np.int8})
>>> df.head()
    switch  arsenic       dist  assoc  educ
id                                         
1        1     2.36  16.826000      0     0
2        1     0.71  47.321999      0     0
3        0     2.07  20.966999      0    10
4        1     1.15  21.486000      0    12
5        1     1.10  40.874001      1    14

[5 rows x 5 columns]

N.B. I had to put wells.dat in that location by hand, so I can't swear that I copied it correctly or that no trailing whitespace remains, because I deleted some. But passing read_csv a BytesIO object together with an encoding parameter should work. (You can probably get away without the encoding, but specifying it is a good habit. io.TextIOWrapper might be another option.)
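
As a rough sketch of the options mentioned in this answer, the snippet below feeds the same bytes to read_csv in a few different ways. It is an illustration, not the answerer's code: the `raw` literal is a stand-in for what pkgutil.get_data returns (bytes in Python 3), with a few rows copied from the output above, and the variable names are made up for this example. The last variant is an assumption about the question's failure: writing into an empty io.StringIO leaves the stream position at the end, so the buffer most likely needed a seek(0) before being read.

```python
import io

import numpy as np
import pandas as pd

# Stand-in for pkgutil.get_data(...), which returns bytes in Python 3.
# The rows are copied from the sample output above, trimmed to three columns.
raw = b"id switch arsenic\n1 1 2.36\n2 1 0.71\n3 0 2.07\n"

# Shared parsing options (hypothetical helper dict for this example).
opts = dict(sep=" ", index_col="id", dtype={"switch": np.int8})

# Option 1: pass the bytes in a BytesIO and name the encoding.
df_bytes = pd.read_csv(io.BytesIO(raw), encoding="utf8", **opts)

# Option 2: decode first and pass the text to the StringIO constructor,
# which leaves the stream positioned at the start.
df_text = pd.read_csv(io.StringIO(raw.decode("utf8")), **opts)

# Option 3: wrap the byte stream in a text layer with io.TextIOWrapper.
df_wrapped = pd.read_csv(
    io.TextIOWrapper(io.BytesIO(raw), encoding="utf8"), **opts
)

# Option 4: write into an empty StringIO, then rewind before reading.
# Without seek(0) the reader starts at the end and sees no lines.
buf = io.StringIO()
buf.write(raw.decode("utf8"))
buf.seek(0)
df_rewound = pd.read_csv(buf, **opts)
```

All four variants should produce the same frame; which one to use is mostly a matter of whether you want to decode the package data yourself or let pandas do it.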
