pandas read_json not working on MultiIndex
Question
I'm trying to use pd.read_json to read in a dataframe created via df.to_json(), but I'm getting a ValueError. I think it may have to do with the fact that the index is a MultiIndex, but I'm not sure how to deal with that.
The original dataframe of 55k rows is called psi, and I created test.json via:
psi.head().to_json('test.json')
Here is the output of print psi.head().to_string() if you want to use that.
When I do it on this small set of data (5 rows), I get a ValueError.
! wget --no-check-certificate https://gist.githubusercontent.com/olgabot/9897953/raw/c270d8cf1b736676783cc1372b4f8106810a14c5/test.json
import pandas as pd
pd.read_json('test.json')
Here is the full stack trace:
---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
<ipython-input-14-1de2f0e65268> in <module>()
1 get_ipython().system(u' wget https://gist.githubusercontent.com/olgabot/9897953/raw/c270d8cf1b736676783cc1372b4f8106810a14c5/test.json')
2 import pandas as pd
----> 3 pd.read_json('test.json')
/home/obot/virtualenvs/envy/lib/python2.7/site-packages/pandas/io/json.pyc in read_json(path_or_buf, orient, typ, dtype, convert_axes, convert_dates, keep_default_dates, numpy, precise_float, date_unit)
196 obj = FrameParser(json, orient, dtype, convert_axes, convert_dates,
197 keep_default_dates, numpy, precise_float,
--> 198 date_unit).parse()
199
200 if typ == 'series' or obj is None:
/home/obot/virtualenvs/envy/lib/python2.7/site-packages/pandas/io/json.pyc in parse(self)
264
265 else:
--> 266 self._parse_no_numpy()
267
268 if self.obj is None:
/home/obot/virtualenvs/envy/lib/python2.7/site-packages/pandas/io/json.pyc in _parse_no_numpy(self)
481 if orient == "columns":
482 self.obj = DataFrame(
--> 483 loads(json, precise_float=self.precise_float), dtype=None)
484 elif orient == "split":
485 decoded = dict((str(k), v)
ValueError: No ':' found when decoding object value
> /home/obot/virtualenvs/envy/lib/python2.7/site-packages/pandas/io/json.py(483)_parse_no_numpy()
482 self.obj = DataFrame(
--> 483 loads(json, precise_float=self.precise_float), dtype=None)
484 elif orient == "split":
But when I do it on the whole dataframe (55k rows), I get an invalid pointer error and the IPython kernel dies. Any ideas?
Edit: added how the JSON was generated in the first place.
Recommended answer
This is not implemented at the moment; see the issue here: https://github.com/pydata/pandas/issues/4889.
You can simply reset the index first, e.g.
df.reset_index().to_json(...)
and it will work.
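For illustration, here is a minimal sketch of the full round trip with the index reset before writing and rebuilt after reading. It assumes Python 3 and a reasonably recent pandas; the small frame and the level names sample and replicate are made up for the example and are not the psi data from the question.

```python
import io

import pandas as pd

# Hypothetical stand-in for psi: a small frame with a two-level MultiIndex.
index = pd.MultiIndex.from_tuples(
    [("a", 1), ("a", 2), ("b", 1)], names=["sample", "replicate"]
)
df = pd.DataFrame({"psi": [0.1, 0.5, 0.9]}, index=index)

# Move the index levels into ordinary columns before serializing.
json_text = df.reset_index().to_json()

# Read the JSON back, then rebuild the MultiIndex from those columns.
# StringIO is used because newer pandas versions warn on raw JSON strings.
restored = pd.read_json(io.StringIO(json_text)).set_index(["sample", "replicate"])

print(restored)
```

The set_index call on the reading side is an addition to the answer's suggestion: without it, the former index levels simply come back as regular columns.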