AssertionError: Gaps in blk ref_locs when calling unstack() on a DataFrame
Problem description
I am trying to unstack() data in a pandas DataFrame, but I keep getting this error and I'm not sure why. Here is my code so far, with a sample of my data. My attempt to fix it was to remove all rows where voteId was not a number, but that did not work with my actual dataset. The error occurs both in an Anaconda notebook (where I am developing) and in my production environment when I deploy the code.
I could not figure out how to reproduce the error in my sample code, possibly because of a typecasting issue that doesn't exist when the DataFrame is instantiated directly, as in the sample.
#dataset simulate likely input
# d = {'vote': [100, 50,1,23,55,67,89,44],
# 'vote2': [10, 2,18,26,77,99,9,40],
# 'ballot1': ['a','b','a','a','b','a','c','c'],
# 'voteId':[1,2,3,4,5,'aaa',7,'NaN']}
# df1=pd.DataFrame(d)
#########################################################
df1=df1.drop_duplicates(['voteId','ballot1'],keep='last')
s=df1[:10].set_index(['voteId','ballot1'],verify_integrity=True).unstack()
s.columns=s.columns.map('(ballot1={0[1]}){0[0]}'.format)
dflw=pd.DataFrame(s)
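The cleanup attempt described above (dropping rows whose voteId is not a number) can be sketched roughly as follows. This is only an illustration on the commented-out sample data, assuming `pd.to_numeric` with `errors='coerce'` is an acceptable way to detect non-numeric IDs; the names `numeric_id`, `clean`, and `wide` are made up for the example, and it is not claimed to fix the error on the real dataset.

```python
import pandas as pd

# Sample data modeled on the commented-out dataset above.
df1 = pd.DataFrame({
    'vote': [100, 50, 1, 23, 55, 67, 89, 44],
    'vote2': [10, 2, 18, 26, 77, 99, 9, 40],
    'ballot1': ['a', 'b', 'a', 'a', 'b', 'a', 'c', 'c'],
    'voteId': [1, 2, 3, 4, 5, 'aaa', 7, 'NaN'],
})

# Coerce voteId to numbers; anything that cannot be parsed becomes NaN.
numeric_id = pd.to_numeric(df1['voteId'], errors='coerce')
is_numeric = numeric_id.notna()

# Keep only the rows with a numeric voteId and store the coerced values,
# so the index level built from this column has a single dtype.
clean = df1[is_numeric].copy()
clean['voteId'] = numeric_id[is_numeric].astype('int64')

# Same reshaping steps as in the question, applied to the cleaned frame.
clean = clean.drop_duplicates(['voteId', 'ballot1'], keep='last')
wide = clean.set_index(['voteId', 'ballot1'], verify_integrity=True).unstack()
wide.columns = wide.columns.map('(ballot1={0[1]}){0[0]}'.format)
print(wide)
```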
Full error message/stack trace:
---------------------------------------------------------------------------
AssertionError Traceback (most recent call last)
<ipython-input-10-0a520180a8d9> in <module>()
22 df1=df1.drop_duplicates(['voteId','ballot1'],keep='last')
23
---> 24 s=df1[:10].set_index(['voteId','ballot1'],verify_integrity=True).unstack()
25 s.columns=s.columns.map('(ballot1={0[1]}){0[0]}'.format)
26 dflw=pd.DataFrame(s)
~/anaconda3/lib/python3.6/site-packages/pandas/core/frame.py in unstack(self, level, fill_value)
4567 """
4568 from pandas.core.reshape.reshape import unstack
-> 4569 return unstack(self, level, fill_value)
4570
4571 _shared_docs['melt'] = ("""
~/anaconda3/lib/python3.6/site-packages/pandas/core/reshape/reshape.py in unstack(obj, level, fill_value)
467 if isinstance(obj, DataFrame):
468 if isinstance(obj.index, MultiIndex):
--> 469 return _unstack_frame(obj, level, fill_value=fill_value)
470 else:
471 return obj.T.stack(dropna=False)
~/anaconda3/lib/python3.6/site-packages/pandas/core/reshape/reshape.py in _unstack_frame(obj, level, fill_value)
480 unstacker = partial(_Unstacker, index=obj.index,
481 level=level, fill_value=fill_value)
--> 482 blocks = obj._data.unstack(unstacker)
483 klass = type(obj)
484 return klass(blocks)
~/anaconda3/lib/python3.6/site-packages/pandas/core/internals.py in unstack(self, unstacker_func)
4349 new_columns = new_columns[columns_mask]
4350
-> 4351 bm = BlockManager(new_blocks, [new_columns, new_index])
4352 return bm
4353
~/anaconda3/lib/python3.6/site-packages/pandas/core/internals.py in __init__(self, blocks, axes, do_integrity_check, fastpath)
3035 self._consolidate_check()
3036
-> 3037 self._rebuild_blknos_and_blklocs()
3038
3039 def make_empty(self, axes=None):
~/anaconda3/lib/python3.6/site-packages/pandas/core/internals.py in _rebuild_blknos_and_blklocs(self)
3127
3128 if (new_blknos == -1).any():
-> 3129 raise AssertionError("Gaps in blk ref_locs")
3130
3131 self._blknos = new_blknos
AssertionError: Gaps in blk ref_locs
Recommended answer
To capture the real data that triggered the exception, add extra debug output.
Modify
~/anaconda3/lib/python3.6/site-packages/pandas/core/internals.py
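Add these lines to class BlockManager():
def __init__(self, ...):
    # place after self.blocks and self.axes have been assigned
    from pprint import pprint
    print('BlockManager blocks')
    pprint(self.blocks)
    print('BlockManager axes')
    pprint(self.axes)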
You will get output like this (the frame printed by the `_unstack_frame` debug line):
_unstack_frame level -1 fill_value None
vote vote2
ballot1 voteId
NaN xx 100.0 10.0
False aaa 50.1 2.0
-1 \n 1.0 18.0
True NaN 23.0 26.0
b False 55.0 77.0
a \ 67.0 99.0
c 89.0 9.0
8 44.0 NaN
Modify
~/anaconda3/lib/python3.6/site-packages/pandas/core/reshape/reshape.py
def _unstack_frame(obj, level, fill_value=None):
    from pprint import pprint
    print('_unstack_frame level {} fill_value {} {}'.format(level, fill_value, type(obj)))
    pprint(obj)
You will see output like this (the blocks and axes printed by the `BlockManager` debug lines):
BlockManager blocks
(FloatBlock: slice(0, 16, 1), 16 x 8, dtype: float64,)
BlockManager axes
[MultiIndex(levels=[[u'vote', u'vote2'], [False, 8, u'\n', u' ', u'\\', u'aaa', u'xx']],
labels=[[0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1], [-1, 0, 1, 2, 3, 4, 5, 6, -1, 0, 1, 2, 3, 4, 5, 6]],
names=[None, u'voteId']),
Index([nan, -1, False, True, u'', u'a', u'b', u'c'], dtype='object', name=u'ballot1')]
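The axes printed above mix several Python types within one index level (booleans, integers, strings, NaN). A minimal sketch for checking that from user code, without patching pandas, is shown below. The helper name `label_type_counts` and the tiny sample frame are hypothetical and only stand in for the real data.

```python
from collections import Counter

import pandas as pd


def label_type_counts(index):
    """Count the Python types of the labels in each level of an index."""
    counts = {}
    for pos in range(index.nlevels):
        values = index.get_level_values(pos)
        # Fall back to the level position when the level has no name.
        name = index.names[pos] if index.names[pos] is not None else pos
        counts[name] = dict(Counter(type(v).__name__ for v in values))
    return counts


# Hypothetical sample: voteId mixes strings and an integer.
df = pd.DataFrame({
    'vote': [100.0, 50.1, 1.0],
    'ballot1': ['a', 'b', 'c'],
    'voteId': ['xx', 'aaa', 8],
})
indexed = df.set_index(['ballot1', 'voteId'])
counts = label_type_counts(indexed.index)
print(counts)
```

A level that reports more than one type is a candidate for cleaning or casting before calling unstack().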
I did manage to trigger the exception with another example:
File "/usr/lib64/python2.7/site-packages/pandas/core/internals.py", line 2902, in _rebuild_blknos_and_blklocs
raise AssertionError("Gaps in blk ref_locs")
AssertionError: Gaps in blk ref_locs
With the debug information:
BlockManager blocks
(FloatBlock: [-1, -1, -1], 3 x 2, dtype: float64,)
BlockManager axes
[Index([aaa, bbb, ccc], dtype='object'), Int64Index([0, 1], dtype='int64')]
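As an alternative to editing files under site-packages, similar information can be collected on the calling side. The sketch below assumes it is enough to dump the input frame's dtypes and index when unstack() fails; the wrapper name `unstack_with_diagnostics` is hypothetical, and it does not expose the internal blocks the way the patch above does.

```python
import pandas as pd


def unstack_with_diagnostics(frame, level=-1):
    """Call frame.unstack(level); on failure, print the input and re-raise."""
    try:
        return frame.unstack(level)
    except Exception:
        # Show what was being reshaped at the moment of the failure.
        print('unstack failed; column dtypes:')
        print(frame.dtypes)
        print('index:')
        print(frame.index)
        raise


# Usage on a small, well-formed frame (hypothetical data).
good = pd.DataFrame(
    {'vote': [1.0, 2.0, 3.0]},
    index=pd.MultiIndex.from_tuples(
        [(1, 'a'), (1, 'b'), (2, 'a')], names=['voteId', 'ballot1']
    ),
)
result = unstack_with_diagnostics(good)
print(result)
```

Because the exception is re-raised unchanged, the original traceback is still available after the diagnostics are printed.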