How to update a subset of a MultiIndexed pandas DataFrame
Question
I'm using a MultiIndexed pandas DataFrame and would like to multiply a subset of the DataFrame by a certain number.
It's the same as this but with a MultiIndex.
>>> import pandas as pd
>>> d = pd.DataFrame({'year': [2008, 2008, 2008, 2008, 2009, 2009, 2009, 2009],
...                   'flavour': ['strawberry', 'strawberry', 'banana', 'banana',
...                               'strawberry', 'strawberry', 'banana', 'banana'],
...                   'day': ['sat', 'sun', 'sat', 'sun', 'sat', 'sun', 'sat', 'sun'],
...                   'sales': [10, 12, 22, 23, 11, 13, 23, 24]})
>>> d = d.set_index(['year', 'flavour', 'day'])
>>> d
                     sales
year flavour    day
2008 strawberry sat     10
                sun     12
     banana     sat     22
                sun     23
2009 strawberry sat     11
                sun     13
     banana     sat     23
                sun     24
So far, so good. But let's say I spot that all the Saturday figures are only half of what they should be! I'd like to multiply all sat sales by 2.
My first attempt at this was:
sat = d.xs('sat', level='day')
sat = sat * 2
d.update(sat)
But this doesn't work, because the variable sat has lost the day level of the index:
>>> sat
                 sales
year flavour
2008 strawberry     20
     banana         44
2009 strawberry     22
     banana         46
So pandas doesn't know how to join the new sales figures back onto the old DataFrame.
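One way to make this first attempt work is to re-attach the dropped 'day' level before calling update. Once the doubled figures carry a (year, flavour, day) index again, they line up with d. This is a minimal sketch that assumes a recent pandas (1.x or 2.x). The workaround itself is not taken from the original question or answer.

```python
# Assumes a recent pandas (1.x or 2.x); this level-restoring workaround is illustrative
import pandas as pd

d = pd.DataFrame({'year': [2008, 2008, 2008, 2008, 2009, 2009, 2009, 2009],
                  'flavour': ['strawberry', 'strawberry', 'banana', 'banana',
                              'strawberry', 'strawberry', 'banana', 'banana'],
                  'day': ['sat', 'sun', 'sat', 'sun', 'sat', 'sun', 'sat', 'sun'],
                  'sales': [10, 12, 22, 23, 11, 13, 23, 24]})
d = d.set_index(['year', 'flavour', 'day'])

# xs drops the 'day' level, leaving an index of (year, flavour) only
sat = d.xs('sat', level='day') * 2

# Put the level back: add a constant 'day' column, then append it to the index
sat = sat.assign(day='sat').set_index('day', append=True)

# sat's index is now (year, flavour, day), so update can align it with d
d.update(sat)
print(d)
```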
I also had a quick stab at this:
>>> sat = d.xs('sat', level='day', copy=False)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "C:\Python27\lib\site-packages\pandas\core\frame.py", line 2248, in xs
    raise ValueError('Cannot retrieve view (copy=False)')
ValueError: Cannot retrieve view (copy=False)
I have no idea what that error means, but I feel like I'm making a mountain out of a molehill. Does anyone know the right way to do this?
Thanks in advance, Rob
Answer
Note: In the soon-to-be-released 0.13, a drop_level argument has been added to xs (thanks to this question!):
In [42]: d.xs('sat', level='day', drop_level=False)
Out[42]:
                     sales
year flavour    day
2008 strawberry sat     10
     banana     sat     22
2009 strawberry sat     11
     banana     sat     23
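With drop_level=False the 'day' level is kept, so the doubled slice can be passed straight to update. Here is a self-contained sketch of that route. It assumes pandas 0.13 or later, and the variable name sat is only illustrative.

```python
# Assumes pandas >= 0.13, where xs accepts drop_level
import pandas as pd

d = pd.DataFrame({'year': [2008, 2008, 2008, 2008, 2009, 2009, 2009, 2009],
                  'flavour': ['strawberry', 'strawberry', 'banana', 'banana',
                              'strawberry', 'strawberry', 'banana', 'banana'],
                  'day': ['sat', 'sun', 'sat', 'sun', 'sat', 'sun', 'sat', 'sun'],
                  'sales': [10, 12, 22, 23, 11, 13, 23, 24]})
d = d.set_index(['year', 'flavour', 'day'])

# drop_level=False keeps 'day', so sat's index still matches d's three levels
sat = d.xs('sat', level='day', drop_level=False) * 2

# update aligns on the full index and overwrites only the Saturday rows
d.update(sat)
print(d)
```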
Another option is to use select, which extracts a sub-DataFrame (a copy) of the same data. Because the result keeps the same index, it can be used to update d correctly:
In [11]: d.select(lambda x: x[2] == 'sat') * 2
Out[11]:
                     sales
year flavour    day
2008 strawberry sat     20
     banana     sat     44
2009 strawberry sat     22
     banana     sat     46
In [12]: d.update(d.select(lambda x: x[2] == 'sat') * 2)
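DataFrame.select was deprecated in pandas 0.21 and removed in 1.0, so the snippet above only runs on older versions. The sketch below carries the same idea over to current pandas by passing a boolean mask built from the index tuples to .loc. It assumes a recent pandas (1.x or 2.x), and the names mask and doubled are illustrative.

```python
# Assumes pandas 1.x/2.x, where DataFrame.select no longer exists
import pandas as pd

d = pd.DataFrame({'year': [2008, 2008, 2008, 2008, 2009, 2009, 2009, 2009],
                  'flavour': ['strawberry', 'strawberry', 'banana', 'banana',
                              'strawberry', 'strawberry', 'banana', 'banana'],
                  'day': ['sat', 'sun', 'sat', 'sun', 'sat', 'sun', 'sat', 'sun'],
                  'sales': [10, 12, 22, 23, 11, 13, 23, 24]})
d = d.set_index(['year', 'flavour', 'day'])

# Same predicate as the select() call: element 2 of each index tuple is the day
mask = [key[2] == 'sat' for key in d.index]

# .loc with a boolean list returns a copy that keeps the full three-level index
doubled = d.loc[mask] * 2
d.update(doubled)
print(d)
```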
Another option is to use apply:
In [21]: d.apply(lambda x: x*2 if x.name[2] == 'sat' else x, axis=1)
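apply returns a new DataFrame and leaves d itself unchanged, so the result has to be assigned back if you want to keep it. A minimal sketch, assuming a recent pandas and that overwriting d is what you want:

```python
# Assumes a recent pandas; apply builds a new frame, so we rebind d to it
import pandas as pd

d = pd.DataFrame({'year': [2008, 2008, 2008, 2008, 2009, 2009, 2009, 2009],
                  'flavour': ['strawberry', 'strawberry', 'banana', 'banana',
                              'strawberry', 'strawberry', 'banana', 'banana'],
                  'day': ['sat', 'sun', 'sat', 'sun', 'sat', 'sun', 'sat', 'sun'],
                  'sales': [10, 12, 22, 23, 11, 13, 23, 24]})
d = d.set_index(['year', 'flavour', 'day'])

# With axis=1 each row is a Series whose .name is its (year, flavour, day) tuple
d = d.apply(lambda row: row * 2 if row.name[2] == 'sat' else row, axis=1)
print(d)
```

Because apply calls the lambda once per row in Python, it is usually slower than the vectorised boolean-mask approaches on large frames.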
Another option is to use get_level_values (this is probably the most efficient of these):
In [22]: d[d.index.get_level_values('day') == 'sat'] *= 2
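The same boolean mask can also be passed to .loc together with a column label, which limits the change to a single column instead of every column in the row. A minimal sketch, assuming a recent pandas and that only 'sales' should be scaled:

```python
# Assumes a recent pandas; scales only the 'sales' column of Saturday rows
import pandas as pd

d = pd.DataFrame({'year': [2008, 2008, 2008, 2008, 2009, 2009, 2009, 2009],
                  'flavour': ['strawberry', 'strawberry', 'banana', 'banana',
                              'strawberry', 'strawberry', 'banana', 'banana'],
                  'day': ['sat', 'sun', 'sat', 'sun', 'sat', 'sun', 'sat', 'sun'],
                  'sales': [10, 12, 22, 23, 11, 13, 23, 24]})
d = d.set_index(['year', 'flavour', 'day'])

# One boolean per row: True where the 'day' level equals 'sat'
mask = d.index.get_level_values('day') == 'sat'

# Row mask plus column label, updated in place on d
d.loc[mask, 'sales'] *= 2
print(d)
```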
Another option is to promote the 'day' level to a column and then use apply.
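A minimal sketch of that last route, assuming a recent pandas. It moves 'day' into a column with reset_index, doubles the Saturday sales with a row-wise apply, and then appends 'day' back onto the index. The intermediate name flat is hypothetical.

```python
# Assumes a recent pandas; `flat` is a hypothetical intermediate name
import pandas as pd

d = pd.DataFrame({'year': [2008, 2008, 2008, 2008, 2009, 2009, 2009, 2009],
                  'flavour': ['strawberry', 'strawberry', 'banana', 'banana',
                              'strawberry', 'strawberry', 'banana', 'banana'],
                  'day': ['sat', 'sun', 'sat', 'sun', 'sat', 'sun', 'sat', 'sun'],
                  'sales': [10, 12, 22, 23, 11, 13, 23, 24]})
d = d.set_index(['year', 'flavour', 'day'])

# Move only the 'day' level out of the index into an ordinary column
flat = d.reset_index(level='day')

# Row-wise apply: double the sales figure only where the day column is 'sat'
flat['sales'] = flat.apply(
    lambda row: row['sales'] * 2 if row['day'] == 'sat' else row['sales'], axis=1)

# Append 'day' back onto the index to restore the (year, flavour, day) layout
d = flat.set_index('day', append=True)
print(d)
```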