How to update a subset of a MultiIndexed pandas DataFrame

Question

I'm using a MultiIndexed pandas DataFrame and would like to multiply a subset of the DataFrame by a certain number.

It's the same as this but with a MultiIndex.

>>> import pandas as pd
>>> d = pd.DataFrame({'year':[2008,2008,2008,2008,2009,2009,2009,2009],
...                   'flavour':['strawberry','strawberry','banana','banana',
...                              'strawberry','strawberry','banana','banana'],
...                   'day':['sat','sun','sat','sun','sat','sun','sat','sun'],
...                   'sales':[10,12,22,23,11,13,23,24]})

>>> d = d.set_index(['year','flavour','day'])                  

>>> d
                     sales
year flavour    day       
2008 strawberry sat     10
                sun     12
     banana     sat     22
                sun     23
2009 strawberry sat     11
                sun     13
     banana     sat     23
                sun     24

So far, so good. But let's say I spot that all the Saturday figures are only half what they should be! I'd like to multiply all sat sales by 2.

My first attempt at this was:

sat = d.xs('sat', level='day')
sat = sat * 2
d.update(sat)

but this doesn't work because the variable sat has lost the day level of the index:

>>> sat
                 sales
year flavour          
2008 strawberry     20
     banana         44
2009 strawberry     22
     banana         46

so pandas doesn't know how to join the new sales figures back onto the old DataFrame.

I had a quick stab at:

>>> sat = d.xs('sat', level='day', copy=False)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "C:\Python27\lib\site-packages\pandas\core\frame.py", line 2248, in xs
    raise ValueError('Cannot retrieve view (copy=False)')
ValueError: Cannot retrieve view (copy=False)

I have no idea what that error means, but I feel like I'm making a mountain out of a molehill. Does anyone know the right way to do this?

Thanks in advance, Rob

Recommended answer

Note: In the soon-to-be-released pandas 0.13, a drop_level argument has been added to xs (thanks to this question!):

In [42]: df.xs('sat', level='day', drop_level=False)
Out[42]:
                     sales
year flavour    day
2008 strawberry sat     10
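To connect this back to the question, here is a minimal sketch of how the drop_level=False result could be fed to update. It rebuilds the question's frame d and assumes a pandas version where xs accepts drop_level (0.13 or later).

```python
import pandas as pd

# Rebuild the frame from the question
d = pd.DataFrame({
    'year': [2008, 2008, 2008, 2008, 2009, 2009, 2009, 2009],
    'flavour': ['strawberry', 'strawberry', 'banana', 'banana',
                'strawberry', 'strawberry', 'banana', 'banana'],
    'day': ['sat', 'sun', 'sat', 'sun', 'sat', 'sun', 'sat', 'sun'],
    'sales': [10, 12, 22, 23, 11, 13, 23, 24],
}).set_index(['year', 'flavour', 'day'])

# drop_level=False keeps 'day' in the index, so the subset still
# carries all three levels and can be aligned with d
sat = d.xs('sat', level='day', drop_level=False)

# update() aligns on the index and overwrites only the matching rows.
# Depending on the pandas version, 'sales' may come back as float.
d.update(sat * 2)
print(d)
```

Because the subset keeps the full index, update has enough information to put each doubled value back on the right row.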

Another option is to use select, which extracts a sub-DataFrame (a copy) of the same data. It keeps the same index, so it can be used to update the original correctly. (Note that DataFrame.select was deprecated in pandas 0.21 and removed in 1.0.)

In [11]: d.select(lambda x: x[2] == 'sat') * 2
Out[11]:
                     sales
year flavour    day
2008 strawberry sat     20
     banana     sat     44
2009 strawberry sat     22
     banana     sat     46

In [12]: d.update(d.select(lambda x: x[2] == 'sat') * 2)
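On pandas versions where DataFrame.select is unavailable, a boolean list built from the index tuples does much the same job. The following is only a sketch of that substitution, not the answer's original code; the name mask is introduced here for illustration.

```python
import pandas as pd

# Rebuild the frame from the question
d = pd.DataFrame({
    'year': [2008, 2008, 2008, 2008, 2009, 2009, 2009, 2009],
    'flavour': ['strawberry', 'strawberry', 'banana', 'banana',
                'strawberry', 'strawberry', 'banana', 'banana'],
    'day': ['sat', 'sun', 'sat', 'sun', 'sat', 'sun', 'sat', 'sun'],
    'sales': [10, 12, 22, 23, 11, 13, 23, 24],
}).set_index(['year', 'flavour', 'day'])

# Iterating a MultiIndex yields tuples; position 2 is the 'day' level.
# This mirrors the lambda that was passed to select.
mask = [key[2] == 'sat' for key in d.index]

# The selected rows keep the full three-level index, so update() can align them
doubled = d.loc[mask] * 2
d.update(doubled)
print(d)
```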

Another option is to use apply (this returns a new DataFrame rather than modifying d in place):

In [21]: d.apply(lambda x: x*2 if x.name[2] == 'sat' else x, axis=1)

Another option is to use get_level_values (this is probably the most efficient of these):

In [22]: d[d.index.get_level_values('day') == 'sat'] *= 2
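The same idea can be written with an explicit mask and .loc, which some readers may find easier to follow. This is a sketch of a variant, not a replacement for the one-liner above; the name mask is an assumption made here.

```python
import pandas as pd

# Rebuild the frame from the question
d = pd.DataFrame({
    'year': [2008, 2008, 2008, 2008, 2009, 2009, 2009, 2009],
    'flavour': ['strawberry', 'strawberry', 'banana', 'banana',
                'strawberry', 'strawberry', 'banana', 'banana'],
    'day': ['sat', 'sun', 'sat', 'sun', 'sat', 'sun', 'sat', 'sun'],
    'sales': [10, 12, 22, 23, 11, 13, 23, 24],
}).set_index(['year', 'flavour', 'day'])

# Boolean array: True wherever the 'day' level equals 'sat'
mask = d.index.get_level_values('day') == 'sat'

# Double only the masked rows of the 'sales' column, in place
d.loc[mask, 'sales'] *= 2
print(d)
```

Naming the column in .loc limits the change to sales, which matters once the frame has more than one column.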

Another option is to promote the 'day' level to a column and then use an apply.
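The answer gives no code for this last approach, so what follows is only one possible reading of it. The names flat and result are hypothetical, and the row-wise lambda is an assumption about what "use an apply" means here.

```python
import pandas as pd

# Rebuild the frame from the question
d = pd.DataFrame({
    'year': [2008, 2008, 2008, 2008, 2009, 2009, 2009, 2009],
    'flavour': ['strawberry', 'strawberry', 'banana', 'banana',
                'strawberry', 'strawberry', 'banana', 'banana'],
    'day': ['sat', 'sun', 'sat', 'sun', 'sat', 'sun', 'sat', 'sun'],
    'sales': [10, 12, 22, 23, 11, 13, 23, 24],
}).set_index(['year', 'flavour', 'day'])

# Promote 'day' from an index level to an ordinary column
flat = d.reset_index('day')

# Row-wise apply: double 'sales' only on rows where day is 'sat'
flat['sales'] = flat.apply(
    lambda row: row['sales'] * 2 if row['day'] == 'sat' else row['sales'],
    axis=1,
)

# Push 'day' back into the index as the last level
result = flat.set_index('day', append=True)
print(result)
```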
