Setting values with a MultiIndex in pandas


Question

There are already a couple of questions on SO relating to this, most notably this one. However, none of the answers seem to work for me, and quite a few links to the docs (especially on lexsorting) are broken, so I'll ask another one.

I'm trying to do something (seemingly) very simple. Consider the following MultiIndexed DataFrame:

import numpy as np
import pandas as pd
arrays = [['bar', 'bar', 'baz', 'baz', 'foo', 'foo', 'qux', 'qux'],
      ['one', 'two', 'one', 'two', 'one', 'two', 'one', 'two']]

tuples = list(zip(*arrays))
index = pd.MultiIndex.from_tuples(tuples, names=['first', 'second'])
df = pd.concat([pd.Series(np.random.randn(8), index=index), pd.Series(np.random.randn(8), index=index)], axis=1)

Now I want to set all values in column 0 to some value (say np.nan) for the observations in category one. I've failed with:

df.loc(axis=0)[:, "one"][0] = 1 # setting with copy warning

df.loc(axis=0)[:, "one", 0] = 1

which yields either a warning about the length of the keys exceeding the length of the index, or one about a lack of lexsorting to sufficient depth.

What is the right way to do this?

Recommended answer

I think you can use loc with a tuple to select from the MultiIndex and 0 to select the column:

import numpy as np
import pandas as pd
arrays = [['bar', 'bar', 'baz', 'baz', 'foo', 'foo', 'qux', 'qux'],
      ['one', 'two', 'one', 'two', 'one', 'two', 'one', 'two']]

# fixed seed added so the output is reproducible
np.random.seed(0)
tuples = list(zip(*arrays))
index = pd.MultiIndex.from_tuples(tuples, names=['first', 'second'])
df = pd.concat([pd.Series(np.random.randn(8), index=index), pd.Series(np.random.randn(8), index=index)], axis=1)

print(df)
                     0         1
first second                    
bar   one     1.764052 -0.103219
      two     0.400157  0.410599
baz   one     0.978738  0.144044
      two     2.240893  1.454274
foo   one     1.867558  0.761038
      two    -0.977278  0.121675
qux   one     0.950088  0.443863
      two    -0.151357  0.333674

df.loc[('bar', "one"), 0] = 1
print(df)
                     0         1
first second                    
bar   one     1.000000 -0.103219
      two     0.400157  0.410599
baz   one     0.978738  0.144044
      two     2.240893  1.454274
foo   one     1.867558  0.761038
      two    -0.977278  0.121675
qux   one     0.950088  0.443863
      two    -0.151357  0.333674

If you need to set all rows where level second has the value one, use slice(None):

df.loc[(slice(None), "one"), 0] = 1
print(df)
                     0         1
first second                    
bar   one     1.000000 -0.103219
      two     0.400157  0.410599
baz   one     1.000000  0.144044
      two     2.240893  1.454274
foo   one     1.000000  0.761038
      two    -0.977278  0.121675
qux   one     1.000000  0.443863
      two    -0.151357  0.333674
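
As a sketch of two equivalent spellings of the same selection: pd.IndexSlice lets you write the row key with ordinary slice syntax, and a boolean mask built from the level values avoids MultiIndex slicing altogether. The names by_slicer, by_mask and mask are illustrative only, and the frame is rebuilt here with pd.DataFrame instead of pd.concat, so its random values differ from the output above. Sorting the index first is an assumption about what resolves the lexsort message mentioned in the question.

```python
import numpy as np
import pandas as pd

arrays = [['bar', 'bar', 'baz', 'baz', 'foo', 'foo', 'qux', 'qux'],
          ['one', 'two', 'one', 'two', 'one', 'two', 'one', 'two']]
index = pd.MultiIndex.from_arrays(arrays, names=['first', 'second'])

np.random.seed(0)
df = pd.DataFrame(np.random.randn(8, 2), index=index)

# Slicing a MultiIndex generally needs a lexsorted index; sorting up front
# is a cheap way to make sure of that.
df = df.sort_index()

# Option 1: pd.IndexSlice is shorthand for (slice(None), 'one').
idx = pd.IndexSlice
by_slicer = df.copy()
by_slicer.loc[idx[:, 'one'], 0] = np.nan

# Option 2: a boolean mask over the values of level 'second'.
by_mask = df.copy()
mask = by_mask.index.get_level_values('second') == 'one'
by_mask.loc[mask, 0] = np.nan

print(by_slicer)
```

Both options assign through a single .loc call with row and column keys together, which is what avoids the chained assignment behind the setting-with-copy warning in the question.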

See the docs.
