Setting values with multiindex in pandas
Question
There are already a couple of questions on SO relating to this, most notably this one. However, none of the answers seem to work for me, and quite a few links to docs (especially on lexsorting) are broken, so I'll ask another one.
I'm trying to do something (seemingly) very simple. Consider the following MultiIndexed DataFrame:
import numpy as np
import pandas as pd
arrays = [['bar', 'bar', 'baz', 'baz', 'foo', 'foo', 'qux', 'qux'],
          ['one', 'two', 'one', 'two', 'one', 'two', 'one', 'two']]
tuples = list(zip(*arrays))
index = pd.MultiIndex.from_tuples(tuples, names=['first', 'second'])
df = pd.concat([pd.Series(np.random.randn(8), index=index), pd.Series(np.random.randn(8), index=index)], axis=1)
Now I want to set all values in column 0 to some value (say np.NaN) for the observations in category one. I've failed with:
df.loc(axis=0)[:, "one"][0] = 1 # setting with copy warning
and
df.loc(axis=0)[:, "one", 0] = 1
which either yields a warning about length of keys exceeding length of index, or one about a lack of lexsorting to sufficient depth.
What is the correct way to do this?
Recommended answer
I think you can use loc with a tuple for selecting from the MultiIndex and 0 for selecting the column:
import numpy as np
import pandas as pd
arrays = [['bar', 'bar', 'baz', 'baz', 'foo', 'foo', 'qux', 'qux'],
          ['one', 'two', 'one', 'two', 'one', 'two', 'one', 'two']]
# fixed seed so the output is reproducible
np.random.seed(0)
tuples = list(zip(*arrays))
index = pd.MultiIndex.from_tuples(tuples, names=['first', 'second'])
df = pd.concat([pd.Series(np.random.randn(8), index=index), pd.Series(np.random.randn(8), index=index)], axis=1)
print(df)
                     0         1
first second
bar   one     1.764052 -0.103219
      two     0.400157  0.410599
baz   one     0.978738  0.144044
      two     2.240893  1.454274
foo   one     1.867558  0.761038
      two    -0.977278  0.121675
qux   one     0.950088  0.443863
      two    -0.151357  0.333674
df.loc[('bar', "one"), 0] = 1
print(df)
                     0         1
first second
bar   one     1.000000 -0.103219
      two     0.400157  0.410599
baz   one     0.978738  0.144044
      two     2.240893  1.454274
foo   one     1.867558  0.761038
      two    -0.977278  0.121675
qux   one     0.950088  0.443863
      two    -0.151357  0.333674
If you need to set all rows where level second has the value one, use slice(None):
df.loc[(slice(None), "one"), 0] = 1
print(df)
                     0         1
first second
bar   one     1.000000 -0.103219
      two     0.400157  0.410599
baz   one     1.000000  0.144044
      two     2.240893  1.454274
foo   one     1.000000  0.761038
      two    -0.977278  0.121675
qux   one     1.000000  0.443863
      two    -0.151357  0.333674
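The same selection can also be spelled with pd.IndexSlice, which lets you use ordinary slice syntax instead of slice(None). The following is a minimal sketch, not the only way to do it: it rebuilds a frame shaped like the one in the question (the random values differ from the output above, and the seed is only there for reproducibility) and sets np.nan, which is what the question asked for.

```python
import numpy as np
import pandas as pd

np.random.seed(0)
arrays = [['bar', 'bar', 'baz', 'baz', 'foo', 'foo', 'qux', 'qux'],
          ['one', 'two', 'one', 'two', 'one', 'two', 'one', 'two']]
index = pd.MultiIndex.from_arrays(arrays, names=['first', 'second'])
df = pd.DataFrame(np.random.randn(8, 2), index=index)

idx = pd.IndexSlice
# Rows: any value of 'first', with 'second' equal to 'one'. Column: label 0.
df.loc[idx[:, 'one'], 0] = np.nan

print(df)
```

Because the row selector and the column label go into a single .loc call, the assignment acts on df itself rather than on an intermediate copy, which is what the chained attempt in the question ran into.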
See the pandas docs on using slicers with a MultiIndex.
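The question also mentions a complaint about lexsorting. Whether pandas raises it depends on the key and on the pandas version, so treat the following as a sketch under that assumption rather than a reproduction of the error. It uses a hypothetical, deliberately unsorted index and shows two options: a boolean mask built from get_level_values, which does not rely on the index being sorted, and sort_index(), which makes the index lexsorted before slicing.

```python
import numpy as np
import pandas as pd

# Hypothetical unsorted MultiIndex, purely for illustration.
index = pd.MultiIndex.from_tuples(
    [('qux', 'two'), ('bar', 'one'), ('foo', 'two'),
     ('bar', 'two'), ('qux', 'one'), ('foo', 'one')],
    names=['first', 'second'])
df_unsorted = pd.DataFrame({0: np.arange(6.0), 1: np.arange(6.0)}, index=index)

# Option 1: a boolean mask over the 'second' level; no sorting required.
df_masked = df_unsorted.copy()
mask = df_masked.index.get_level_values('second') == 'one'
df_masked.loc[mask, 0] = np.nan

# Option 2: sort the index first, then slice across levels as usual.
df_sorted = df_unsorted.sort_index()
df_sorted.loc[(slice(None), 'one'), 0] = np.nan

print(df_masked)
print(df_sorted)
```

The mask keeps the original row order, while sort_index() returns a new, reordered frame, so the choice depends on whether row order matters downstream.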