切片和分配值的唯一顺序索引的多索引 pandas 数据框 [英] Slicing and assigning values multi-indexed pandas dataframe of unique sequential indices
问题描述
我想选择并更改数据框单元格的值.此数据帧有2个索引:"datetime"和"idx".两者都包含唯一且顺序的标签. "datetime"索引具有日期时间类型的日期时间标签,"idx"具有整数值标签.
I want to select and change the value of a dataframe cell. There are 2 indices used for this dataframe: 'datetime' and 'idx'. Both contain labels which are unique and sequential. 'datetime' index has datetime label of datetime type, and 'idx' has integer valued labels.
import numpy as np
import pandas as pd
dt = pd.date_range("2010-10-01 00:00:00", periods=5, freq='H')
d = {'datetime': dt, 'a': np.arange(len(dt))-1,'b':np.arange(len(dt))+1}
df = pd.DataFrame(data=d)
df.set_index(keys='datetime',inplace=True,drop=True)
df.sort_index(axis=0,level='datetime',ascending=False,inplace=True)
df.loc[:,'idx'] = np.arange(0, len(df),1)+5
df.set_index('idx',drop=True,inplace=True,append=True)
print(df)
这里是数据框:
a b
datetime idx
2010-10-01 04:00:00 5 3 5
2010-10-01 03:00:00 6 2 4
2010-10-01 02:00:00 7 1 3
2010-10-01 01:00:00 8 0 2
2010-10-01 00:00:00 9 -1 1
'说我想获取idx = 5的行.我怎么做?我可以使用这个:
'Say I want to get the row where idx=5. How do I do that? I could use this:
print(df.iloc[0])
然后我将在下面获得结果:
Then I will get result below:
a 3
b 5
Name: (2010-10-01 04:00:00, 5), dtype: int32
但是我要在idx = 5,column ='a',通过指定idx值和列名'a'的该单元格中访问并设置值 >.我该怎么办?
But I want to access and set the value in this cell where idx=5, column='a', by specifying idx value, and column name 'a'. How do I do that?
请咨询.
推荐答案
您可以使用或者您可以使用 DataFrame.eval() 方法,如果您需要设置/更新一些单元格:
Or you can use DataFrame.eval() method if you need to set/update some cells:
In [61]: df.loc[df.eval('idx==5'), 'a'] = 100
In [62]: df
Out[62]:
a b
datetime idx
2010-10-01 04:00:00 5 100 5
2010-10-01 03:00:00 6 2 4
2010-10-01 02:00:00 7 1 3
2010-10-01 01:00:00 8 0 2
2010-10-01 00:00:00 9 -1 1
说明:
In [59]: df.eval('idx==5')
Out[59]:
datetime idx
2010-10-01 04:00:00 5 True
2010-10-01 03:00:00 6 False
2010-10-01 02:00:00 7 False
2010-10-01 01:00:00 8 False
2010-10-01 00:00:00 9 False
dtype: bool
In [60]: df.loc[df.eval('idx==5')]
Out[60]:
a b
datetime idx
2010-10-01 04:00:00 5 3 5
PS if your original MultiIndex doesn't have names, you can easily set them using rename_axis() method:
df.rename_axis(('datetime','idx')).query(...)
替代(价格稍贵)的解决方案-使用sort_index()
+ pd.IndexSlice[]
:
Alternative (bit more expensive) solution - using sort_index()
+ pd.IndexSlice[]
:
In [106]: df.loc[pd.IndexSlice[:,5], ['a']]
...
skipped
...
KeyError: 'MultiIndex Slicing requires the index to be fully lexsorted tuple len (2), lexsort depth (0)'
所以我们需要先对索引进行排序:
so we would need to sort index first:
In [107]: df.sort_index().loc[pd.IndexSlice[:,5], ['a']]
Out[107]:
a
datetime idx
2010-10-01 04:00:00 5 3
这篇关于切片和分配值的唯一顺序索引的多索引 pandas 数据框的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!