Dataframe hierarchical indexing speedup
Problem description
I have a dataframe like this:
+----+------------+------------+------------+
| | | type | payment |
+----+------------+------------+------------+
| id | res_number | | |
+----+------------+------------+------------+
| a | 1 | toys | 20000 |
| | 2 | clothing | 30000 |
| | 3 | food | 40000 |
| b | 4 | food | 40000 |
| | 5 | laptop | 30000 |
+----+------------+------------+------------+
As you can see, id and res_number are hierarchical row index levels, while type and payment are ordinary columns. What I want to get is below.
array([['toys', 20000],
['clothing', 30000],
['food', 40000]])
That is, the rows are selected by id (= 'a') regardless of res_number. I know that
df.loc[['a']].values
works perfectly for this. But the indexing is too slow: I have to index 150000 values.
So I tried
df.iloc[1].values
but it only returned
array(['toys', 20000])
Is there a faster way to index a hierarchical structure?
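For anyone who wants to reproduce this, here is a minimal sketch that rebuilds the sample frame from the table above. The construction via pd.MultiIndex.from_tuples and the variable names rows_a and one_row are assumptions for illustration, not taken from the original post.

```python
import pandas as pd

# Rebuild the sample frame; the values are copied from the table in the question.
index = pd.MultiIndex.from_tuples(
    [("a", 1), ("a", 2), ("a", 3), ("b", 4), ("b", 5)],
    names=["id", "res_number"],
)
df = pd.DataFrame(
    {
        "type": ["toys", "clothing", "food", "food", "laptop"],
        "payment": [20000, 30000, 40000, 40000, 30000],
    },
    index=index,
)

# Label-based selection on the first level returns every row under id 'a'.
rows_a = df.loc[["a"]].values

# A single integer position returns exactly one row (here, the first one),
# so positional indexing alone cannot replace the label lookup.
one_row = df.iloc[0].values
```

Here rows_a holds the three rows under id 'a', while one_row holds only a single row.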
Recommended answer
Option 1
pd.DataFrame.xs
df.xs('a').values
Option 2
pd.DataFrame.loc
df.loc['a'].values
Option 3
pd.DataFrame.query
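df.query("id == 'a'").values
(The first index level is named id here, so query refers to it by that name; ilevel_0 is only available when the level is unnamed.)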
Option 4
A bit more roundabout: use pd.MultiIndex.get_level_values to create a mask:
df[df.index.get_level_values(0) == 'a'].values
array([['toys', 20000],
['clothing', 30000],
['food', 40000]], dtype=object)
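As a quick sanity check, the four options can be run side by side on the sample data. This is only a sketch: it assumes the index levels are named id and res_number as in the question's table (so query refers to the level as id rather than ilevel_0), and the variable names are illustrative.

```python
import numpy as np
import pandas as pd

# Sample frame rebuilt from the question's table (construction is an assumption).
index = pd.MultiIndex.from_tuples(
    [("a", 1), ("a", 2), ("a", 3), ("b", 4), ("b", 5)],
    names=["id", "res_number"],
)
df = pd.DataFrame(
    {
        "type": ["toys", "clothing", "food", "food", "laptop"],
        "payment": [20000, 30000, 40000, 40000, 30000],
    },
    index=index,
)

via_xs = df.xs("a").values      # Option 1: cross-section on the first level
via_loc = df.loc["a"].values    # Option 2: scalar label with .loc
# Option 3: the level is named, so query uses `id`;
# `ilevel_0` would apply only if the level were unnamed.
via_query = df.query("id == 'a'").values
# Option 4: boolean mask built from the first level's values
via_mask = df[df.index.get_level_values(0) == "a"].values

# All four selections should contain the same three rows.
results = [via_xs, via_loc, via_query, via_mask]
all_equal = all(np.array_equal(via_xs, other) for other in results)
```

Which option is fastest likely depends on the pandas version and on whether the index is sorted, so it is worth timing each expression on the real 150000-row data (for example with timeit) before settling on one.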