Dataframe hierarchical indexing speedup

Question

I have a DataFrame like this:

+----+------------+------------+------------+
|    |            |    type    | payment    | 
+----+------------+------------+------------+
| id | res_number |            |            | 
+----+------------+------------+------------+
|  a |     1      |    toys    | 20000      |
|    |     2      |  clothing  | 30000      |
|    |     3      |    food    | 40000      |
|  b |     4      |    food    | 40000      |
|    |     5      |   laptop   | 30000      |
+----+------------+------------+------------+
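For readers who want to reproduce the problem, here is a minimal sketch that rebuilds a frame with this shape. The construction via set_index is an assumption; the question does not say how the original frame was created, only what it looks like.

```python
import pandas as pd

# Rebuild the example frame; the values are copied from the table above.
# Using set_index to create the two-level row index is an assumption.
df = pd.DataFrame(
    {
        "id": ["a", "a", "a", "b", "b"],
        "res_number": [1, 2, 3, 4, 5],
        "type": ["toys", "clothing", "food", "food", "laptop"],
        "payment": [20000, 30000, 40000, 40000, 30000],
    }
).set_index(["id", "res_number"])

print(list(df.index.names))  # names of the two row-index levels
print(df.columns.tolist())   # names of the two ordinary columns
```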

As you can see, id and res_number are hierarchical row index levels, while type and payment are ordinary columns. What I want to get is this:

array([['toys', 20000],
       ['clothing', 30000],
       ['food', 40000]])

The rows are selected by id (= 'a'), regardless of res_number. I know that

df.loc[['a']].values

works perfectly for this, but the indexing is too slow: I have to look up 150,000 values.

So I tried

df.iloc[1].values

but it only returned

array(['toys', 20000])

Is there a faster way to index a hierarchical structure?

Recommended answer

Option 1
pd.DataFrame.xs

df.xs('a').values

Option 2
pd.DataFrame.loc

df.loc['a'].values

Option 3
pd.DataFrame.query

df.query("ilevel_0 == 'a'").values  # ilevel_0 refers to an unnamed first level; if the level is named (e.g. id), use "id == 'a'"

Option 4
A bit more roundabout: use pd.MultiIndex.get_level_values to create a boolean mask.

df[df.index.get_level_values(0) == 'a'].values

array([['toys', 20000],
       ['clothing', 30000],
       ['food', 40000]], dtype=object)
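To compare the four options on your own data, a rough sketch along the following lines may help. It rebuilds the small example frame (an assumption about how the original was constructed), checks that all four options return the same rows, and times each one with timeit. The repeat count of 200 is arbitrary, and timings on a five-row frame will not reflect behaviour on a large one, so no timing results are claimed here.

```python
import timeit

import numpy as np
import pandas as pd

# Example frame rebuilt from the question; construction method is assumed.
df = pd.DataFrame(
    {
        "id": ["a", "a", "a", "b", "b"],
        "res_number": [1, 2, 3, 4, 5],
        "type": ["toys", "clothing", "food", "food", "laptop"],
        "payment": [20000, 30000, 40000, 40000, 30000],
    }
).set_index(["id", "res_number"])

# The four options from the answer. The first level is named "id" here,
# so query refers to it by name rather than by ilevel_0.
options = {
    "xs": lambda: df.xs("a").values,
    "loc": lambda: df.loc["a"].values,
    "query": lambda: df.query("id == 'a'").values,
    "mask": lambda: df[df.index.get_level_values(0) == "a"].values,
}

# Check that every option selects the same rows.
reference = options["xs"]()
results_match = all(np.array_equal(reference, fn()) for fn in options.values())
print("all options agree:", results_match)

# Time each option; absolute numbers depend on machine and data size.
for name, fn in options.items():
    seconds = timeit.timeit(fn, number=200)
    print(f"{name}: {seconds:.4f} s for 200 calls")
```

Running the same loop against the real 150,000-row frame is the only reliable way to tell which option is fastest in a given pandas version.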
