pandas: slice a MultiIndex by range of secondary index

Question

I have a Series with a MultiIndex like this:

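import numpy as np
import pandas as pd

# First level: 'a' x3, 'b' x5, 'c' x1; second level: the integer sequence values.
buckets = np.repeat(['a', 'b', 'c'], [3, 5, 1])
sequence = [0, 1, 5, 0, 1, 2, 4, 50, 0]

s = pd.Series(
    np.random.randn(len(sequence)),
    # list(...) because zip returns a one-shot iterator on Python 3
    index=pd.MultiIndex.from_tuples(list(zip(buckets, sequence))),
)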

# In [6]: s
# Out[6]: 
# a  0    -1.106047
#    1     1.665214
#    5     0.279190
# b  0     0.326364
#    1     0.900439
#    2    -0.653940
#    4     0.082270
#    50   -0.255482
# c  0    -0.091730

I'd like to get the s['b'] values where the second index ('sequence') is between 2 and 10.

Slicing on the first index works fine:

s['a':'b']
# Out[109]: 
# bucket  value
# a       0        1.828176
#         1        0.160496
#         5        0.401985
# b       0       -1.514268
#         1       -0.973915
#         2        1.285553
#         4       -0.194625
#         50      -0.144112

But slicing on the second index does not, at least not in the ways that seem most obvious:

1) This returns the elements at positions 1 through 4, regardless of the index values:

s['b'][1:10]

# In [61]: s['b'][1:10]
# Out[61]: 
# 1     0.900439
# 2    -0.653940
# 4     0.082270
# 50   -0.255482
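The difference between slicing by position and slicing by label can be made explicit with iloc and loc. The sketch below is only an illustration: `sb` is a hypothetical stand-in for s['b'], with made-up values chosen so the rows are easy to recognize.

```python
import pandas as pd

# Hypothetical stand-in for s['b']: integer labels 0, 1, 2, 4, 50.
sb = pd.Series([10.0, 11.0, 12.0, 14.0, 150.0], index=[0, 1, 2, 4, 50])

# Positional slice: rows at positions 1..9 (end exclusive), labels are ignored.
by_position = sb.iloc[1:10]

# Label slice: rows whose label lies between 2 and 10, both ends inclusive.
by_label = sb.loc[2:10]

print(by_position.index.tolist())  # [1, 2, 4, 50]
print(by_label.index.tolist())     # [2, 4]
```

The label slice works even though 10 is not an existing label, because the index is sorted.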

However, if I reverse the index levels, so that the first level is an integer and the second is a string, it works:

In [26]: s
Out[26]: 
0   a   -0.126299
1   a    1.810928
5   a    0.571873
0   b   -0.116108
1   b   -0.712184
2   b   -1.771264
4   b    0.148961
50  b    0.089683
0   c   -0.582578

In [25]: s[0]['a':'b']
Out[25]: 
a   -0.126299
b   -0.116108

Recommended answer

As Robbie-Clarken answers, since pandas 0.14 you can pass a slice in the tuple you pass to loc:

In [11]: s.loc[('b', slice(2, 10))]
Out[11]:
b  2   -0.65394
   4    0.08227
dtype: float64

Indeed, you can pass a slice for each level:

In [12]: s.loc[(slice('a', 'b'), slice(2, 10))]
Out[12]:
a  5    0.27919
b  2   -0.65394
   4    0.08227
dtype: float64

Note: the slice is inclusive (both endpoints are included).
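For reference, here is a self-contained sketch of the per-level slicing. It rebuilds the Series from the question with deterministic values (an assumption, so the result is reproducible) and uses pd.IndexSlice, which builds the same tuples of slice objects with the more familiar colon syntax.

```python
import numpy as np
import pandas as pd

buckets = np.repeat(['a', 'b', 'c'], [3, 5, 1])
sequence = [0, 1, 5, 0, 1, 2, 4, 50, 0]

# Deterministic values 0.0 .. 8.0 instead of random ones (illustration only).
s = pd.Series(
    np.arange(len(sequence), dtype=float),
    index=pd.MultiIndex.from_arrays([buckets, sequence]),
)

idx = pd.IndexSlice

# Same as s.loc[('b', slice(2, 10))]
one_bucket = s.loc[idx['b', 2:10]]

# Same as s.loc[(slice('a', 'b'), slice(2, 10))]
several = s.loc[idx['a':'b', 2:10]]

# Both endpoints are included: labels 2 and 4 are returned.
inclusive = s.loc[idx['b', 2:4]]

print(one_bucket.index.tolist())  # [('b', 2), ('b', 4)]
print(several.index.tolist())     # [('a', 5), ('b', 2), ('b', 4)]
```

Slicing a MultiIndex by label like this expects the index to be sorted, which is the case for this data; for unsorted data, calling sort_index() first is the usual step.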

You can also use:

s.ix[1:10, "b"]

(It's good practice to do this in a single ix/loc/iloc call, since that form allows assignment.)
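To illustrate the point about assignment: ix was deprecated in pandas 0.20 and removed in 1.0, so this sketch uses a single loc call instead. The Series is rebuilt with deterministic values, which is an assumption made only for the example.

```python
import numpy as np
import pandas as pd

s = pd.Series(
    np.arange(9, dtype=float),
    index=pd.MultiIndex.from_arrays(
        [list('aaabbbbbc'), [0, 1, 5, 0, 1, 2, 4, 50, 0]]
    ),
)

# One indexing call on s itself, so the assignment modifies s.
s.loc[('b', slice(2, 10))] = 0.0

print(s.loc['b'].tolist())  # [3.0, 4.0, 0.0, 0.0, 7.0]
```

A chained form such as s['b'].loc[2:10] = 0.0 first creates an intermediate object, and depending on the pandas version the assignment may never reach s, which is why the single call is preferable.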

This answer was written prior to the introduction of iloc in early 2013, i.e. position/integer location, which may be preferred in this case. iloc was created to remove the ambiguity from integer-indexed pandas objects and to be more descriptive: "I'm slicing on position".

s["b"].iloc[1:10]

That said, I kinda disagree with the docs' claim that ix is the

most robust and consistent way

It's not; the most consistent way is to describe what you're doing:

  • use loc for labels
  • use iloc for position
  • use ix for both (if you really have to)
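The three cases can be sketched as follows. The DataFrame `df` and its column names are hypothetical, introduced only for illustration. Because ix was removed in pandas 1.0, the mixed case is written here by converting positions to labels first, which is one possible substitute rather than the only one.

```python
import pandas as pd

# Hypothetical frame, used only to illustrate the three cases.
df = pd.DataFrame(
    {'x': [1, 2, 3, 4], 'y': [10, 20, 30, 40]},
    index=['p', 'q', 'r', 's'],
)

# Labels only: the end label 'r' is included.
by_label = df.loc['q':'r', 'y']

# Positions only: the end position 3 is excluded.
by_position = df.iloc[1:3, 1]

# Mixed (rows by position, column by label) without ix:
# turn the positions into labels, then use loc.
mixed = df.loc[df.index[1:3], 'y']

print(by_label.tolist())     # [20, 30]
print(by_position.tolist())  # [20, 30]
print(mixed.tolist())        # [20, 30]
```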

Remember the Zen of Python:

explicit is better than implicit
