Select rows in pandas MultiIndex DataFrame

Problem Description

What are the most common pandas ways to select/filter rows of a dataframe whose index is a MultiIndex?

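  • Slicing based on a single value/label
  • Slicing based on multiple labels from one or more levels
  • Filtering on boolean conditions and expressions
  • Which methods are applicable in what circumstances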

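Assumptions for simplicity:

  1. The input dataframe does not have duplicate index keys
  2. The input dataframe below only has two levels. (Most solutions shown here generalize to N levels)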


Example input:

import numpy as np
import pandas as pd

mux = pd.MultiIndex.from_arrays([
    list('aaaabbbbbccddddd'),
    list('tuvwtuvwtuvwtuvw')
], names=['one', 'two'])

df = pd.DataFrame({'col': np.arange(len(mux))}, mux)

         col
one two     
a   t      0
    u      1
    v      2
    w      3
b   t      4
    u      5
    v      6
    w      7
    t      8
c   u      9
    v     10
d   w     11
    t     12
    u     13
    v     14
    w     15

Question 1: Selecting a Single Item

How do I select rows having "a" in level "one"?

         col
one two     
a   t      0
    u      1
    v      2
    w      3

Additionally, how would I be able to drop level "one" in the output?

     col
two     
t      0
u      1
v      2
w      3

Question 1b
How do I slice all rows with value "t" on level "two"?

         col
one two     
a   t      0
b   t      4
    t      8
d   t     12

Question 2: Selecting Multiple Values in a Level

How can I select rows corresponding to items "b" and "d" in level "one"?

         col
one two     
b   t      4
    u      5
    v      6
    w      7
    t      8
d   w     11
    t     12
    u     13
    v     14
    w     15

Question 2b
How would I get all values corresponding to "t" and "w" in level "two"?

         col
one two     
a   t      0
    w      3
b   t      4
    w      7
    t      8
d   w     11
    t     12
    w     15

Question 3: Slicing a Single Cross Section (x, y)

How do I retrieve a cross section, i.e., a single row having specific values for the index from df? Specifically, how do I retrieve the cross section of ('c', 'u'), given by

         col
one two     
c   u      9

Question 4: Slicing Multiple Cross Sections [(a, b), (c, d), ...]

How do I select the two rows corresponding to ('c', 'u'), and ('a', 'w')?

         col
one two     
c   u      9
a   w      3

Question 5: One Item Sliced per Level

How can I retrieve all rows corresponding to "a" in level "one" or "t" in level "two"?

         col
one two     
a   t      0
    u      1
    v      2
    w      3
b   t      4
    t      8
d   t     12

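Question 6: Arbitrary Slicing

How can I slice specific cross sections? For "a" and "b", I would like to select all rows with sub-levels "u" and "v", and for "d", I would like to select rows with sub-level "w".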

         col
one two     
a   u      1
    v      2
b   u      5
    v      6
d   w     11
    w     15

Question 7 will use a unique setup consisting of a numeric level:

np.random.seed(0)
mux2 = pd.MultiIndex.from_arrays([
    list('aaaabbbbbccddddd'),
    np.random.choice(10, size=16)
], names=['one', 'two'])

df2 = pd.DataFrame({'col': np.arange(len(mux2))}, mux2)

         col
one two     
a   5      0
    0      1
    3      2
    3      3
b   7      4
    9      5
    3      6
    5      7
    2      8
c   4      9
    7     10
d   6     11
    8     12
    8     13
    1     14
    6     15

Question 7: Filtering by numeric inequality on individual levels of the multiindex

How do I get all rows where values in level "two" are greater than 5?

         col
one two     
b   7      4
    9      5
c   7     10
d   6     11
    8     12
    8     13
    6     15


Note: This post will not go through how to create MultiIndexes, how to perform assignment operations on them, or any performance related discussions (these are separate topics for another time).

Recommended Answer

MultiIndex / Advanced Indexing

Note
This post will be structured in the following manner:

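  1. The questions put forth in the OP will be addressed, one by one.
  2. For each question, one or more methods applicable to solving the problem and getting the expected result will be demonstrated.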

Notes (much like this one) will be included for readers interested in learning about additional functionality, implementation details, and other info cursory to the topic at hand. These notes have been compiled through scouring the docs and uncovering various obscure features, and from my own (admittedly limited) experience.

All code samples have been created and tested on pandas v0.23.4, python3.7. If something is not clear or is factually incorrect, or if you did not find a solution applicable to your use case, please feel free to suggest an edit, request clarification in the comments, or open a new question, as applicable.

Here is an introduction to some common idioms (henceforth referred to as the Four Idioms) that we will be revisiting frequently:

  1. DataFrame.loc - A general solution for selection by label (+ pd.IndexSlice for more complex applications involving slices)

  2. DataFrame.xs - Extract a particular cross section from a Series/DataFrame.

  3. DataFrame.query - Specify slicing and/or filtering operations dynamically (i.e., as an expression that is evaluated dynamically). This is more applicable to some scenarios than others. Also see the section of the pandas docs on querying MultiIndexes.

  4. Boolean indexing with a mask generated using MultiIndex.get_level_values (often in conjunction with Index.isin, especially when filtering with multiple values). This is also quite useful in some circumstances.

It will be beneficial to look at the various slicing and filtering problems in terms of the Four Idioms to gain a better understanding of what can be applied to a given situation. It is very important to understand that not all of the idioms will work equally well (if at all) in every circumstance. If an idiom has not been listed as a potential solution to a problem below, that means that idiom cannot be applied to that problem effectively.

Question 1

How do I select rows having "a" in level "one"?

         col
one two     
a   t      0
    u      1
    v      2
    w      3

You can use loc, as a general purpose solution applicable to most situations:

df.loc[['a']]

At this point, if you get

TypeError: Expected tuple, got str

That means you're using an older version of pandas. Consider upgrading! Otherwise, use df.loc[('a', slice(None)), :].

Alternatively, you can use xs here, since we are extracting a single cross section. Note the level and axis arguments (reasonable defaults can be assumed here).

df.xs('a', level=0, axis=0, drop_level=False)
# df.xs('a', drop_level=False)

Here, the drop_level=False argument is needed to prevent xs from dropping level "one" in the result (the level we sliced on).

Yet another option here is using query:

df.query("one == 'a'")

If the index levels did not have names, you would need to change your query string to "ilevel_0 == 'a'".

Finally, using get_level_values:

df[df.index.get_level_values('one') == 'a']
# If your levels are unnamed, or if you need to select by position (not label),
# df[df.index.get_level_values(0) == 'a']
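To see that the idioms really are interchangeable for this question, the sketch below rebuilds the example frame and runs each of the Four Idioms on Question 1, plus the ilevel_0 form of query on a copy whose level names were cleared with set_names. It assumes a reasonably recent pandas (the rest of this post was tested on 0.23.4); df_unnamed is only an illustrative name.

```python
import numpy as np
import pandas as pd

# Rebuild the example frame from the question.
mux = pd.MultiIndex.from_arrays([
    list('aaaabbbbbccddddd'),
    list('tuvwtuvwtuvwtuvw'),
], names=['one', 'two'])
df = pd.DataFrame({'col': np.arange(len(mux))}, mux)

# Question 1 ("a" in level "one") with each of the Four Idioms.
results = {
    'loc': df.loc[['a']],
    'xs': df.xs('a', level=0, axis=0, drop_level=False),
    'query': df.query("one == 'a'"),
    'get_level_values': df[df.index.get_level_values('one') == 'a'],
}

# Illustrative copy with unnamed levels: query falls back to ilevel_<n>.
df_unnamed = df.copy()
df_unnamed.index = df.index.set_names([None, None])
results['query (unnamed)'] = df_unnamed.query("ilevel_0 == 'a'")

for name, res in results.items():
    print(f"{name:>18}: {res['col'].tolist()}")
```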

Additionally, how would I be able to drop level "one" in the output?

     col
two     
t      0
u      1
v      2
w      3

This can be easily done using either

df.loc['a'] # Notice the single string argument instead of a list.

Or,

df.xs('a', level=0, axis=0, drop_level=True)
# df.xs('a')

Notice that we can omit the drop_level argument (it is assumed to be True by default).
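A short check of the dropped-level result, assuming pandas 0.24 or later. The third variant uses DataFrame.droplevel, which the answer above does not use; it is shown only as one more way to reach the same output. All three should give an index named "two" holding t, u, v, w.

```python
import numpy as np
import pandas as pd

mux = pd.MultiIndex.from_arrays([
    list('aaaabbbbbccddddd'),
    list('tuvwtuvwtuvwtuvw'),
], names=['one', 'two'])
df = pd.DataFrame({'col': np.arange(len(mux))}, mux)

by_loc = df.loc['a']      # scalar label: level "one" is dropped
by_xs = df.xs('a')        # drop_level=True is the default
# Not from the answer above: select with a list, then drop the level explicitly.
by_droplevel = df.loc[['a']].droplevel('one')

for res in (by_loc, by_xs, by_droplevel):
    print(res.index.name, res.index.tolist(), res['col'].tolist())
```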

Note
You may notice that a filtered DataFrame may still have all the levels, even if they do not show when printing the DataFrame out. For example,

v = df.loc[['a']]
print(v)
         col
one two     
a   t      0
    u      1
    v      2
    w      3

print(v.index)
MultiIndex(levels=[['a', 'b', 'c', 'd'], ['t', 'u', 'v', 'w']],
           labels=[[0, 0, 0, 0], [0, 1, 2, 3]],
           names=['one', 'two'])

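(The reprs in this note come from pandas 0.23; since pandas 0.24, MultiIndex.labels is named MultiIndex.codes.) You can get rid of the unused levels using MultiIndex.remove_unused_levels: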

v.index = v.index.remove_unused_levels()

print(v.index)
MultiIndex(levels=[['a'], ['t', 'u', 'v', 'w']],
           labels=[[0, 0, 0, 0], [0, 1, 2, 3]],
           names=['one', 'two'])
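The sketch below makes the same point without relying on the repr format, which has changed across pandas versions: it compares the labels stored in the first level before and after remove_unused_levels. It assumes a recent pandas; levels_before and levels_after are illustrative names.

```python
import numpy as np
import pandas as pd

mux = pd.MultiIndex.from_arrays([
    list('aaaabbbbbccddddd'),
    list('tuvwtuvwtuvwtuvw'),
], names=['one', 'two'])
df = pd.DataFrame({'col': np.arange(len(mux))}, mux)

v = df.loc[['a']]
# Labels still stored in level "one" even though only "a" is used by the rows.
levels_before = v.index.levels[0].tolist()

v.index = v.index.remove_unused_levels()
levels_after = v.index.levels[0].tolist()

print(levels_before)
print(levels_after)
```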


Question 1b

How do I slice all rows with value "t" on level "two"?

         col
one two     
a   t      0
b   t      4
    t      8
d   t     12

Intuitively, you would want something involving slice():

df.loc[(slice(None), 't'), :]

It Just Works!™ But it is clunky. We can facilitate a more natural slicing syntax using the pd.IndexSlice API here.

idx = pd.IndexSlice
df.loc[idx[:, 't'], :]

This is much cleaner.
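pd.IndexSlice does not add any new selection logic; it only turns the bracket syntax into the tuple of slice objects that loc already understands. A minimal check, assuming a recent pandas:

```python
import numpy as np
import pandas as pd

mux = pd.MultiIndex.from_arrays([
    list('aaaabbbbbccddddd'),
    list('tuvwtuvwtuvwtuvw'),
], names=['one', 'two'])
df = pd.DataFrame({'col': np.arange(len(mux))}, mux)

idx = pd.IndexSlice
key = idx[:, 't']
print(key)  # (slice(None, None, None), 't')

# Both spellings hand loc the same tuple, so the results match.
explicit = df.loc[(slice(None), 't'), :]
sugared = df.loc[idx[:, 't'], :]
print(explicit.equals(sugared))
```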

Note
Why is the trailing slice : across the columns required? This is because loc can be used to select and slice along both axes (axis=0 or axis=1). Without explicitly making it clear which axis the slicing is to be done on, the operation becomes ambiguous. See the big red box in the documentation on slicing.

If you want to remove any shade of ambiguity, loc accepts an axis parameter:

df.loc(axis=0)[pd.IndexSlice[:, 't']]

Without the axis parameter (i.e., just by doing df.loc[pd.IndexSlice[:, 't']]), slicing is assumed to be on the columns, and a KeyError will be raised in this circumstance.

This is documented in slicers. For the purpose of this post, however, we will explicitly specify all axes.
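A small sketch of the ambiguity described in the note, assuming a recent pandas. The first call is expected to raise KeyError because "t" ends up being looked up among the column labels, while loc(axis=0) pins the same key to the rows.

```python
import numpy as np
import pandas as pd

mux = pd.MultiIndex.from_arrays([
    list('aaaabbbbbccddddd'),
    list('tuvwtuvwtuvwtuvw'),
], names=['one', 'two'])
df = pd.DataFrame({'col': np.arange(len(mux))}, mux)

# Without the trailing ':', loc reads the tuple as (rows, columns),
# so 't' is looked up as a column label.
try:
    df.loc[pd.IndexSlice[:, 't']]
    raised = False
except KeyError:
    raised = True
print('KeyError raised:', raised)

# Pinning the axis makes the same key apply to the rows.
rows = df.loc(axis=0)[pd.IndexSlice[:, 't']]
print(rows['col'].tolist())
```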

For xs, it is

df.xs('t', axis=0, level=1, drop_level=False)

For query, it is

df.query("two == 't'")
# Or, if the second level has no name,
# df.query("ilevel_1 == 't'")

Finally, with get_level_values, you can do

df[df.index.get_level_values('two') == 't']
# Or, to perform selection by position/integer,
# df[df.index.get_level_values(1) == 't']

All to the same effect.

Question 2

How can I select rows corresponding to items "b" and "d" in level "one"?

         col
one two     
b   t      4
    u      5
    v      6
    w      7
    t      8
d   w     11
    t     12
    u     13
    v     14
    w     15

Using loc, this is done in a similar fashion by specifying a list.

df.loc[['b', 'd']]

To solve the above problem of selecting "b" and "d", you can also use query:

items = ['b', 'd']
df.query("one in @items")
# df.query("one == @items", parser='pandas')
# df.query("one in ['b', 'd']")
# df.query("one == ['b', 'd']", parser='pandas')

Note
Yes, the default parser is 'pandas', but it is important to highlight that this syntax isn't conventional Python. The pandas parser generates a slightly different parse tree from the expression. This is done to make some operations more intuitive to specify. For more information, please read my post on Dynamic Expression Evaluation in pandas using pd.eval().

And, with get_level_values + Index.isin:

df[df.index.get_level_values("one").isin(['b', 'd'])]
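To make the parser note concrete, the sketch below compares "one in @items", the pandas-parser form "one == @items", the isin mask and the plain loc list. It assumes a recent pandas where parser='pandas' is still the default for query; DataFrame.equals compares both values and index.

```python
import numpy as np
import pandas as pd

mux = pd.MultiIndex.from_arrays([
    list('aaaabbbbbccddddd'),
    list('tuvwtuvwtuvwtuvw'),
], names=['one', 'two'])
df = pd.DataFrame({'col': np.arange(len(mux))}, mux)

items = ['b', 'd']
via_in = df.query("one in @items")
via_eq = df.query("one == @items")  # pandas parser: '==' against a list acts like 'in'
via_isin = df[df.index.get_level_values('one').isin(items)]
via_loc = df.loc[items]

print(via_in.equals(via_eq), via_in.equals(via_isin), via_in.equals(via_loc))
```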


Question 2b

How would I get all values corresponding to "t" and "w" in level "two"?

         col
one two     
a   t      0
    w      3
b   t      4
    w      7
    t      8
d   w     11
    t     12
    w     15

With loc, this is possible only in conjunction with pd.IndexSlice.

df.loc[pd.IndexSlice[:, ['t', 'w']], :] 

The first colon : in pd.IndexSlice[:, ['t', 'w']] means to slice across the first level. As the depth of the level being queried increases, you will need to specify more slices, one per level being sliced across. You will not need to specify more levels beyond the one being sliced, however.
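To illustrate the depth rule, here is a hypothetical three-level frame (mux3l and df3l are made-up names, not part of the question). Selecting on the third level needs a colon for each of the two levels before it, while selecting on the second level needs no entry for the third. Assumes a recent pandas.

```python
import numpy as np
import pandas as pd

# Hypothetical three-level frame, used only for this illustration.
mux3l = pd.MultiIndex.from_product(
    [['a', 'b'], ['t', 'u'], ['x', 'y']], names=['one', 'two', 'three'])
df3l = pd.DataFrame({'col': np.arange(len(mux3l))}, mux3l)

idx = pd.IndexSlice
# Third level: one ':' for each of the two levels in front of it.
third = df3l.loc[idx[:, :, 'x'], :]
# Second level: nothing needs to be said about the third level.
second = df3l.loc[idx[:, 't'], :]

print(third['col'].tolist())
print(second['col'].tolist())
```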

For query, this is

items = ['t', 'w']
df.query("two in @items")
# df.query("two == @items", parser='pandas') 
# df.query("two in ['t', 'w']")
# df.query("two == ['t', 'w']", parser='pandas')

With get_level_values and Index.isin (similar to above):

df[df.index.get_level_values('two').isin(['t', 'w'])]
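The three approaches for Question 2b can be checked against each other with a short sketch, assuming a recent pandas. It compares sorted values, so the check does not depend on the row order each method happens to produce.

```python
import numpy as np
import pandas as pd

mux = pd.MultiIndex.from_arrays([
    list('aaaabbbbbccddddd'),
    list('tuvwtuvwtuvwtuvw'),
], names=['one', 'two'])
df = pd.DataFrame({'col': np.arange(len(mux))}, mux)

items = ['t', 'w']
candidates = {
    'loc': df.loc[pd.IndexSlice[:, items], :],
    'query': df.query("two in @items"),
    'isin': df[df.index.get_level_values('two').isin(items)],
}

# Sorted values only: row order is not part of this comparison.
for name, res in candidates.items():
    print(name, sorted(res['col'].tolist()))
```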


Question 3

How do I retrieve a cross section, i.e., a single row having specific values for the index from df? Specifically, how do I retrieve the cross section of ('c', 'u'), given by

         col
one two     
c   u      9

Use loc by specifying a tuple of keys:

df.loc[('c', 'u'), :]

Or,

df.loc[pd.IndexSlice[('c', 'u')]]

Note
At this point, you may run into a PerformanceWarning that looks like this:

PerformanceWarning: indexing past lexsort depth may impact performance.

This just means that your index is not sorted. pandas depends on the index being sorted (in this case, lexicographically, since we are dealing with string values) for optimal search and retrieval. A quick fix would be to sort your DataFrame in advance using DataFrame.sort_index. This is especially desirable from a performance standpoint if you plan on doing multiple such queries in tandem:

df_sort = df.sort_index()
df_sort.loc[('c', 'u')]

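You can also use MultiIndex.is_lexsorted() to check whether the index is sorted or not. This function returns True or False accordingly. You can call this function to determine whether an additional sorting step is required or not. (MultiIndex.is_lexsorted() was deprecated in pandas 1.3 and removed in 2.0; on newer versions, checking df.index.is_monotonic_increasing serves a similar purpose.)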
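A minimal sketch of the sort-first workflow, assuming a recent pandas where is_lexsorted() is no longer available, so is_monotonic_increasing is used as the check instead. The lookup result is flattened in the check so it works whether it comes back as a Series or a one-row DataFrame.

```python
import numpy as np
import pandas as pd

mux = pd.MultiIndex.from_arrays([
    list('aaaabbbbbccddddd'),
    list('tuvwtuvwtuvwtuvw'),
], names=['one', 'two'])
df = pd.DataFrame({'col': np.arange(len(mux))}, mux)

# Level "two" is not sorted within "b" (w is followed by t), so this is False.
before = df.index.is_monotonic_increasing
df_sort = df.sort_index()
after = df_sort.index.is_monotonic_increasing
print(before, after)

# Full-key lookup on the sorted frame.
row = df_sort.loc[('c', 'u'), :]
print(row)
```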

With xs, this is again simply passing a single tuple as the first argument, with all other arguments set to their appropriate defaults:

df.xs(('c', 'u'))

With query, things become a bit clunky:

df.query("one == 'c' and two == 'u'")

You can see now that this is going to be relatively difficult to generalize. But it is still OK for this particular problem.

With accesses spanning multiple levels, get_level_values can still be used, but is not recommended:

m1 = (df.index.get_level_values('one') == 'c')
m2 = (df.index.get_level_values('two') == 'u')
df[m1 & m2]


Question 4

How do I select the two rows corresponding to ('c', 'u'), and ('a', 'w')?

         col
one two     
c   u      9
a   w      3

With loc, this is still as simple as:

df.loc[[('c', 'u'), ('a', 'w')]]
# df.loc[pd.IndexSlice[[('c', 'u'), ('a', 'w')]]]

With query, you will need to dynamically generate a query string by iterating over your cross sections and levels:

cses = [('c', 'u'), ('a', 'w')]
levels = ['one', 'two']
# This is a useful check to make in advance.
assert all(len(levels) == len(cs) for cs in cses) 

query = '(' + ') or ('.join([
    ' and '.join([f"({l} == {repr(c)})" for l, c in zip(levels, cs)]) 
    for cs in cses
]) + ')'

print(query)
# ((one == 'c') and (two == 'u')) or ((one == 'a') and (two == 'w'))

df.query(query)

100% DO NOT RECOMMEND! But it is possible.
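For the record, there is a less fragile route than building a query string, which the answer above does not cover: MultiIndex.isin accepts a list of full key tuples and returns a boolean mask over the rows. A sketch, assuming a recent pandas; note that the mask keeps the frame's original row order, so ('a', 'w') comes out before ('c', 'u').

```python
import numpy as np
import pandas as pd

mux = pd.MultiIndex.from_arrays([
    list('aaaabbbbbccddddd'),
    list('tuvwtuvwtuvwtuvw'),
], names=['one', 'two'])
df = pd.DataFrame({'col': np.arange(len(mux))}, mux)

cses = [('c', 'u'), ('a', 'w')]

# MultiIndex.isin with full key tuples gives a boolean mask over the rows.
mask = df.index.isin(cses)
by_mask = df[mask]      # rows stay in the frame's original order
by_loc = df.loc[cses]   # the loc version from this question, for comparison

print(by_mask['col'].tolist())
print(by_loc['col'].tolist())
```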

Question 5

How can I retrieve all rows corresponding to "a" in level "one" or "t" in level "two"?

         col
one two     
a   t      0
    u      1
    v      2
    w      3
b   t      4
    t      8
d   t     12

This is actually very difficult to do with loc while ensuring correctness and still maintaining code clarity. df.loc[pd.IndexSlice['a', 't']] is incorrect; it is interpreted as df.loc[pd.IndexSlice[('a', 't')]] (i.e., selecting a cross section). You may think of a solution with pd.concat to handle each label separately:

pd.concat([
    df.loc[['a'],:], df.loc[pd.IndexSlice[:, 't'],:]
])

         col
one two     
a   t      0
    u      1
    v      2
    w      3
    t      0   # Does this look right to you? No, it isn't!
b   t      4
    t      8
d   t     12

But you'll notice one of the rows is duplicated. This is because that row satisfied both slicing conditions, and so appeared twice. You will instead need to do

v = pd.concat([
        df.loc[['a'],:], df.loc[pd.IndexSlice[:, 't'],:]
])
v[~v.index.duplicated()]

But if your DataFrame inherently contains duplicate indices (that you want), then this will not retain them. Use with extreme caution.
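The caution applies to this very frame: ('b', 't') appears twice in df (rows 4 and 8). The sketch below, assuming a recent pandas, runs the concat-and-deduplicate version next to a plain boolean mask built with get_level_values; the deduplicated result silently loses row 8.

```python
import numpy as np
import pandas as pd

mux = pd.MultiIndex.from_arrays([
    list('aaaabbbbbccddddd'),
    list('tuvwtuvwtuvwtuvw'),
], names=['one', 'two'])
df = pd.DataFrame({'col': np.arange(len(mux))}, mux)

idx = pd.IndexSlice
v = pd.concat([df.loc[['a'], :], df.loc[idx[:, 't'], :]])
# Drops the repeated ('a', 't') row, but also the genuine second ('b', 't') row.
deduped = v[~v.index.duplicated()]

# Boolean mask over the original frame: every matching row appears exactly once.
m1 = df.index.get_level_values('one') == 'a'
m2 = df.index.get_level_values('two') == 't'
masked = df[m1 | m2]

print(sorted(deduped['col'].tolist()))
print(masked['col'].tolist())
```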

With query, this is very simple:

df.query("one == 'a' or two == 't'")

With get_level_values, this is still simple, but not as elegant:

m1 = (df.index.get_level_values('one') == 'a')
m2 = (df.index.get_level_values('two') == 't')
df[m1 | m2] 


Question 6

How can I slice specific cross sections? For "a" and "b", I would like to select all rows with sub-levels "u" and "v", and for "d", I would like to select rows with sub-level "w".

         col
one two     
a   u      1
    v      2
b   u      5
    v      6
d   w     11
    w     15

This is a special case that I've added to help understand the applicability of the Four Idioms: this is one case where none of them will work effectively, since the slicing is very specific and does not follow any real pattern.

Usually, slicing problems like this will require explicitly passing a list of keys to loc. One way of doing this is with:

keys = [('a', 'u'), ('a', 'v'), ('b', 'u'), ('b', 'v'), ('d', 'w')]
df.loc[keys, :]

If you want to save some typing, you will recognise that there is a pattern to slicing "a", "b" and their sublevels, so we can separate the slicing task into two portions and concat the result:

pd.concat([
     df.loc[(('a', 'b'), ('u', 'v')), :], 
     df.loc[('d', 'w'), :]
   ], axis=0)

Slicing specification for "a" and "b" is slightly cleaner (('a', 'b'), ('u', 'v')) because the same sub-levels are being indexed for each of them.
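If the per-group selection changes often, writing the key list by hand gets tedious. A hypothetical alternative (spec is a made-up name, not a pandas feature) is to keep the selection as a dict and expand it into the list of tuples that loc accepts. Assumes a recent pandas; since ('d', 'w') occurs twice in df, both of those rows come back.

```python
import numpy as np
import pandas as pd

mux = pd.MultiIndex.from_arrays([
    list('aaaabbbbbccddddd'),
    list('tuvwtuvwtuvwtuvw'),
], names=['one', 'two'])
df = pd.DataFrame({'col': np.arange(len(mux))}, mux)

# Hypothetical spec: which level-"two" labels to keep for each level-"one" label.
spec = {'a': ['u', 'v'], 'b': ['u', 'v'], 'd': ['w']}

# Expand it into the explicit list of full keys that loc expects.
keys = [(one, two) for one, twos in spec.items() for two in twos]
result = df.loc[keys, :]

print(keys)
print(result['col'].tolist())
```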

Question 7

How do I get all rows where values in level "two" are greater than 5?

         col
one two     
b   7      4
    9      5
c   7     10
d   6     11
    8     12
    8     13
    6     15

This can be done using query,

df2.query("two > 5")

or with get_level_values:

df2[df2.index.get_level_values('two') > 5]

Note
Similar to this example, we can filter based on any arbitrary condition using these constructs. In general, it is useful to remember that loc and xs are specifically for label-based indexing, while query and get_level_values are helpful for building general conditional masks for filtering.
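As an example of an arbitrary condition, the sketch below combines a numeric test on level "two" with a label test on level "one", once with query and once with get_level_values masks. It assumes df2 is rebuilt exactly as in Question 7 (same seed) on a recent pandas; the particular condition is made up for illustration.

```python
import numpy as np
import pandas as pd

np.random.seed(0)
mux2 = pd.MultiIndex.from_arrays([
    list('aaaabbbbbccddddd'),
    np.random.choice(10, size=16),
], names=['one', 'two'])
df2 = pd.DataFrame({'col': np.arange(len(mux2))}, mux2)

# Illustrative condition: level "two" above 5, excluding group "d".
via_query = df2.query("two > 5 and one != 'd'")

lvl_two = df2.index.get_level_values('two')
lvl_one = df2.index.get_level_values('one')
via_mask = df2[(lvl_two > 5) & (lvl_one != 'd')]

print(via_query['col'].tolist())
print(via_mask['col'].tolist())
```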


Bonus Question

What if I need to slice a MultiIndex column?

Actually, most solutions here are applicable to columns as well, with minor changes. Consider:

np.random.seed(0)
mux3 = pd.MultiIndex.from_product([
        list('ABCD'), list('efgh')
], names=['one','two'])

df3 = pd.DataFrame(np.random.choice(10, (3, len(mux3))), columns=mux3)
print(df3)

one  A           B           C           D         
two  e  f  g  h  e  f  g  h  e  f  g  h  e  f  g  h
0    5  0  3  3  7  9  3  5  2  4  7  6  8  8  1  6
1    7  7  8  1  5  9  8  9  4  3  0  3  5  0  2  3
2    8  1  3  3  3  7  0  1  9  9  0  4  7  3  2  7

These are the following changes you will need to make to the Four Idioms to have them working with columns.

  1. To slice with loc, use

    df3.loc[:, ....] # Notice how we slice across the index with `:`.

    or,

    df3.loc[:, pd.IndexSlice[...]]

  2. To use xs as appropriate, just pass an argument axis=1.

  3. You can access the column level values directly using df.columns.get_level_values. You will then need to do something like

    df.loc[:, {condition}]

    where {condition} represents some condition built using columns.get_level_values.

  4. To use query, your only option is to transpose, query on the index, and transpose again:

    df3.T.query(...).T

    This is not recommended; use one of the other three options.
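Putting the column variants side by side, assuming a recent pandas and df3 rebuilt as in the bonus example: each of the four selects the columns whose "two" label is "e", and the results should match.

```python
import numpy as np
import pandas as pd

np.random.seed(0)
mux3 = pd.MultiIndex.from_product([
    list('ABCD'), list('efgh'),
], names=['one', 'two'])
df3 = pd.DataFrame(np.random.choice(10, (3, len(mux3))), columns=mux3)

# Every column whose level-"two" label is "e", four ways.
by_loc = df3.loc[:, pd.IndexSlice[:, 'e']]
by_xs = df3.xs('e', axis=1, level='two', drop_level=False)
by_mask = df3.loc[:, df3.columns.get_level_values('two') == 'e']
by_query = df3.T.query("two == 'e'").T  # transpose, query the index, transpose back

for res in (by_loc, by_xs, by_mask, by_query):
    print(res.columns.tolist())
```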
