Select multiple columns by labels in pandas

Question

I've been looking through the Python documentation and the forums for ways to select columns, but every example of indexing columns is too simplistic.

Suppose I have a 10 x 10 dataframe:

from numpy.random import randn
from pandas import DataFrame

df = DataFrame(randn(10, 10), index=range(0, 10), columns=['A', 'B', 'C', 'D', 'E', 'F', 'G', 'H', 'I', 'J'])

So far, all the documentation gives is a simple example of indexing, like

subset = df.loc[:,'A':'C']

subset = df.loc[:,'C':]

But I get an error when I try to index multiple, non-sequential columns, like this:

subset = df.loc[:,('A':'C', 'E')]

How would I index in pandas if I wanted to select columns A to C, E, and G to I? It appears that this logic will not work:

subset = df.loc[:,('A':'C', 'E', 'G':'I')]

I feel that the solution is pretty simple, but I can't get around this error. Thanks!

Recommended answer

Name- or Label-Based (using regular expression syntax)

df.filter(regex='[A-CEG-I]')   # does NOT depend on the column order

Note that any regular expression is allowed here, so this approach can be very general. For example, if you wanted all columns starting with a capital or lowercase "A", you could use: df.filter(regex='^[Aa]')
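One caveat is worth illustrating: `filter(regex=...)` applies `re.search` to each label, so an unanchored character class keeps any column whose name contains one of those letters anywhere. With the single-letter names from the question this makes no difference, but with longer names it can. A minimal sketch, assuming a hypothetical frame with two extra, longer column names (`Alpha` and `Total_E` are made up for illustration):

```python
import numpy as np
import pandas as pd

# Hypothetical frame: the question's single-letter columns plus two longer names.
cols = list("ABCDEFGHIJ") + ["Alpha", "Total_E"]
df = pd.DataFrame(np.random.randn(3, len(cols)), columns=cols)

# Unanchored: keeps every column whose name contains A-C, E, or G-I anywhere.
loose = df.filter(regex="[A-CEG-I]")

# Anchored: keeps only columns whose entire name is one of those letters.
strict = df.filter(regex="^[A-CEG-I]$")

print(list(loose.columns))
print(list(strict.columns))
```

If the column names can be longer than one character, anchoring the pattern with `^` and `$` is probably the safer choice.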

Location-Based (depends on column order)

df[list(df.loc[:, 'A':'C']) + ['E'] + list(df.loc[:, 'G':'I'])]   # depends on the column order

Note that unlike the label-based method, this depends on the order of the columns: a slice such as 'A':'C' selects every column positioned from 'A' through 'C', so it only yields A, B, C if your columns are alphabetically sorted. This is not necessarily a problem, however. For example, if your columns go ['A','C','B'], then you could replace 'A':'C' above with 'A':'B'.
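The order dependence can be seen directly with a small frame. This is only a sketch, assuming a hypothetical four-column frame whose columns are deliberately out of alphabetical order:

```python
import numpy as np
import pandas as pd

# Hypothetical frame whose columns are not in alphabetical order.
df = pd.DataFrame(np.random.randn(3, 4), columns=["A", "C", "B", "E"])

# 'A':'C' stops at the position of 'C', so 'B' (which comes later) is left out.
a_to_c = list(df.loc[:, "A":"C"])

# 'A':'B' runs through the position of 'B', so it covers A, C and B.
a_to_b = list(df.loc[:, "A":"B"])

print(a_to_c)
print(a_to_b)
```

Iterating over a DataFrame (as `list(...)` does here) yields its column labels, which is why the answer can concatenate the pieces into a single list of names.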

And for completeness, you always have the option shown by @Magdalena of simply listing each column individually, although it becomes more verbose as the number of columns increases:

df[['A','B','C','E','G','H','I']]   # does NOT depend on the column order

Results for any of the above methods:

          A         B         C         E         G         H         I
0 -0.814688 -1.060864 -0.008088  2.697203 -0.763874  1.793213 -0.019520
1  0.549824  0.269340  0.405570 -0.406695 -0.536304 -1.231051  0.058018
2  0.879230 -0.666814  1.305835  0.167621 -1.100355  0.391133  0.317467
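As a quick sanity check, the three approaches can be compared on the same frame. This is a sketch that rebuilds a random 10 x 10 frame like the one in the question (the values will differ from the output shown above, since they are random); the variable names are chosen here for illustration:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(np.random.randn(10, 10), columns=list("ABCDEFGHIJ"))

# Regex on the column names.
by_regex = df.filter(regex="[A-CEG-I]")

# Label slices concatenated into one list of column names.
by_slices = df[list(df.loc[:, "A":"C"]) + ["E"] + list(df.loc[:, "G":"I"])]

# Every column listed explicitly.
by_list = df[["A", "B", "C", "E", "G", "H", "I"]]

# All three selections should hold the same columns and values.
assert by_regex.equals(by_slices) and by_slices.equals(by_list)
print(by_list.head(3))
```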
