How to select DataFrame columns based on partial matching?

Views: 126 | Published: 2020/5/23 22:54:08 | Tags: python, pandas

Question

I was struggling this afternoon to find a way of selecting a few columns of my Pandas DataFrame by checking for the occurrence of a certain pattern in their names (labels?).

I had been looking for something like contains or isin for ndarrays / pd.Series, but had no luck.

This frustrated me quite a bit, as I was already checking the columns of my DataFrame for occurrences of specific string patterns, as in:

# keep only the rows whose target_column contains neither pattern
hp = ~(df.target_column.str.contains('some_text') | df.target_column.str.contains('other_text'))
df_cln = df[hp]
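Since str.contains treats its pattern as a regular expression by default, the two OR-ed calls above can probably be folded into a single alternation. A minimal sketch, where the toy DataFrame is a made-up stand-in for df and target_column; na=False is added so that missing values count as "no match" instead of producing NaN, which the ~ operator cannot invert:

```python
import pandas as pd

# Made-up stand-in for the question's df; None simulates a missing value.
df = pd.DataFrame({"target_column": ["some_text here", "keep me", "other_text", None]})

# One regex alternation replaces the two OR-ed str.contains calls.
# na=False maps missing values to False, so ~ keeps those rows.
hp = ~df["target_column"].str.contains("some_text|other_text", na=False)
df_cln = df[hp]
print(df_cln)
```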

However, no matter how much I banged my head against it, I could not apply .str.contains() to the object returned by df.columns (an Index), nor to the one returned by df.columns.values (an ndarray). It works fine, though, on what the "slicing" operation df[column_name] returns, i.e. a Series.
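In recent pandas versions, an Index of string labels also exposes the .str accessor, so df.columns.str.contains may work directly without any conversion. A sketch assuming string column labels; the column names are hypothetical:

```python
import pandas as pd

# Hypothetical string column labels; .str only works on string-like labels.
df = pd.DataFrame([[1, 2, 3]], columns=["start_exp1_a", "start_exp2_b", "other"])

# Index.str.contains yields a boolean mask aligned with df.columns.
mask = df.columns.str.contains("exp")
# .loc accepts the boolean mask for the column axis.
subset = df.loc[:, mask]
print(list(subset.columns))
```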

My first solution involved a for loop and the creation of a helper list:

ll = []  # helper list collecting the matching column names
for a in df.columns:
    if a.startswith('start_exp1') or a.startswith('start_exp2'):
        ll.append(a)  # must be indented under the if
df[ll]
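The same loop can be written as a list comprehension, and Python's str.startswith accepts a tuple of prefixes, so the two calls collapse into one. A sketch with hypothetical column names mirroring the question's prefixes:

```python
import pandas as pd

# Hypothetical columns using the question's prefixes.
df = pd.DataFrame([[1, 2, 3]], columns=["start_exp1_x", "start_exp2_y", "unrelated"])

# Same logic as the loop: str.startswith accepts a tuple of prefixes.
ll = [a for a in df.columns if a.startswith(("start_exp1", "start_exp2"))]
print(df[ll])
```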

(one could apply any of the str functions, of course)

Then, I found the map function and got it to work with the following code:

import re
# Boolean mask: True where the column name matches the regex
sel = df.columns.map(lambda x: bool(re.search('your_regex', x)))
df[df.columns[sel]]
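pandas also ships DataFrame.filter, which performs this label matching itself: regex= keeps labels for which re.search finds a match, and like= does a plain substring test. On a DataFrame it filters the columns by default. A sketch with hypothetical column names:

```python
import pandas as pd

# Hypothetical columns for illustration.
df = pd.DataFrame([[1, 2, 3]], columns=["start_exp1_x", "start_exp2_y", "unrelated"])

# filter(regex=...) keeps columns whose label matches the pattern.
by_regex = df.filter(regex=r"^start_exp[12]")
# filter(like=...) keeps columns whose label contains the substring.
by_substring = df.filter(like="exp")
print(list(by_regex.columns), list(by_substring.columns))
```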

Of course, in the first solution I could have performed the same kind of regex check, because I can apply it to the str values returned by the iteration.

I am very new to Python and have never really programmed anything, so I am not too familiar with speed/timing/efficiency, but I tend to think that the second method, using map, could be faster, besides looking more elegant to my untrained eye.
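Rather than guessing, the speed question can be checked empirically with timeit. A rough benchmarking sketch; the frame size, column names and pattern are arbitrary assumptions, and absolute timings will depend on the machine and pandas version, so no particular winner is claimed here:

```python
import re
import timeit

import pandas as pd

# Arbitrary wide frame: 1,000 columns, a quarter of which match the pattern.
cols = [f"start_exp{i % 4}_{i}" for i in range(1000)]
df = pd.DataFrame([list(range(1000))], columns=cols)
pattern = re.compile(r"^start_exp1")


def with_loop():
    # Plain Python iteration over the labels.
    return df[[c for c in df.columns if pattern.search(c)]]


def with_map():
    # Index.map with a lambda, as in the question.
    return df[df.columns[df.columns.map(lambda c: bool(pattern.search(c)))]]


def with_str():
    # Vectorised string accessor on the Index.
    return df.loc[:, df.columns.str.contains(r"^start_exp1")]


for fn in (with_loop, with_map, with_str):
    seconds = timeit.timeit(fn, number=100)
    print(f"{fn.__name__}: {seconds:.4f} s for 100 runs")
```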

I am curious to know what you think of it and what the alternatives might be. Given my level of noobness, I would really appreciate it if you could correct any mistakes I may have made in the code and point me in the right direction.

Thanks, Michel

EDIT: I just found the Index method Index.to_series(), which returns (ehm) a Series to which I can apply .str.contains('whatever'). However, this did not seem as powerful as a true regex, and I could not find a way of passing the result of Index.to_series().str to the re.search() function.
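There may be no need to hand the labels to re.search manually: Series.str.contains already applies a regex search to each element, and re flags can be passed through its flags parameter. Series.str.match behaves like re.match instead, anchoring at the start of each string. A sketch with hypothetical labels illustrating the difference:

```python
import re

import pandas as pd

# Hypothetical labels; "other_start" contains "start" but does not begin with it.
df = pd.DataFrame([[1, 2, 3]], columns=["Start_Exp1", "start_exp2", "other_start"])
labels = df.columns.to_series()

# str.contains: regex search anywhere in the label, case-insensitive via flags.
searched = labels.str.contains("start", flags=re.IGNORECASE)
# str.match: the regex must match at the beginning of the label.
matched = labels.str.match("start", flags=re.IGNORECASE)
print(df.loc[:, searched.values].columns.tolist())
print(df.loc[:, matched.values].columns.tolist())
```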

Recommended answer

Your solution using map is very good. If you really want to use str.contains, it is possible to convert Index objects to Series (which have the str.contains method):

In [1]: df
Out[1]: 
   x  y  z
0  0  0  0
1  1  1  1
2  2  2  2
3  3  3  3
4  4  4  4
5  5  5  5
6  6  6  6
7  7  7  7
8  8  8  8
9  9  9  9

In [2]: df.columns.to_series().str.contains('x')
Out[2]: 
x     True
y    False
z    False
dtype: bool

In [3]: df[df.columns[df.columns.to_series().str.contains('x')]]
Out[3]: 
   x
0  0
1  1
2  2
3  3
4  4
5  5
6  6
7  7
8  8
9  9
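As a small variation on the session above, the boolean Series can likely be passed straight to .loc, since its index consists of the column labels and therefore lines up with df.columns. A sketch that rebuilds the answer's example frame:

```python
import pandas as pd

# Rebuild the answer's example frame: columns x, y, z, each holding 0..9.
df = pd.DataFrame({c: range(10) for c in "xyz"})

mask = df.columns.to_series().str.contains("x")
# The mask's index equals the column labels, so .loc can use it directly
# on the column axis without the df.columns[...] round trip.
selected = df.loc[:, mask]
print(selected)
```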

UPDATE: I just read your last paragraph. According to the documentation, str.contains treats its pattern as a regular expression by default (regex=True), so you can pass a regex directly, e.g. str.contains('^myregex').
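Because regex=True is the default, characters such as (, ) and $ are interpreted as regex syntax. A sketch with hypothetical column names contrasting an anchored pattern with a literal substring test via regex=False:

```python
import pandas as pd

# Hypothetical labels, one of which contains regex metacharacters.
df = pd.DataFrame([[1, 2, 3]], columns=["price($)", "price_eur", "qty"])
labels = df.columns.to_series()

# Default regex=True: "^price" is anchored, so both price columns match.
anchored = df.loc[:, labels.str.contains("^price").values]
# regex=False: plain substring test, so "(" and "$" need no escaping.
literal = df.loc[:, labels.str.contains("($)", regex=False).values]
print(anchored.columns.tolist(), literal.columns.tolist())
```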
