从python pandas的dataframe列中搜索匹配的字符串模式 [英] searching matching string pattern from dataframe column in python pandas

查看:1561
本文介绍了从python pandas的dataframe列中搜索匹配的字符串模式的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我的数据框如下所示

 name         genre
 satya      |ACTION|DRAMA|IC|
 satya      |COMEDY|BIOPIC|SOCIAL|
 abc        |CLASSICAL|
 xyz        |ROMANCE|ACTION|DARMA|
 def        |DISCOVERY|SPORT|COMEDY|IC|
 ghj        |IC|

现在,我想查询数据帧,以便获得第1,5和6.i行:我想找到| IC |.单独使用或与其他类型任意组合使用.

Now I want to query the dataframe so that i can get row 1,5 and 6.i:e i want to find |IC| with alone or with any combination of other genres.

到目前为止,我可以使用进行精确搜索

Upto now i am able to do either a exact search using

df[df['genre'] == '|ACTION|DRAMA|IC|']  ######exact value yields row 1

或包含搜索依据的字符串

or a string contains search by

 df[df['genre'].str.contains('IC')]  ####yields row 1,2,3,5,6
 # as BIOPIC has IC in that same for CLASSICAL also

但是我不要这两个.

#df[df['genre'].str.contains('|IC|')]  #### row 6
# This also not satisfying my need as i am missing rows 1 and 5

所以我的要求是找到具有| IC |的类型(我的字符串搜索失败,因为python将'|'视为or运算符)

So my requirement is to find genres having |IC| in them.(My string search fails because python treats '|' as or operator)

有人建议使用某些reg或任何方法.感谢ADv.

Somebody suggest some reg or any method to do that.Thanks in ADv.

推荐答案

我认为您可以将\添加到正则表达式中以进行转义,因为没有\|被解释为

I think you can add \ to regex for escaping , because | without \ is interpreted as OR:

'|'

A | B,其中A和B可以是任意RE,它创建一个匹配A或B的正则表达式.任意数量的RE可以由'|'分隔.这样.也可以在组内使用(请参阅下文).扫描目标字符串时,RE用"|"分隔从左到右尝试.当一个模式完全匹配时,该分支被接受.这意味着,一旦A匹配,即使将产生更长的整体匹配,也不会对其进行进一步测试.换句话说,"|"操作员从不贪婪.要匹配文字"|",请使用\ |,或将其括在字符类中,如[|]所示.

A|B, where A and B can be arbitrary REs, creates a regular expression that will match either A or B. An arbitrary number of REs can be separated by the '|' in this way. This can be used inside groups (see below) as well. As the target string is scanned, REs separated by '|' are tried from left to right. When one pattern completely matches, that branch is accepted. This means that once A matches, B will not be tested further, even if it would produce a longer overall match. In other words, the '|' operator is never greedy. To match a literal '|', use \|, or enclose it inside a character class, as in [|].

print df['genre'].str.contains(u'\|IC\|')
0     True
1    False
2    False
3    False
4     True
5     True
Name: genre, dtype: bool

print df[df['genre'].str.contains(u'\|IC\|')]
    name                        genre
0  satya            |ACTION|DRAMA|IC|
4    def  |DISCOVERY|SPORT|COMEDY|IC|
5    ghj                         |IC|

这篇关于从python pandas的dataframe列中搜索匹配的字符串模式的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆