Exact match of string in pandas python
Question
I have a column in a data frame, for example df:
A
0 Good to 1. Good communication EI : tathagata.kar@ae.com
1 SAP ECC Project System EI: ram.vaddadi@ae.com
2 EI : ravikumar.swarna Role:SSE Minimum Skill
I have a list of strings:
ls=['tathagata.kar@ae.com','a.kar@ae.com']
Now, if I try to filter with:
for i in range(len(ls)):
    df1=df[df['A'].str.contains(ls[i])]
    if len(df1.columns!=0):
        print ls[i]
I get the output:
tathagata.kar@ae.com
a.kar@ae.com
But I only need tathagata.kar@ae.com.
How can this be achieved? As you can see, I've tried str.contains, but I need something that does an exact match.
Recommended answer
You could simply use ==:
string_a == string_b
It returns True if the two strings are equal. But this does not solve your issue.
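As a rough illustration of why == alone does not help here: on a pandas column it compares each whole cell with the string, so a cell that merely contains the address is not a match. The second row below is a hypothetical cell added only for contrast; it is not part of the original data.

```python
import pandas as pd

df = pd.DataFrame({"A": [
    "Good to 1. Good communication EI : tathagata.kar@ae.com",
    "tathagata.kar@ae.com",  # hypothetical row holding only the address
]})

# == compares every whole cell against the string, element by element.
whole_cell = df["A"] == "tathagata.kar@ae.com"
print(whole_cell.tolist())
```

Only the cell that consists of nothing but the address compares equal, which is why a pattern-based check is still needed for the data in the question.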
Edit 2: You should use len(df1.index) instead of len(df1.columns). len(df1.columns) gives you the number of columns, not the number of rows.
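A minimal sketch of the difference, using a cut-down version of the data and a made-up search string (no.such@address) that matches nothing: the filtered frame keeps its column even when no row survives, so a test on the columns always passes.

```python
import pandas as pd

df = pd.DataFrame({"A": [
    "EI : tathagata.kar@ae.com",
    "EI: ram.vaddadi@ae.com",
]})

# A filter that matches no row still returns a frame with column "A".
empty = df[df["A"].str.contains("no.such@address", regex=False)]

n_cols = len(empty.columns)  # number of columns, unaffected by the filter
n_rows = len(empty.index)    # number of rows that passed the filter
print(n_cols, n_rows)
```

Checking n_rows (or empty.empty) is what tells you whether anything matched.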
Edit 3: After reading your second post, I understand your problem. The solution you propose could lead to some errors. For instance, if you have:
ls=['tathagata.kar@ae.com','a.kar@ae.com', 'tathagata.kar@ae.co']
the first and the third elements will both match str.contains(r'(?:\s|^|Ei:|EI:|EI-)'+ls[i]), because tathagata.kar@ae.co is a prefix of tathagata.kar@ae.com. This is unwanted behaviour.
You could add a check on the end of the string: str.contains(r'(?:\s|^|Ei:|EI:|EI-)'+ls[i]+r'(?:\s|$)')
Like this:
for i in range(len(ls)):
    df1 = df[df['A'].str.contains(r'(?:\s|^|Ei:|EI:|EI-)'+ls[i]+r'(?:\s|$)')]
    if len(df1.index) != 0:
        print (ls[i])
(Remove the parentheses in print if you use Python 2.7.)
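For reference, here is a self-contained sketch of the same idea that can be run as is. It rebuilds the data from the question and adds one thing the answer does not use: re.escape, so that the dots in each address are matched literally rather than as "any character". The helper name find_exact is made up for this example.

```python
import re
import pandas as pd

df = pd.DataFrame({"A": [
    "Good to 1. Good communication EI : tathagata.kar@ae.com",
    "SAP ECC Project System EI: ram.vaddadi@ae.com",
    "EI : ravikumar.swarna Role:SSE Minimum Skill",
]})
ls = ["tathagata.kar@ae.com", "a.kar@ae.com", "tathagata.kar@ae.co"]

def find_exact(frame, candidates, column="A"):
    """Return the candidates that occur as a whole token in the column."""
    found = []
    for s in candidates:
        # Left boundary: whitespace, start of string, or an "EI" prefix.
        # Right boundary: whitespace or end of string.
        # re.escape is an addition here, not part of the original answer.
        pattern = r"(?:\s|^|Ei:|EI:|EI-)" + re.escape(s) + r"(?:\s|$)"
        if frame[column].str.contains(pattern).any():
            found.append(s)
    return found

matches = find_exact(df, ls)
print(matches)
```

With this data only tathagata.kar@ae.com is reported: a.kar@ae.com fails the left boundary and tathagata.kar@ae.co fails the right one.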