How to match rows when one column contains a string from another column?
Problem description
My aim is to find the rows where City matches text in the general_text column, but the match must be exact.
I tried searching with the in operator, but it doesn't give me the expected results, so I tried str.contain, but the way I call it raises an error. Any hints on how to do this properly and efficiently?
I've already tried the following, but it gives me the result below:
import pandas as pd

# Each row pairs a free-text string with a candidate city name
data = [['palm springs john smith', 'spring'],
        ['palm springs john smith', 'palm springs'],
        ['palm springs john smith', 'smith'],
        ['hamptons amagansett', 'amagansett'],
        ['hamptons amagansett', 'hampton'],
        ['hamptons amagansett', 'gans'],
        ['edward riverwoods lake', 'wood'],
        ['edward riverwoods lake', 'riverwoods']]
df = pd.DataFrame(data, columns=['general_text', 'City'])

# This is the attempt that raises an error
df['match'] = df.apply(lambda x: x['general_text'].str.contain(
    x['City']), axis=1)
What I would like the code above to return is only these matches:
data = [['palm springs john smith', 'palm springs'],
        ['hamptons amagansett', 'amagansett'],
        ['edward riverwoods lake', 'riverwoods']]
Recommended answer
You can wrap the search term in word boundaries (\b ... \b) to get an exact, whole-word match:
import re
f = lambda x: bool(re.search(r'\b{}\b'.format(x['City']), x['general_text']))
Or:
f = lambda x: bool(re.findall(r'\b{}\b'.format(x['City']), x['general_text']))
df['match'] = df.apply(f, axis = 1)
print (df)
              general_text          City  match
0  palm springs john smith        spring  False
1  palm springs john smith  palm springs   True
2  palm springs john smith         smith   True
3      hamptons amagansett    amagansett   True
4      hamptons amagansett       hampton  False
5      hamptons amagansett          gans  False
6   edward riverwoods lake          wood  False
7   edward riverwoods lake    riverwoods   True
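One thing the pattern above does not handle is a City value that contains regex metacharacters such as a dot. The sketch below is a possible variation, not part of the original answer: it passes the city through re.escape before adding the word boundaries, and uses a list comprehension over the two columns instead of apply. The helper name whole_word_match and the two "st. louis" rows are hypothetical, added only to illustrate the escaping.

```python
import re

import pandas as pd

data = [
    ['palm springs john smith', 'spring'],
    ['palm springs john smith', 'palm springs'],
    ['hamptons amagansett', 'hampton'],
    ['st. louis blues', 'st. louis'],  # hypothetical row, not from the question
    ['stx louis blues', 'st. louis'],  # hypothetical row: '.' must stay a literal dot
]
df = pd.DataFrame(data, columns=['general_text', 'City'])


def whole_word_match(text, city):
    """Return True if `city` occurs in `text` as a whole word or phrase."""
    # re.escape makes every character in `city` match literally
    pattern = r'\b' + re.escape(city) + r'\b'
    return re.search(pattern, text) is not None


# Pair up the two columns row by row and test each pair
df['match'] = [whole_word_match(t, c)
               for t, c in zip(df['general_text'], df['City'])]

# Keep only the rows where the city was found as a whole word
matched = df[df['match']]
print(matched)
```

Without re.escape, the dot in 'st. louis' would act as a wildcard and the last row would also match. Note that \b only marks a boundary next to a word character, so a City value that starts or ends with punctuation may need different anchors.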