Strict regex in Pandas replace

Question

I need to write a strict regular expression to replace certain values in my pandas dataframe. This issue came up after I solved a question I posted earlier.

The issue is that .replace(idsToReplace, regex=True) is not strict. Therefore, if idsToReplace is:

NY : New York
NYC : New York City

and the comment in which the IDs should be replaced is:

My cat from NYC is large.

the result is:

My cat from New York is large.

Is there a pythonic way within the pandas replace function to make the regular expression stricter, so that it matches NYC and not NY?
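
The loose match can be checked outside pandas. The following is a minimal sketch, assuming the dictionary key is applied as an ordinary Python regular expression (which is what regex=True suggests); the variable names are illustrative only.

```python
import re

comment = 'My cat from NYC is large.'

# An unanchored pattern is a plain substring search,
# so 'NY' is found inside the longer token 'NYC'.
match = re.search('NY', comment)

print(match.group(), match.span())
```

The match covers only the first two characters of NYC, which is why the shorter ID can take effect where the longer one was intended.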

Recommended answer

Add \b for word boundaries to each key of the dict:

import pandas as pd

# Abbreviations and the full names that should replace them
d = {'UK': 'United Kingdom', 'LA': 'Los Angeles', 'NYC': 'New York City', 'NY': 'New York'}

data = {'Categories': ['animal', 'plant', 'object'],
        'Type': ['tree', 'dog', 'rock'],
        'Comment': ['The NYC tree is very big', 'NY The cat from the UK is small',
                    'The rock was found in LA.']}

# Wrap each key in \b so that it matches only as a whole word
d = {r'\b' + k + r'\b': v for k, v in d.items()}

df = pd.DataFrame(data)

# Apply every pattern in the dict as a regular expression
df['commentTest'] = df['Comment'].replace(d, regex=True)
print(df)
  Categories                          Comment  Type  \
0     animal         The NYC tree is very big  tree   
1      plant  NY The cat from the UK is small   dog   
2     object        The rock was found in LA.  rock   

                                         commentTest  
0                 The New York City tree is very big  
1  New York The cat from the United Kingdom is small  
2                 The rock was found in Los Angeles.  
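
If the keys may contain regex metacharacters, the same idea can be combined with re.escape. This is a sketch under assumptions rather than part of the original answer: the helper name strict_patterns, the 'L.A' key and the sample comments are all made up for illustration.

```python
import re

import pandas as pd


def strict_patterns(mapping):
    """Hypothetical helper: wrap each key in \\b...\\b after escaping it."""
    # re.escape makes characters such as '.' match literally.
    return {r'\b' + re.escape(key) + r'\b': value
            for key, value in mapping.items()}


# Illustrative data; 'L.A' contains a regex metacharacter.
ids = {'NY': 'New York', 'NYC': 'New York City', 'L.A': 'Los Angeles'}
comments = pd.Series(['My cat from NYC is large.',
                      'NY is busy.',
                      'Found in L.A today, LXA tomorrow.'])

result = comments.replace(strict_patterns(ids), regex=True)
print(result.tolist())
```

Without the escape, the '.' in 'L.A' would match any character and 'LXA' would be replaced as well. Note that \b only helps when a key starts and ends with a word character (letter, digit or underscore); keys that end in punctuation would need a different kind of delimiter.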
