Strict regex in Pandas replace
Question
I need to write a strict regular expression to replace certain values in my pandas dataframe. This issue came up after solving the question that I posted here.
The issue is that .replace(idsToReplace, regex=True) is not strict. Therefore, if the idsToReplace are:
NY : New York
NYC : New York City
and the comment in which the IDs are replaced is:
My cat from NYC is large.
the result is:
My cat from New York is large.
Is there a pythonic way within the pandas replace function to make the regular expression stricter, so that it matches NYC and not NY?
Recommended answer
Add \b (a word boundary) to both ends of each key of the dict, so that a key such as NY only matches as a whole word and no longer matches inside NYC:
import pandas as pd

d = {'UK': 'United Kingdom', 'LA': 'Los Angeles', 'NYC': 'New York City', 'NY': 'New York'}
data = {'Categories': ['animal', 'plant', 'object'],
        'Type': ['tree', 'dog', 'rock'],
        'Comment': ['The NYC tree is very big', 'NY The cat from the UK is small',
                    'The rock was found in LA.']
        }

# Wrap every key in \b so that it only matches as a whole word
d = {r'\b' + k + r'\b': v for k, v in d.items()}

df = pd.DataFrame(data)
df['commentTest'] = df['Comment'].replace(d, regex=True)
print(df)
  Categories                          Comment  Type  \
0     animal         The NYC tree is very big  tree
1      plant  NY The cat from the UK is small   dog
2     object        The rock was found in LA.  rock

                                         commentTest
0                 The New York City tree is very big
1  New York The cat from the United Kingdom is small
2                 The rock was found in Los Angeles.
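
To see why the word boundaries matter, here is a minimal sketch that uses only the standard re module rather than pandas. The helper names add_word_boundaries and replace_all are hypothetical and are not part of pandas or of the answer above. The sketch also passes each key through re.escape, which is an extra precaution (an assumption beyond the original answer) for keys that might contain regex metacharacters such as a dot.

```python
import re

def add_word_boundaries(mapping):
    # Hypothetical helper: escape each key and wrap it in \b ... \b
    # so that it only matches as a whole word.
    return {r'\b' + re.escape(key) + r'\b': value
            for key, value in mapping.items()}

def replace_all(text, patterns):
    # Hypothetical helper: apply every pattern to the text, one after another.
    for pattern, replacement in patterns.items():
        text = re.sub(pattern, replacement, text)
    return text

ids_to_replace = {'NY': 'New York', 'NYC': 'New York City'}
strict = add_word_boundaries(ids_to_replace)

comment = 'My cat from NYC is large.'

# Without boundaries, the NY pattern also matches inside NYC.
loose_result = replace_all(comment, ids_to_replace)

# With boundaries, only the whole word NYC is replaced.
strict_result = replace_all(comment, strict)
print(strict_result)
```

A dict built this way should be usable in the same way as the answer's dict, for example df['Comment'].replace(strict, regex=True).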