Text data replacement using dictionary
Problem description
A DataFrame with the structure below:
ID text
0 Language processing in python th is great
1 Relace the string
A dictionary named custom_fix:
{'Relace': 'Replace', 'th' : 'three'}
I tried the code below, and the current output is:
ID text
0 Language processing in pythirdon three is great
1 Replace threee string
Code:
import re

def multiple_replace(dict, text):
    # Create a regular expression from the dictionary keys
    regex = re.compile("(%s)" % "|".join(map(re.escape, dict.keys())))
    # For each match, look-up corresponding value in dictionary
    return regex.sub(lambda mo: dict[mo.string[mo.start():mo.end()]], text)

df['col1'] = df.apply(lambda row: multiple_replace(custom_fix, row['text']), axis=1)
Expected output:
ID text
0 Language processing in python three is great
1 Replace the string
Recommended answer
I'm not a regex expert, and this may not be the best solution, but wrapping each key in word boundaries (\b) in your regex should fix the problem. Without them, the pattern matches a key anywhere, including inside longer words such as "python" or "the". Here is the fixed function:
import re

def multiple_replace(d, text):
    # Create a regular expression from the dictionary keys, escaping each key
    # and wrapping it in \b so that only whole words match
    regex = re.compile("(%s)" % "|".join([r"\b" + re.escape(x) + r"\b" for x in d.keys()]))
    # For each match, look up the corresponding value in the dictionary
    return regex.sub(lambda mo: d[mo.group(0)], text)
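To check the word-boundary behaviour without a DataFrame, here is a minimal sketch that runs the same idea over the two sample strings from the question. The list name `rows` and the variable `fixed` are only illustrative, and the function is restated so the snippet is self-contained.

```python
import re

def multiple_replace(d, text):
    """Replace whole-word occurrences of each key in d with its value."""
    # \b on both sides stops 'th' from matching inside 'python' or 'the'.
    pattern = re.compile("|".join(r"\b" + re.escape(k) + r"\b" for k in d))
    return pattern.sub(lambda mo: d[mo.group(0)], text)

custom_fix = {"Relace": "Replace", "th": "three"}

# Sample rows copied from the question (hypothetical list name).
rows = [
    "Language processing in python th is great",
    "Relace the string",
]

fixed = [multiple_replace(custom_fix, t) for t in rows]
for line in fixed:
    print(line)
```

With a DataFrame like the one in the question, the same function could presumably be applied per cell with something like df['text'].apply(lambda t: multiple_replace(custom_fix, t)), which avoids the row-wise apply with axis=1.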