String replacement with dictionary, complications with punctuation
Problem description
I'm trying to write a function process(s, d) that replaces abbreviations in a string with their full meaning using a dictionary, where s is the input string and d is the dictionary. For example:
>>> d = {'ASAP': 'as soon as possible'}
>>> s = "I will do this ASAP. Regards, X"
>>> process(s, d)
'I will do this as soon as possible. Regards, X'
I have tried using the split function to separate the string and compare each part with the dictionary.
def process(s):
    return ''.join(d[ch] if ch in d else ch for ch in s)
However, it returns exactly the same string. I suspect the code doesn't work because of the full stop after ASAP in the original string. If so, how do I ignore the punctuation and get ASAP replaced?
Recommended answer
Here is a way to do it with a single regex:
In [23]: import re
In [24]: d = {'ASAP': 'as soon as possible', 'AFAIK': 'as far as I know'}
In [25]: s = 'I will do this ASAP, AFAIK. Regards, X'
In [26]: re.sub(r'\b(?:' + '|'.join(map(re.escape, d)) + r')\b', lambda m: d[m.group(0)], s)
Out[26]: 'I will do this as soon as possible, as far as I know. Regards, X'
Unlike versions based on str.replace(), this observes word boundaries and therefore won't replace abbreviations that happen to appear in the middle of other words (e.g. "etc" in "fetch").
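To make the difference concrete, here is a small sketch comparing a naive str.replace() loop with the boundary-aware regex. The helper names replace_naive and replace_bounded and the sample dictionary are assumptions made for illustration; they are not part of the original answer.

```python
import re

d = {'etc': 'et cetera'}
s = 'Please fetch the logs, etc.'

def replace_naive(s, d):
    # Plain substring replacement: matches "etc" anywhere, even inside "fetch".
    for abbr, full in d.items():
        s = s.replace(abbr, full)
    return s

def replace_bounded(s, d):
    # \b on both sides of the grouped alternation restricts matches to whole words.
    pattern = r'\b(?:' + '|'.join(map(re.escape, d)) + r')\b'
    return re.sub(pattern, lambda m: d[m.group(0)], s)

print(replace_naive(s, d))    # Please fet ceterah the logs, et cetera.
print(replace_bounded(s, d))  # Please fetch the logs, et cetera.
```

The non-capturing group (?:...) matters here: without it, the \b anchors would bind only to the first and last alternatives rather than to every key.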
Also, unlike most (all?) other solutions presented thus far, it iterates over the input string just once, regardless of how many search terms there are in the dictionary.
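For completeness, the same idea can be wrapped in the process(s, d) signature the question asks for. This is a minimal sketch, not a definitive implementation: the empty-dictionary guard and the use of re.escape are my own additions, on the assumption that keys may be arbitrary strings.

```python
import re

def process(s, d):
    # An empty dictionary would produce an empty alternation, so return early.
    if not d:
        return s
    # Build one pattern from all keys; re.escape guards against regex
    # metacharacters in a key, and the group keeps \b applied to every key.
    pattern = re.compile(r'\b(?:' + '|'.join(re.escape(k) for k in d) + r')\b')
    # A single pass over s: each match is looked up in the dictionary.
    return pattern.sub(lambda m: d[m.group(0)], s)

d = {'ASAP': 'as soon as possible'}
s = "I will do this ASAP. Regards, X"
print(process(s, d))  # I will do this as soon as possible. Regards, X
```

Because the full stop is not a word character, \b matches between "ASAP" and ".", so the punctuation needs no special handling.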