String replacement with dictionary, complications with punctuation

Question

I'm trying to write a function process(s, d) that uses a dictionary to replace the abbreviations in a string with their full meanings, where s is the input string and d is the dictionary. For example:

>>> d = {'ASAP': 'as soon as possible'}
>>> s = "I will do this ASAP.  Regards, X"
>>> process(s, d)
'I will do this as soon as possible.  Regards, X'

I have tried using the split function to separate the string and compare each part with the dictionary.

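def process(s, d):
    # Look up each item of s in d; note that iterating over a str yields
    # single characters, not words.
    return ''.join(d[ch] if ch in d else ch for ch in s)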

However, it returns exactly the same string. I suspect the code doesn't work because of the full stop after ASAP in the original string. If so, how do I ignore the punctuation and get ASAP replaced?
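One way to "ignore the punctuation" as asked is to work on whitespace-separated tokens and peel punctuation off each token before the lookup. This is a minimal sketch of that token-based alternative (it is not the accepted answer's method); it assumes abbreviations are single words and that "punctuation" means any non-word character at the start or end of a token.

```python
import re

def process(s, d):
    """Replace whole-word abbreviations in s using d, ignoring punctuation
    attached to a word (e.g. the full stop in "ASAP.")."""
    # re.split with a capturing group keeps the whitespace runs in the result,
    # so spacing such as the double space after "ASAP." is preserved.
    tokens = re.split(r'(\s+)', s)
    result = []
    for token in tokens:
        # Split the token into leading punctuation, the bare word, and
        # trailing punctuation (\W matches any non-word character).
        lead, core, trail = re.fullmatch(r'(\W*)(.*?)(\W*)', token).groups()
        # Look up only the bare word; leave it unchanged if it is not in d.
        result.append(lead + d.get(core, core) + trail)
    return ''.join(result)

print(process("I will do this ASAP.  Regards, X", {'ASAP': 'as soon as possible'}))
# I will do this as soon as possible.  Regards, X
```

Because the lookup is done on whole tokens, "etc" inside "fetch" is never touched; multi-word keys such as "New York" would not be found by this approach.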

Answer

Here is a way to do it with a single regex:

In [23]: import re

In [24]: d = {'ASAP':'as soon as possible', 'AFAIK': 'as far as I know'}

In [25]: s = 'I will do this ASAP, AFAIK.  Regards, X'

In [26]: re.sub(r'\b(?:' + '|'.join(map(re.escape, d)) + r')\b', lambda m: d[m.group(0)], s)
Out[26]: 'I will do this as soon as possible, as far as I know.  Regards, X'
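The same expression can be wrapped into the process(s, d) function the question asks for. This is a sketch under a few assumptions: the keys are plain words (a key ending in punctuation, such as "e.g.", would defeat the trailing \b), longer keys are tried first so a multi-word key like the hypothetical "New York" wins over "New", and an empty dictionary returns the string unchanged.

```python
import re

def process(s, d):
    # With no keys the pattern would match empty strings, so bail out early.
    if not d:
        return s
    # Escape keys so regex metacharacters are matched literally, and try
    # longer keys first so a key that is a prefix of a longer multi-word key
    # (e.g. "New" vs "New York") does not win.
    keys = sorted(map(re.escape, d), key=len, reverse=True)
    # Group the alternation so \b applies to every key, not only to the
    # start of the first alternative and the end of the last one.
    pattern = re.compile(r'\b(?:' + '|'.join(keys) + r')\b')
    # Each match is looked up in d exactly once.
    return pattern.sub(lambda m: d[m.group(0)], s)

d = {'ASAP': 'as soon as possible', 'AFAIK': 'as far as I know'}
print(process('I will do this ASAP, AFAIK.  Regards, X', d))
# I will do this as soon as possible, as far as I know.  Regards, X
```

If the same dictionary is used for many strings, compiling the pattern once outside the function avoids rebuilding it on every call.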

Unlike versions based on str.replace(), this observes word boundaries and therefore won't replace abbreviations that happen to appear in the middle of other words (e.g. "etc" in "fetch").
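A small side-by-side comparison makes the word-boundary point concrete. The sample string below is made up for illustration; it runs a naive str.replace() loop and the grouped \b regex over the same input.

```python
import re

d = {'etc': 'et cetera'}
s = 'fetch the files, etc'

# Naive substring replacement also rewrites the "etc" inside "fetch".
naive = s
for k, v in d.items():
    naive = naive.replace(k, v)

# The word-boundary regex only matches "etc" as a whole word.
pattern = re.compile(r'\b(?:' + '|'.join(map(re.escape, d)) + r')\b')
bounded = pattern.sub(lambda m: d[m.group(0)], s)

print(naive)    # fet ceterah the files, et cetera
print(bounded)  # fetch the files, et cetera
```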

Also, unlike most (all?) other solutions presented thus far, it iterates over the input string just once, regardless of how many search terms there are in the dictionary.
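Besides needing one scan of the string per key, chained str.replace() calls also rescan text that an earlier replacement produced. The hypothetical swap dictionary below is chosen only to make that visible; the single regex pass looks each match up once and never re-examines the inserted text.

```python
import re

d = {'cat': 'dog', 'dog': 'cat'}
s = 'cat chases dog'

# Chained str.replace(): one full scan per key, and each scan sees the
# output of the previous one.
chained = s
for k, v in d.items():
    chained = chained.replace(k, v)

# Single regex pass: every match is replaced once, left to right.
pattern = re.compile(r'\b(?:' + '|'.join(map(re.escape, d)) + r')\b')
single = pattern.sub(lambda m: d[m.group(0)], s)

print(chained)  # cat chases cat
print(single)   # dog chases cat
```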