正则表达式在客户的评论中用逗号替换一些点 [英] Regular Expression to replace some dots with commas in customers' comments

查看:40
本文介绍了正则表达式在客户的评论中用逗号替换一些点的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我需要写一个正则表达式,将一些患者对药物的评论中的'.' 替换为','.他们应该在提到副作用后使用逗号,但其中一些使用了点.例如:

I need to write a Regular Expression to replace '.' with ',' in some patients' comments about drugs. They were supposed to use comma after mentioning a side effect, but some of them used dot. for example:

text = "the drug side-effects are: night mare. nausea. night sweat. bad dream. dizziness. severe headache.  I suffered. she suffered. she told I should change it."

我写了一个正则表达式代码来检测被两个点包围的一个词(例如,头痛)或两个词(例如,噩梦):

I wrote a regular expression code to detect one word (such as, headache) or two words (such as, bad dream) surround by two dots:

text=  re.sub (r'(\.)(\s*\w+\s*\.)',r',\2 ', text )

检测被两个点包围的两个单词:

text =  re.sub (r'(\.)(\s*\w+\s\w+\s*\.)',r',\2 ', text11 )

这是输出:

the drug side-effects are: night mare, nausea,  night sweat.  bad dream, dizziness,  severe headache.   I suffered, she suffered.  she told I should change it.

但应该是:

the drug side-effects are: night mare, nausea,  night sweat,  bad dream, dizziness,  severe headache.   I suffered. she suffered.  she told I should change it.

我的代码在盗汗后没有将dot替换为','.我另外,如果一个句子以主语代词开头(比如我和她)我不想把它后面的点改成逗号,即使它有两个词(比如,我遭受了).我不知道如何将此条件添加到我的代码中.

My code did not replace dot after night sweat to ','. I addition, if a sentence starts with a subject pronoun (such as I and she) I do not want to change dot to comma after it, even if it has two words (such as, I suffered). I do not know how to add this condition to my code.

有什么建议吗?谢谢!

推荐答案

您可以使用以下模式:

\.(\s*(?!(?:i|she)\b)\w+(?:\s+\w+)?\s*)(?=[^\w\s]|$)

这匹配一个点,然后捕获一两个单词,其中第一个单词不是您提到的代词(您很可能需要扩展该列表).这必须跟一个既不是单词字符也不是空格的字符(例如 . ! : ,) 或字符串的结尾.

This matches a dot, then captures one or two words where the first one is none of your mentioned pronouns (you will need to expand that list most likely). This has to be followed by a character that is neither a word character nor a space (e.g. . ! : ,) or the end of the string.

然后您必须将其替换为 ,\1

You will then have to replace it with ,\1

在蟒蛇中

import re
text = "the drug side-effects are: night mare. nausea. night sweat. bad dream. dizziness. severe headache.  I suffered. she suffered. she told I should change it."
text = re.sub(r'\.(\s*(?!(?:i|she)\b)\w+(?:\s+\w+)?\s*)(?=[^\w\s]|$)', r',\1', text, flags=re.I)
print(text)

输出

the drug side-effects are: night mare, nausea, night sweat, bad dream, dizziness, severe headache.  I suffered. she suffered. she told I should change it.

这可能不是绝对的故障安全,您可能需要针对某些边缘情况扩展模式.

This is likely not absolutely failsafe and you might have to expand the pattern for some edge cases.

这篇关于正则表达式在客户的评论中用逗号替换一些点的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆