Python: Replacing multiple specific words from a list with re.sub


Problem description

I have the following string and a list, changewords. I would like to replace each '{word from list} \n' with '{word from list}:'. I don't want to replace every instance of '\n'.

string = "Foo \n value of something \n Bar \n Another value \n"
changewords = ["Foo", "Bar"]

Desired Output:

'Foo: value of something \n Bar: Another value \n'

I have tried the following:

for i in changewords:
    tem = re.sub(f'{i} \n', f'{i}:', string)
tem
Output: 'Foo \n value of something \n Bar: Another value \n'

and

changewords2 = '|'.join(changewords)
tem = re.sub(f'{changewords2} \n', f'{changewords2}:', string)
tem
Output: 'Foo|Bar: \n value of something \n Foo|Bar: Another value \n'

How can I get my desired output?

Solution

Using replacement string:

A slightly more elegant way to do it is this one-liner:

re.sub(rf"({'|'.join(changewords)}) \n", r"\1:", string, flags=re.I)

Demo:

>>> import re
>>> string = "Foo \n value of something \n Bar \n Another value \n"
>>> changewords = ['Foo', 'Bar', 'Baz', 'qux']
>>> 
>>> re.sub(rf"({'|'.join(changewords)}) \n", r"\1:", string, flags=re.I)
'Foo: value of something \n Bar: Another value \n'
>>> 

The flags=re.I option makes the match case-insensitive. The replacement string can be modified to put anything you need around \1, such as colons or commas.
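As a small sketch of both points, the snippet below compares matching with and without re.I and then changes what surrounds \1. The input string (with a lowercase "bar") and the variable names are made up for illustration.

```python
import re

# Hypothetical input: note the lowercase "bar".
string = "Foo \n value of something \n bar \n Another value \n"
changewords = ["Foo", "Bar"]
pattern = rf"({'|'.join(changewords)}) \n"

# Without re.I, the lowercase "bar" is left alone.
case_sensitive = re.sub(pattern, r"\1:", string)

# With re.I, "bar" matches too; \1 keeps the casing found in the text.
case_insensitive = re.sub(pattern, r"\1:", string, flags=re.I)

# Anything can surround \1 in the replacement, here brackets and an arrow.
bracketed = re.sub(pattern, r"[\1] ->", string, flags=re.I)

print(repr(case_sensitive))
print(repr(case_insensitive))
print(repr(bracketed))
```

Because \1 inserts the text that was actually matched, the output keeps "bar" in lowercase rather than the "Bar" spelling from the list.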

It is worth noting that Python string literals can take more than one prefix. For instance, you can combine r and f, as in rf"my raw formatted string"; the order of the prefixes doesn't matter.
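A quick check of the prefix behaviour, with made-up variable names. It also shows why the raw prefix matters for the replacement string: without it, "\1" is read as an octal escape rather than a backslash followed by 1.

```python
name = "Foo"

# rf and fr produce the same string: raw (backslashes kept) and formatted.
raw_then_f = rf"({name}) \n"
f_then_raw = fr"({name}) \n"
same = raw_then_f == f_then_raw

# In a raw string, \1 stays as two characters: a backslash and "1".
backref_raw = r"\1"

# In a normal string, "\1" is an octal escape for the character with code 1.
backref_plain = "\1"

print(same, len(backref_raw), repr(backref_plain))
```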

Within the expression in re.sub(expr, repl, string), you can specify groups. A group is made by placing parentheses () around part of the pattern.

Groups can then be referenced in the replacement string, repl, by a backslash followed by the group's number, counted in order of appearance; the first group is referred to by \1.

In a call such as re.sub(r"(A|B|C) \n", r"\1: ", string), the \1 in the replacement string refers to the text matched by the first group, (A|B|C), in the expression argument.
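To illustrate group numbering, here is a minimal sketch with two groups, plus the \g<1> form, which is useful when a digit has to follow the reference directly. The sample text and patterns are invented for this example.

```python
import re

text = "Foo \n 42 \n"

# Two groups: (\w+) is group 1 and (\d+) is group 2.
# The replacement can use them in any order.
swapped = re.sub(r"(\w+) \n (\d+)", r"\2=\1", text)

# \g<1> is an unambiguous way to write \1. Here it is followed by a
# literal "0"; writing \10 instead would be read as group number 10.
with_zero = re.sub(r"(Foo|Bar) \n", r"\g<1>0", "Foo \n")

print(repr(swapped))
print(repr(with_zero))
```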

Using replacement function:

Suppose you want to replace words in the target string with other words from a dictionary. For instance, you want 'Bar' to be replaced with 'Hank' and 'Foo' with 'Bernard'. This can be done using a replacement function instead of a replacement string:

>>> repl_dict = {'Foo':'Bernard', 'Bar':'Hank'}
>>> 
>>> expr = rf"({'|'.join(repl_dict.keys())}) \n"   # Becomes '(Foo|Bar) \\n'
>>>
>>> func = lambda mo: f"{repl_dict[mo.group(1)]}:"
>>> 
>>> re.sub(expr, func, string, flags=re.I)
'Bernard: value of something \n Hank: Another value \n'
>>> 

This could be another one-liner, but I broke it up for clarity...
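For reference, the one-liner form might look like the sketch below. It drops flags=re.I on the assumption that the text uses the same casing as the dictionary keys, since the lookup repl_dict[mo.group(1)] is case-sensitive.

```python
import re

string = "Foo \n value of something \n Bar \n Another value \n"
repl_dict = {"Foo": "Bernard", "Bar": "Hank"}

# Pattern, replacement function, and substitution in a single call.
result = re.sub(rf"({'|'.join(repl_dict)}) \n", lambda mo: f"{repl_dict[mo.group(1)]}:", string)

print(repr(result))
```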

The lambda function takes the match object, mo, that re.sub passes to it and extracts the first group's text. The first group in the regular expression is the text enclosed by (), such as (A|B|C).

The replacement function references this first group using mo.group(1); similarly, the replacement string in the previous example referenced it with \1.

Then the repl function does the lookup in the dict and returns the final replacement string for the match.
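One caveat with combining flags=re.I and a dictionary lookup: if the text contains "foo" or "BAR", mo.group(1) returns that exact casing and repl_dict[...] raises KeyError. The sketch below is one possible way around that, assuming the keys are meant to match case-insensitively. The names lookup and replace_word are hypothetical, and re.escape and \b are added on the assumption that the words should be treated as literal whole words.

```python
import re

repl_dict = {"Foo": "Bernard", "Bar": "Hank"}

# Lower-cased copy of the dictionary so any casing in the text can be looked up.
lookup = {key.lower(): value for key, value in repl_dict.items()}

# re.escape guards against keys containing regex metacharacters;
# \b stops the pattern from matching the tail of a longer word.
alternation = "|".join(re.escape(key) for key in repl_dict)
pattern = re.compile(rf"\b({alternation}) \n", flags=re.I)

def replace_word(mo):
    # Normalise the matched text before the dictionary lookup.
    return f"{lookup[mo.group(1).lower()]}:"

string = "foo \n value of something \n BAR \n Another value \n"
result = pattern.sub(replace_word, string)

print(repr(result))
```

Compiling the pattern once is optional here; it mainly keeps the call site short when the same substitution is applied to many strings.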
