Variable inside regular expression in Python's series.str.contains framework
Question
I want to control/edit elements of a regex as variables before running the regex. In the regex I am using, I want to find the rows in a data frame containing 2 words separated by a maximum of 3 words.
This code identifies word1 and word2, using the regex without outside variables:
import re
import pandas as pd
df = pd.DataFrame({'a': ['some text here', 'some text there', 'word1 some more text word2']})
result = df['a'].str.contains(r"\b(?:word1\W+(?:\w+\W+){0,3}?word2|word2\W+(?:\w+\W+){0,3}?word1)\b")
print(result)
0    False
1    False
2     True
Name: a, dtype: bool
I want to get the same result while being able to control word1, word2 and the value 3 from outside the regex.
Here is my failed attempt to define variables outside the regex, adapted from answers to similar questions here on Stack Overflow:
import re
import pandas as pd
Var1 = "word1"
Var2 = "word2"
Var3 = "3"
df = pd.DataFrame({'a': ['some text here', 'some text there', 'word1 some more text word2']})
result = df['a'].str.contains(r"\b(?:{Var1}\W+(?:\w+\W+){0,{Var3}}?{Var2}|{Var2}\W+(?:\w+\W+){0,{Var3}}?{Var1})\b")
print(result)
0    False
1    False
2    False
Name: a, dtype: bool
Similarly, this also fails:
result = df['a'].str.contains(r"\b(?:"+Var1+"\W+(?:\w+\W+){0,"+Var3+"}?"+Var2+"|"+Var2+"\W+(?:\w+\W+){0,"+Var3+"}?"+Var1+")\b")
Is there a simple way to adapt the regex so that it reads Var1, Var2 and Var3?
Recommended answer
You can combine your raw string with an f-string (new in Python 3.6), but first you have to escape the curly braces of the regex quantifiers by doubling them. From the Python documentation:
The parts of the string outside curly braces are treated literally, except that any doubled curly braces '{{' or '}}' are replaced with the corresponding single curly brace. A single opening curly bracket '{' marks a replacement field, which starts with a Python expression...
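As a small illustration of the brace rule quoted above, the following sketch compares a plain raw string with a raw f-string. The variable name `max_gap` is a hypothetical stand-in for Var3 and is not part of the original question.

```python
# Hypothetical stand-in for Var3 from the question.
max_gap = 3

# Plain raw string: no f prefix, so {max_gap} is kept as literal text
# and the variable is never substituted.
plain = r"(?:\w+\W+){0,{max_gap}}"

# Raw f-string: doubled braces become the literal braces of the regex
# quantifier, while the single-brace field {max_gap} is substituted.
formatted = rf"(?:\w+\W+){{0,{max_gap}}}"

print(plain)
print(formatted)
```

The first pattern still contains the text `{max_gap}`, which is why the attempt without the `f` prefix cannot match; the second contains a regular `{0,3}` quantifier.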
rf"\b(?:{Var1}\W+(?:\w+\W+){{0,{Var3}}}?{Var2}|{Var2}\W+(?:\w+\W+){{0,{Var3}}}?{Var1})\b"
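Put together with the data frame from the question, the pattern could be used roughly as follows. This is a minimal sketch: the names `word_a`, `word_b`, `max_gap` and `gap` are illustrative, and the `re.escape` calls are an extra precaution that the original answer does not include.

```python
import re
import pandas as pd

# Illustrative names; the question calls these Var1, Var2 and Var3.
word_a = "word1"
word_b = "word2"
max_gap = 3  # maximum number of words allowed between the two terms

# Assumption beyond the original answer: escape the search terms so that
# any regex metacharacters in them are matched literally.
a = re.escape(word_a)
b = re.escape(word_b)

# Doubled braces produce the literal braces of the quantifier;
# single braces substitute the Python variables.
gap = rf"\W+(?:\w+\W+){{0,{max_gap}}}?"
pattern = rf"\b(?:{a}{gap}{b}|{b}{gap}{a})\b"

df = pd.DataFrame({'a': ['some text here',
                         'some text there',
                         'word1 some more text word2']})
result = df['a'].str.contains(pattern)
print(result.tolist())
```

The concatenation attempt in the question fails for a separate reason: only the first fragment is a raw string, so in the final `")\b"` fragment Python turns `\b` into a backspace character instead of a regex word boundary. Prefixing every fragment with `r` would avoid that.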