Variable inside regular expression in Python's series.str.contains framework

Question

I want to control the elements of a regex through variables before running it. The regex I am using finds the rows of a data frame that contain two words separated by at most three other words.

This code finds word1 and word2 using a regex with no outside variables:

import re
import pandas as pd

df = pd.DataFrame({'a': ['some text here', 'some text there', 'word1 some more text word2']})
result = df['a'].str.contains(r"\b(?:word1\W+(?:\w+\W+){0,3}?word2|word2\W+(?:\w+\W+){0,3}?word1)\b") 

print(result)
0    False
1    False
2    True
Name: a, dtype: bool

I want to reach the same result while being able to control word1, word2 and the value 3 from outside the regex.

Here is my failed attempt to define the variables outside the regex, adapted from answers to similar questions here on Stack Overflow:

import re
import pandas as pd

Var1 = "word1"
Var2 = "word2"
Var3 = "3"


df = pd.DataFrame({'a': ['some text here', 'some text there', 'word1 some more text word2']})
result = df['a'].str.contains(r"\b(?:{Var1}\W+(?:\w+\W+){0,{Var3}}?{Var2}|{Var2}\W+(?:\w+\W+){0,{Var3}}?{Var1})\b") 
   
print(result)
0    False
1    False
2    False
Name: a, dtype: bool

Similarly, this also failed:

result = df['a'].str.contains(r"\b(?:"+Var1+"\W+(?:\w+\W+){0,"+Var3+"}?"+Var2+"|"+Var2+"\W+(?:\w+\W+){0,"+Var3+"}?"+Var1+")\b")    

Is there a simple way to adapt the regex so that it reads Var1, Var2 and Var3?

Recommended answer

You can combine your raw string with an f-string (new in Python 3.6), but first you have to escape the curly braces of the regex quantifiers by doubling them. The Python documentation on f-strings says:

The parts of the string outside curly braces are treated literally, except that any doubled curly braces '{{' or '}}' are replaced with the corresponding single curly brace. A single opening curly bracket '{' marks a replacement field, which starts with a Python expression...

rf"\b(?:{Var1}\W+(?:\w+\W+){{0,{Var3}}}?{Var2}|{Var2}\W+(?:\w+\W+){{0,{Var3}}}?{Var1})\b"
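As a minimal sketch of the raw f-string approach above, the pattern can be built in a small helper and checked against the sample strings from the question. The helper name `proximity_pattern` and its parameter names are hypothetical, and the `re.escape` calls are an added precaution (not part of the original answer) for words that contain regex metacharacters.

```python
import re

def proximity_pattern(first, second, max_gap):
    """Return a regex matching `first` and `second` in either order,
    separated by at most `max_gap` intervening words.
    (Hypothetical helper, for illustration only.)"""
    # Escape the words so characters such as '.' or '+' are matched literally.
    a = re.escape(first)
    b = re.escape(second)
    # Doubled braces {{ }} become the literal quantifier braces { };
    # single braces are f-string replacement fields.
    gap = rf"(?:\w+\W+){{0,{int(max_gap)}}}?"
    return rf"\b(?:{a}\W+{gap}{b}|{b}\W+{gap}{a})\b"

pattern = proximity_pattern("word1", "word2", 3)
texts = ["some text here", "some text there", "word1 some more text word2"]
matches = [bool(re.search(pattern, t)) for t in texts]

print(pattern)
print(matches)
```

The resulting string can be passed straight to `df['a'].str.contains(pattern)`. The helper takes `max_gap` as an integer rather than the string `"3"` so that a non-numeric value fails early in `int()` instead of producing an invalid quantifier.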
