How to use '?' to extract an optional substring between two matching patterns in Python?

Problem description

I was answering this question. Consider this string:

str1 = '{"show permission allowed to 16": "show permission to 16\\nSchool permissions from group 17:student to group 16:teacher:\\n\\tAllow ALL-00\\nSchool permissions from group 18:library to group 16(Temp):teacher:\\n\\tNo Allow ALL-00\\nSchool permissions from group 20:Gym to group 16:teacher:\\n\\tCheck ALL-00\\nRTYAHY: FALSE\\nRTYAHY: FALSE\\n\\n#"}'

Suppose I want to extract the number after each occurrence of from group, together with the shortest possible substring after \\t.

I did this with the following regular expression:

import re
res = re.findall(r'from group (\d+).*?\\t(.*? ALL-..)', str1)

The output is:

[('17', 'Allow ALL-00'), ('18', 'No Allow ALL-00'), ('20', 'Check ALL-00')]

Now, between the two parts I am extracting (the number and the substring after \t) there may be an optional substring, Temp, which I want to extract if present. For example, between 18 and No Allow ALL-00 there is the substring Temp, which I would like to extract.

I tried using ? as follows:

res = re.findall(r'from group (\d+).*?(Temp)?.*?\\t(.*? ALL-..)', str1)

but the corresponding second element of each resulting tuple is always empty:

[('17', '', 'Allow ALL-00'), ('18', '', 'No Allow ALL-00'), ('20', '', 'Check ALL-00')]

whereas I was expecting something like:

[('17', '', 'Allow ALL-00'), ('18', 'Temp', 'No Allow ALL-00'), ('20', '', 'Check ALL-00')]

How can I extract the substring in this case? What mistake am I making?

One further question: suppose I want my resulting list to exclude this element (the one containing Temp). Should I just use [^] followed by the corresponding pattern?

Recommended answer

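The reason Temp is not captured is that (Temp)? is optional, so the engine is free to match it as empty. The first .*? is lazy and initially matches nothing; at that position (Temp)? cannot match Temp, so it matches the empty string, and the second .*? then consumes Temp on its way to \\t. Because this first attempt already produces a complete match, the engine never backtracks to try a placement where (Temp)? actually captures Temp.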
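A small sketch to see this directly, using only the group-18 record from the question's string (the variable names record and original are illustrative). With re.search the non-participating group shows up as None with span (-1, -1); re.findall reports the same group as an empty string, which is why the tuples contain ''.

```python
import re

# Illustrative excerpt of the question's string: in a normal (non-raw)
# literal, "\\n" and "\\t" are a backslash followed by a letter.
record = 'from group 18:library to group 16(Temp):teacher:\\n\\tNo Allow ALL-00'

# The question's pattern: the optional group sits between two lazy .*? runs.
original = r'from group (\d+).*?(Temp)?.*?\\t(.*? ALL-..)'

m = re.search(original, record)
print(m.group(2))   # None: (Temp)? took the empty path
print(m.span(2))    # (-1, -1): the group did not participate
print(re.findall(original, record))  # [('18', '', 'No Allow ALL-00')]
```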

To solve that problem, you can use a negative lookahead so that the text on either side of the optional group may contain any character except the start of Temp. Use this regex:

from group (\d+)(?:(?!Temp).)*?(Temp)?(?:(?!Temp).)*?\\t(.*? ALL-..)
                   ^^^^^^^^^ Matches any character unless Temp starts at this position

Regex explanation:

  • from group - literal match of this text
  • (\d+) - captures the group number
  • (?:(?!Temp).)*? - (?:...) is a non-capturing group; a plain (...) would capture. Inside it, (?!Temp). matches any single character as long as Temp does not start at that position, * repeats it zero or more times, and the trailing ? makes the repetition lazy, so it matches as few characters as possible. The result is a stretch of text that can never contain Temp
  • (Temp)? - optionally captures Temp if present
  • (?:(?!Temp).)*? - again matches as few characters as possible, none of which may start Temp, just like above. Since neither of these two stretches can step over Temp, the only way for the match to get past a Temp and reach \\t is for (Temp)? to consume it
  • \\t - matches a literal backslash followed by t (the string contains these two characters, not a tab character)
  • (.*? ALL-..) - captures as few characters as possible, followed by a space, the literal ALL- and any two characters
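The pieces above can be checked in isolation. This sketch (variable names tempered, full and record are illustrative) first tests the tempered run on its own, then shows that the full pattern captures Temp in the group-18 record.

```python
import re

# Illustrative name: the tempered run, i.e. characters in which Temp never starts.
tempered = r'(?:(?!Temp).)*'

print(bool(re.fullmatch(tempered, ':library to group 16(')))        # True
print(bool(re.fullmatch(tempered, ':library to group 16(Temp):')))  # False

# The answer's pattern: the runs on both sides of (Temp)? cannot step over
# Temp, so reaching the backslash-t marker past a Temp forces (Temp)? to take it.
full = r'from group (\d+)(?:(?!Temp).)*?(Temp)?(?:(?!Temp).)*?\\t(.*? ALL-..)'
record = 'from group 18:library to group 16(Temp):teacher:\\n\\tNo Allow ALL-00'
print(re.search(full, record).groups())  # ('18', 'Temp', 'No Allow ALL-00')
```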

I hope this clarifies the regex. Let me know if you have any further questions.

Sample Python code:

import re

s = '{"show permission allowed to 16": "show permission to 16\\nSchool permissions from group 17:student to group 16:teacher:\\n\\tAllow ALL-00\\nSchool permissions from group 18:library to group 16(Temp):teacher:\\n\\tNo Allow ALL-00\\nSchool permissions from group 20:Gym to group 16:teacher:\\n\\tCheck ALL-00\\nRTYAHY: FALSE\\nRTYAHY: FALSE\\n\\n#"}'

arr = re.findall(r'from group (\d+)(?:(?!Temp).)*?(Temp)?(?:(?!Temp).)*?\\t(.*? ALL-..)',s)
print(arr)

Prints:

[('17', '', 'Allow ALL-00'), ('18', 'Temp', 'No Allow ALL-00'), ('20', '', 'Check ALL-00')]
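For comparison, here is a different technique, not part of the answer above and offered only as a sketch: instead of tempering the filler, let the optional group itself search lazily for Temp while forbidding it to run past the next backslash-t marker. Because ? is greedy, the engine tries the group first and only skips it when no Temp occurs before \\t. The pattern name alt is illustrative.

```python
import re

str1 = '{"show permission allowed to 16": "show permission to 16\\nSchool permissions from group 17:student to group 16:teacher:\\n\\tAllow ALL-00\\nSchool permissions from group 18:library to group 16(Temp):teacher:\\n\\tNo Allow ALL-00\\nSchool permissions from group 20:Gym to group 16:teacher:\\n\\tCheck ALL-00\\nRTYAHY: FALSE\\nRTYAHY: FALSE\\n\\n#"}'

# Illustrative alternative: (?:(?!\\t).)*? scans forward without crossing
# the backslash-t marker; if Temp is found there, it is captured, otherwise
# the whole optional group is skipped and .*? walks on to the marker.
alt = r'from group (\d+)(?:(?:(?!\\t).)*?(Temp))?.*?\\t(.*? ALL-..)'

print(re.findall(alt, str1))
# [('17', '', 'Allow ALL-00'), ('18', 'Temp', 'No Allow ALL-00'), ('20', '', 'Check ALL-00')]
```

As with the answer's pattern, findall reports the skipped group as ''.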

Listing only the entries that do not contain Temp

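You will need to use this regex to avoid matching any entry that contains Temp before its \\t. A negated character class such as [^Temp] will not do this, because it rejects the individual characters T, e, m and p rather than the word Temp; the negative lookahead (?!Temp) rejects only the whole word: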

from group (\d+)(?:(?!Temp).)*\\t(.*? ALL-..)
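To make the difference between [^Temp] and the lookahead concrete, here is a short sketch; the test strings are fragments of the question's input.

```python
import re

# [^Temp] is a set of single characters: "any one character except T, e, m or p".
print(re.findall(r'[^Temp]+', 'group'))  # ['grou']

# So it rejects ordinary text that merely contains one of those letters...
print(bool(re.fullmatch(r'[^Temp]*', ':teacher:')))        # False
# ...while the lookahead version only rejects the word Temp itself.
print(bool(re.fullmatch(r'(?:(?!Temp).)*', ':teacher:')))  # True
```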

Sample Python code:

import re

str1 = '{"show permission allowed to 16": "show permission to 16\\nSchool permissions from group 17:student to group 16:teacher:\\n\\tAllow ALL-00\\nSchool permissions from group 18:library to group 16(Temp):teacher:\\n\\tNo Allow ALL-00\\nSchool permissions from group 20:Gym to group 16:teacher:\\n\\tCheck ALL-00\\nRTYAHY: FALSE\\nRTYAHY: FALSE\\n\\n#"}'

arr = re.findall(r'from group (\d+)(?:(?!Temp).)*\\t(.*? ALL-..)',str1)
print(arr)

Prints:

[('17', 'Allow ALL-00'), ('20', 'Check ALL-00')]

The entry for group 18, which contains Temp, is not included.
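One caveat worth checking: the (?:(?!Temp).)* run in this regex is greedy, so it stops at the last \\t before the next Temp rather than the first. On the question's string that happens to give the right result, but with the hypothetical input below (two Temp-free groups, 17 and 19, before the Temp group 18) the greedy version pairs 17 with 19's value. Making the run lazy with *? is a small sketch of a fix; on the question's string it prints the same two tuples as above.

```python
import re

# Hypothetical input, not from the question: groups 17 and 19 have no Temp,
# and the Temp group 18 comes after them.
text = ('from group 17:student to group 16:teacher:\\n\\tAllow ALL-00\\n'
        'from group 19:office to group 16:teacher:\\n\\tDeny ALL-00\\n'
        'from group 18:library to group 16(Temp):teacher:\\n\\tNo Allow ALL-00')

greedy = r'from group (\d+)(?:(?!Temp).)*\\t(.*? ALL-..)'   # as in the answer
lazy = r'from group (\d+)(?:(?!Temp).)*?\\t(.*? ALL-..)'    # stops at the first \t

print(re.findall(greedy, text))  # [('17', 'Deny ALL-00')]
print(re.findall(lazy, text))    # [('17', 'Allow ALL-00'), ('19', 'Deny ALL-00')]
```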
