Regex to find all possible occurrences of text starting and ending with ~


Problem description

I would like to find all possible occurrences of text enclosed between two ~s.

For example: For the text ~*_abc~xyz~ ~123~, I want the following expressions as matching patterns:


  1. ~*_abc~
  2. ~xyz~
  3. ~123~

Note that the enclosed text can consist of letters or digits.

I tried the regex ~[\w]+?~, but it does not give me ~xyz~. I want the closing ~ of one match to be reconsidered as the opening ~ of the next. However, I don't want a bare ~~ as a possible match.

Recommended answer

Use capturing inside a positive lookahead with the following regex:


Sometimes, you need several matches within the same word. For instance, suppose that from a string such as ABCD you want to extract ABCD, BCD, CD and D. You can do it with this single regex:

(?=(\w+))

At the first position in the string (before the A), the engine starts the first match attempt. The lookahead asserts that what immediately follows the current position is one or more word characters, and captures these characters to Group 1. The lookahead succeeds, and so does the match attempt. Since the pattern didn't match any actual characters (the lookahead only looks), the engine returns a zero-width match (the empty string). It also returns what was captured by Group 1: ABCD

The engine then moves to the next position in the string and starts the next match attempt. Again, the lookahead asserts that what immediately follows that position is word characters, and captures these characters to Group 1. The match succeeds, and Group 1 contains BCD.

The engine moves to the next position in the string, and the process repeats itself for CD then D.
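
The walkthrough above can be reproduced with a short sketch in Python. The variable names here are only illustrative; `re.findall` returns the contents of Group 1 for each zero-width match.

```python
import re

# Zero-width lookahead with a capturing group: each match is empty,
# but group 1 holds the run of word characters starting at that position.
pattern = re.compile(r'(?=(\w+))')

# findall returns what group 1 captured at every position where the
# lookahead succeeds.
suffixes = pattern.findall("ABCD")
print(suffixes)
# => ['ABCD', 'BCD', 'CD', 'D']
```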

So, use

(?=(~[^\s~]+~))

See the regex demo.

The pattern (?=(~[^\s~]+~)) checks each position in the string and looks for a ~ followed by one or more characters other than whitespace and ~, followed by another ~. Since the index advances only after a position has been checked, and not by the length of the captured value, overlapping substrings get extracted.

Python demo

import re
p = re.compile(r'(?=(~[^\s~]+~))')
test_str = " ~*_abc~xyz~ ~123~"
print(p.findall(test_str))
# => ['~*_abc~', '~xyz~', '~123~']
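
To see why the lookahead matters, here is a hedged comparison on a simplified, hypothetical sample string (not the exact text from the question). It runs the same inner pattern once as an ordinary consuming regex and once inside a capturing lookahead, and uses `re.finditer` to show where each overlapping match starts.

```python
import re

# Hypothetical sample, simplified from the question's text.
text = "~abc~xyz~ ~123~"

# Consuming pattern: the closing ~ of one match is used up,
# so it cannot serve as the opening ~ of the next match.
consuming = re.findall(r'~[^\s~]+~', text)

# Lookahead pattern: nothing is consumed, so matches may share a ~.
overlapping = [(m.start(), m.group(1))
               for m in re.finditer(r'(?=(~[^\s~]+~))', text)]

print(consuming)
# => ['~abc~', '~123~']
print(overlapping)
# => [(0, '~abc~'), (4, '~xyz~'), (10, '~123~')]
```

The consuming version skips ~xyz~ because its opening ~ was already part of ~abc~, while the lookahead version reports it starting at index 4. Neither version matches a bare ~~, since the character class requires at least one character between the tildes.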
