How to exclude a word from a regex subpattern?

Views: 143 | Posted: 2020/10/20 7:03:50 | Tags: regex, delphi


Question

I am using Delphi 7 and TDIPerlRegEx. I am looking for verbs in parts of a sentence that contain a specific marker identifying the verb.

s1 := '(I|you|he|she|it|we|they|this|that|these|those)';
s2 := '(can|should|would|could|must|want to|have to|had to|might)';
RegEx_Seek_1.MatchPattern := '(*UCP)(?m) \b'+s1+'\b \b'+s2+'\b \K([^ß\W]\w{2,15})\b';

The keyword wrongly included in the result is "not"; it should be excluded:

Sample text:

... that you should not ßeat of every ...

A verb like this one should be included in the result. Sample text:

lest he should put forth his hand ...

Now let me explain the ß sign. It marks places where the original text had the word "not" followed by a verb. I changed the text in an earlier pass, so the source text I am working with now looks as shown above. The pattern ([^ß\W]\w{2,15}) is meant to skip a word used in a negative sense, which is why the "negative" verb is not included.

So the point of the question is how to exclude the word "not" from the captured text, that is, from the text captured by this pattern, which is either ([^ß\W]\w{2,15}) or (\w{3,15}).

I am using this pattern to replace substrings in text.

Need more sample text?

bear.

so I might have taken her

they might dwell together

they could not dwell together

say,

In Group 3 I expect these matches: bear, taken (or possibly have instead of taken), dwell and say. I am trying to exclude the word "not", so any verb or word following "not" must be excluded from Group 3, or the match must be dropped completely. I am interested in Group 3 only. Groups 1 and 2 just specify the alternatives that precede the verb.

Recommended answer

You may use a branch reset group to match an empty string if "not" appears as a whole word after a modal verb, or a notional verb otherwise:

\b(I|you|he|she|it|we|they|this|that|these|those)\s+(can|should|would|could|must|want to|have to|had to|might)\s+\K(?|(?=not\b)()|([^ß\W]\w{2,15})\b)
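
The behaviour of this pattern can be tried outside Delphi. The sketch below is only an approximation in Python, chosen so it stays self-contained: Python's standard re module supports neither \K nor the branch reset group (?|...), so the two branches get separate groups (3 and 4) and the caller reads whichever one took part in the match. The names PATTERN and verbs_after_modals are hypothetical and do not come from the original post.

```python
import re

PRONOUNS = r"(I|you|he|she|it|we|they|this|that|these|those)"
MODALS = r"(can|should|would|could|must|want to|have to|had to|might)"

# Assumption: emulation of the PCRE pattern. Without (?|...), the empty
# "not" branch is group 3 and the verb branch is group 4.
PATTERN = re.compile(
    r"\b" + PRONOUNS + r"\s+" + MODALS + r"\s+"
    r"(?:(?=not\b)()|([^ß\W]\w{2,15})\b)"
)

def verbs_after_modals(text):
    """Return the word after each pronoun + modal pair, or '' when it is 'not'."""
    results = []
    for m in PATTERN.finditer(text):
        # group(3) is '' when the "not" branch matched, otherwise None.
        results.append(m.group(3) if m.group(3) is not None else m.group(4))
    return results

print(verbs_after_modals("lest he should put forth his hand"))   # ['put']
print(verbs_after_modals("that you should not ßeat of every"))   # ['']
```

In PCRE itself the branch reset group makes both branches write to Group 3, so no such coalescing step is needed there.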

See the regex demo.

Details

 
- \b - a word boundary
- (I|you|he|she|it|we|they|this|that|these|those) - Group 1: one of the pronouns
- \s+ - 1+ whitespaces (this also acts as a word boundary between the two adjacent groups)
- (can|should|would|could|must|want to|have to|had to|might) - Group 2: one of the modal verbs
- \s+ - 1+ whitespaces
- \K - match reset operator: the text matched so far is dropped from the overall match
- (?|(?=not\b)()|([^ß\W]\w{2,15})\b) - the branch reset group, matching one of two branches:
  - (?=not\b)() - if "not" as a whole word is immediately to the right, capture an empty string into Group 3
  - | - or (here, else)
  - ([^ß\W]\w{2,15})\b - match and capture into Group 3 any word char other than ß and then 2 to 15 word chars, with a word boundary to follow.
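
The question also allows dropping the match completely when "not" follows the modal verb. A common way to do that, different from the branch reset approach described above, is a negative lookahead (?!not\b) placed before the verb group. The sketch below is a hedged illustration in Python rather than Delphi; SKIP_NOT and third_group are hypothetical names, and \K is omitted because Python's standard re module does not support it.

```python
import re

# Swapped-in technique (not the answer's): a negative lookahead rejects
# the whole match when the next word is exactly "not".
SKIP_NOT = re.compile(
    r"\b(I|you|he|she|it|we|they|this|that|these|those)\s+"
    r"(can|should|would|could|must|want to|have to|had to|might)\s+"
    r"(?!not\b)([^ß\W]\w{2,15})\b"
)

def third_group(text):
    """Return Group 3 of the first match, or None when nothing matches."""
    m = SKIP_NOT.search(text)
    return m.group(3) if m else None

print(third_group("they might dwell together"))      # dwell
print(third_group("they could not dwell together"))  # None
```

With this variant there is no match at all for the negated sentence, so a replace operation would leave that text untouched.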
Note that (?m) (PCRE_MULTILINE) is only necessary if you want ^ and $ outside of character classes to match at the start and end of each line rather than of the whole string. Since your pattern has no such anchors, (?m) is redundant.
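
The effect of the multiline flag can be checked with a small experiment. This is a minimal sketch in Python, whose re.MULTILINE flag plays the same role as (?m); BODY is a simplified, hypothetical stand-in for the question's pattern, and the two-line text is assembled from the question's samples.

```python
import re

# Simplified stand-in for the question's pattern: no ^ or $ anchors.
BODY = r"\b(he|they)\s+(should|might)\s+(\w{3,16})\b"
text = "lest he should put forth\nthey might dwell together"

# Without anchors, the flag changes nothing.
plain = re.findall(BODY, text)
multiline = re.findall(BODY, text, flags=re.MULTILINE)
print(plain == multiline)  # True

# With an anchor, the flag decides whether ^ matches at each line start.
first_words_plain = re.findall(r"^\w+", text)
first_words_multiline = re.findall(r"^\w+", text, flags=re.MULTILINE)
print(first_words_plain)      # ['lest']
print(first_words_multiline)  # ['lest', 'they']
```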
  