How to exclude a word from a regex subpattern?

Question

I am using Delphi 7 and TDIPerlRegEx. I am looking for verbs in parts of sentences that contain some specific part identifying the verb.

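// Group 1: pronouns that precede the modal verb
s1 := '(I|you|he|she|it|we|they|this|that|these|those)';
// Group 2: modal verbs
s2 := '(can|should|would|could|must|want to|have to|had to|might)';
// Group 3 (after \K): the verb to capture
RegEx_Seek_1.MatchPattern := '(*UCP)(?m) \b'+s1+'\b \b'+s2+'\b \K([^ß\W]\w{2,15})\b';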

The keyword "not" is wrongly included in the result, but it should be excluded. Sample text:

... that you should not ßeat of every ...

A verb like this should be included in the result. Sample text:

lest he should put forth his hand ...
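The behaviour described so far can be reproduced outside Delphi. The following is only a sketch in Python, assuming its built-in re module treats this pattern the way PCRE does. It drops (*UCP) and \K, which re lacks, and (?m), which has no effect on a pattern without ^ or $; the word is therefore read from Group 3. The names S1, S2, ORIGINAL and captured_word are made up for illustration.

```python
import re

# Same alternations as s1 and s2 in the Delphi snippet.
S1 = r"(I|you|he|she|it|we|they|this|that|these|those)"
S2 = r"(can|should|would|could|must|want to|have to|had to|might)"

# The original pattern without (*UCP), (?m) and \K.
# Group 3 still holds the word that follows the modal verb.
ORIGINAL = re.compile(r" \b" + S1 + r"\b \b" + S2 + r"\b ([^ß\W]\w{2,15})\b")


def captured_word(text):
    """Return what Group 3 captures in the first match, or None."""
    match = ORIGINAL.search(text)
    return match.group(3) if match else None


print(captured_word("... that you should not ßeat of every ..."))  # not
print(captured_word("lest he should put forth his hand ..."))      # put
```

The first call shows the unwanted capture of "not"; the second shows a verb that should stay in the result.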

Now let me explain the ß sign. It indicates that the original text had the word "not" followed by the verb. I changed the text in a previous pass, so the source text I am working with now is as shown above. The pattern ([^ß\W]\w{2,15}) is meant to skip a word that is used in a negative sense, which is also why the "negative" verb is not included.

So the point of the question is how to exclude the word "not" from the captured text, that is, the text captured by this pattern, which is either ([^ß\W]\w{2,15}) or (\W{3,15}).

I am using this pattern to replace substrings in text.

Need more sample text?

bear

so I might have taken her

they might dwell together

they could not dwell together

In Group 3 I expect these matches: bear, taken (or possibly have instead of taken), dwell and say. I am trying to exclude the word "not", so any verb or word following "not" must be excluded from Group 3, or from the match completely. I am only interested in Group 3; Groups 1 and 2 just specify the alternatives preceding the verb.

Recommended answer

You may use a branch reset group that matches an empty string if "not" follows the modal verb as a whole word, and a notional verb otherwise:

\b(I|you|he|she|it|we|they|this|that|these|those)\s+(can|should|would|could|must|want to|have to|had to|might)\s+\K(?|(?=not\b)()|([^ß\W]\w{2,15})\b)

See the regex demo.

Details

  • \b - a word boundary
  • (I|you|he|she|it|we|they|this|that|these|those) - one of the pronouns, captured into Group 1
  • \s+ - 1+ whitespace chars (the whitespace already acts as a word boundary on both sides of the adjacent groups, so no extra \b is needed there)
  • (can|should|would|could|must|want to|have to|had to|might) - one of the modal verbs, captured into Group 2
  • \s+ - 1+ whitespace chars
  • \K - the match reset operator, which drops the text matched so far from the overall match
  • (?|(?=not\b)()|([^ß\W]\w{2,15})\b) - a branch reset group matching either
    • (?=not\b)() - if "not" follows immediately to the right as a whole word, capture an empty string into Group 3
    • | - or (here, else)
    • ([^ß\W]\w{2,15})\b - match and capture into Group 3 any word char other than ß, followed by 2 to 15 word chars and a word boundary.
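For readers without a PCRE engine at hand, the same filtering can be approximated in Python. This is a sketch under assumptions, not the answer's exact pattern: Python's built-in re module supports neither \K nor branch reset groups (?|...), so a negative lookahead (?!not\b) is swapped in. It skips the match entirely when "not" follows the modal verb instead of capturing an empty string, and the verb is read from Group 3. The names PRONOUNS, MODALS, VERB_AFTER_MODAL and verbs_after_modals are illustrative only.

```python
import re

PRONOUNS = r"(I|you|he|she|it|we|they|this|that|these|those)"
MODALS = r"(can|should|would|could|must|want to|have to|had to|might)"

# Swapped-in technique: (?!not\b) rejects the position when the next
# whole word is "not"; otherwise Group 3 captures the following word.
VERB_AFTER_MODAL = re.compile(
    r"\b" + PRONOUNS + r"\s+" + MODALS + r"\s+(?!not\b)([^ß\W]\w{2,15})\b"
)


def verbs_after_modals(text):
    """Return the Group 3 capture of every pronoun + modal + word match."""
    return [m.group(3) for m in VERB_AFTER_MODAL.finditer(text)]


for sample in (
    "... that you should not ßeat of every ...",
    "lest he should put forth his hand ...",
    "they could not dwell together",
):
    print(sample, "->", verbs_after_modals(sample))
```

Skipping the match rather than capturing an empty Group 3 corresponds to the question's alternative of excluding such words from the match completely.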

Note that (?m) (PCRE_MULTILINE) is only necessary if you want ^ and $ outside of character classes to match at the start and end of each line rather than of the whole string. Since your pattern has no such anchors, (?m) is redundant.
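The effect of the flag can be checked with a small experiment. The sketch below uses Python's built-in re module, whose inline (?m) flag (re.MULTILINE) changes only where ^ and $ match, as PCRE_MULTILINE does; the sample text and variable names are made up for illustration.

```python
import re

text = "he should put\nthey might dwell"

# No anchors in the pattern: the flag makes no difference.
plain = re.findall(r"\b(?:should|might)\s+(\w+)", text)
multi = re.findall(r"(?m)\b(?:should|might)\s+(\w+)", text)
print(plain == multi)  # True

# With an anchor, the flag decides where ^ is allowed to match.
string_start = re.findall(r"^\w+", text)     # start of the string only
line_starts = re.findall(r"(?m)^\w+", text)  # start of every line
print(string_start, line_starts)
```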
