How to exclude a word from regex subpattern?
Problem description
I am using Delphi 7 and TDIPerlRegEx. I am looking for verbs in parts of a sentence that contain a specific marker identifying the verb.
s1 := '(I|you|he|she|it|we|they|this|that|these|those)';
s2 := '(can|should|would|could|must|want to|have to|had to|might)';
RegEx_Seek_1.MatchPattern := '(*UCP)(?m) \b'+s1+'\b \b'+s2+'\b \K([^ß\W]\w{2,15})\b';
The keyword that is wrongly included in the result is "not"; it should be excluded. Sample text:
... that you should not ßeat of every ...
A verb like this should be included in the result. Sample text:
lest he should put forth his hand ...
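The behaviour described above can be reproduced outside Delphi with a minimal sketch. It uses Python's standard `re` module, which is an assumption made purely for illustration and is not what the question uses. `re` supports neither `(*UCP)` nor `\K`, so the sketch drops both and reads the word from group 3 instead. The name `captured_words` is hypothetical.

```python
import re

S1 = r"(I|you|he|she|it|we|they|this|that|these|those)"
S2 = r"(can|should|would|could|must|want to|have to|had to|might)"

# Same structure as the question's pattern, minus (*UCP), (?m) and \K,
# which Python's re does not need or does not support.
QUESTION_PATTERN = re.compile(
    r"\b" + S1 + r"\b \b" + S2 + r"\b ([^ß\W]\w{2,15})\b"
)


def captured_words(text):
    """Return what group 3 captures for every match in text."""
    return [m.group(3) for m in QUESTION_PATTERN.finditer(text)]


print(captured_words("... that you should not ßeat of every ..."))  # ['not']
print(captured_words("lest he should put forth his hand ..."))      # ['put']
```

The first call shows the unwanted capture: the class `[^ß\W]` rejects the marked verb, but nothing stops the pattern from accepting "not" itself.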
Now let me explain the ß sign. It marks places where the original text had the word "not" followed by a verb. I changed the text in an earlier pass, so the source text I am working with now looks like the samples above. The pattern ([^ß\W]\w{2,15}) is meant to skip a word that is used in a negative sense, which is why the "negative" verb must not be included.
So the point of the question is how to exclude the word "not" from the captured text, that is, from the text captured by this pattern, which is either ([^ß\W]\w{2,15}) or (\W{3,15}).
I am using this pattern to replace substrings in the text.
Need more sample text?
... bear.
so I might have taken her ...
they might dwell together
they could not dwell together
... say,
In group 3 I expect matches for: bear, taken (or possibly have instead of taken), dwell and say.
I am trying to exclude the word not, so any verb or word following not must be excluded from the 3rd group, or from the match completely. I am only interested in group 3. Groups 1 and 2 just specify the alternatives preceding the verb.
Recommended answer
You may use a branch reset group to match an empty string if there is not as a whole word after a modal verb, or a notional verb otherwise:
\b(I|you|he|she|it|we|they|this|that|these|those)\s+(can|should|would|could|must|want to|have to|had to|might)\s+\K(?|(?=not\b)()|([^ß\W]\w{2,15})\b)
See the regex demo.
Details
- \b - a word boundary
- (I|you|he|she|it|we|they|this|that|these|those) - one of the pronouns, captured into Group 1
- \s+ - 1+ whitespaces (it already acts as a word boundary on both sides of the adjacent groups)
- (can|should|would|could|must|want to|have to|had to|might) - one of the modal verbs
- \s+ - 1+ whitespaces
- \K - match reset operator
- (?|(?=not\b)()|([^ß\W]\w{2,15})\b) - the branch reset group matching either
  - (?=not\b)() - if there is not as a whole word immediately to the right, capture an empty string into Group 3
  - | - or (here, else)
  - ([^ß\W]\w{2,15})\b - match and capture into Group 3 any word char other than ß and then 2 to 15 word chars with a word boundary to follow.
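The two-branch logic above can be sketched in Python as a rough illustration. This is not the answer's PCRE pattern: Python's standard `re` module has neither `\K` nor branch reset groups, so the sketch uses a plain non-capturing alternation in which the "not" branch fills group 3 and the verb branch fills group 4, and a hypothetical helper `verb_after_modal` merges the two.

```python
import re

PRONOUNS = r"(I|you|he|she|it|we|they|this|that|these|those)"
MODALS = r"(can|should|would|could|must|want to|have to|had to|might)"

# Branch 1: "not" follows the modal -> empty capture in group 3.
# Branch 2: otherwise -> the verb is captured in group 4.
# (PCRE's (?|...) would number both captures as group 3.)
PATTERN = re.compile(
    r"\b" + PRONOUNS + r"\s+" + MODALS
    + r"\s+(?:(?=not\b)()|([^ß\W]\w{2,15})\b)"
)


def verb_after_modal(text):
    """Return the verb for each match, or '' where 'not' follows the modal."""
    results = []
    for m in PATTERN.finditer(text):
        # Exactly one of the two groups participates in each match.
        results.append(m.group(3) if m.group(3) is not None else m.group(4))
    return results


print(verb_after_modal("that they might dwell together"))      # ['dwell']
print(verb_after_modal("that they could not dwell together"))  # ['']
```

An empty string in the result corresponds to the empty Group 3 of the branch reset version, so a replacement that only rewrites that group leaves negated phrases untouched.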
Note that (?m) - PCRE_MULTILINE - is only necessary if you want ^ and $ outside of character classes to match the start and end of lines rather than of the whole string. Since your pattern has no such anchors, (?m) is redundant.
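The effect of the multiline flag can be checked with a short sketch. It uses Python's `re`, where the inline `(?m)` flag has the same meaning for `^` and `$` as described here; the sample string and variable names are made up for illustration.

```python
import re

text = "first line\nsecond line"

# Without (?m), ^ matches only at the very start of the string.
without_flag = re.findall(r"^\w+", text)
# With (?m), ^ also matches right after each newline.
with_flag = re.findall(r"(?m)^\w+", text)

# A pattern with no ^ or $ anchors behaves the same with or without (?m).
unanchored_plain = re.findall(r"\b\w+ line\b", text)
unanchored_flag = re.findall(r"(?m)\b\w+ line\b", text)

print(without_flag)                         # ['first']
print(with_flag)                            # ['first', 'second']
print(unanchored_plain == unanchored_flag)  # True
```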