Delphi RegEx library and Unicode characters
Question
If one uses \b for a word boundary, it seems to understand only the ASCII alphabet. For example, the pattern
\bM\b
matches aaaa M bbbbbb,
but if I have
aaaaa Mädchen
it matches there too, because it treats ä as the end of a word.
Are there any flags I can set so that this regex library handles Unicode strings too? It seems very unlikely that the library would be this primitive, but no such flag appears among the options:
TRegExOption = (roNone, roIgnoreCase, roMultiLine, roExplicitCapture,
roCompiled, roSingleLine, roIgnorePatternSpace);
Recommended answer
According to regular-expressions.info, the Delphi regex library is based on PCRE, and the predefined character class \w in PCRE is ASCII-only. Because \b is defined as the position between a \w character and a non-\w character, \b is ASCII-only as well.
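One possible workaround, which is not part of the original answer, is to replace \b with lookarounds built on Unicode property classes. The sketch below compares the two patterns on the inputs from the question. It assumes that the PCRE build behind TRegEx was compiled with Unicode property support, so that \p{L} and \p{N} are recognized; if it was not, the second pattern will fail to compile. The program and constant names are made up for illustration.

```delphi
program WordBoundaryDemo;

{$APPTYPE CONSOLE}

uses
  System.SysUtils,
  System.RegularExpressions;

const
  // The pattern from the question: \b relies on the ASCII-only \w.
  AsciiBoundary = '\bM\b';
  // Assumption: \p{L} (letters) and \p{N} (digits) are available in this
  // PCRE build. The lookarounds require that M is neither preceded nor
  // followed by a letter, a digit, or an underscore.
  UnicodeBoundary = '(?<![\p{L}\p{N}_])M(?![\p{L}\p{N}_])';
  PlainInput = 'aaaa M bbbbbb';
  // #$00E4 is the character a-umlaut, written as a code point so the
  // example does not depend on the encoding of the source file.
  UmlautInput = 'aaaaa M'#$00E4'dchen';

begin
  // Print whether each pattern finds a match in each input.
  Writeln('ASCII \b,   plain input:  ', TRegEx.IsMatch(PlainInput, AsciiBoundary));
  Writeln('ASCII \b,   umlaut input: ', TRegEx.IsMatch(UmlautInput, AsciiBoundary));
  Writeln('Lookaround, plain input:  ', TRegEx.IsMatch(PlainInput, UnicodeBoundary));
  Writeln('Lookaround, umlaut input: ', TRegEx.IsMatch(UmlautInput, UnicodeBoundary));
end.
```

If the assumption holds, the lookaround pattern should still find the standalone M in the first input while rejecting the M at the start of Mädchen. The results for the \b pattern may differ between Delphi versions, so it is worth running both against your own compiler.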