Delphi RegEx library and Unicode characters


Question

If one uses \b for a word boundary, it seems to understand only the ASCII alphabet. For example, the pattern

\bM\b will match aaaa M bbbbbb

but if I have

aaaaa Mädchen 

it will match too, because it treats ä as a non-word character, so the M counts as a complete word.

Are there any flags to set so that this regex library handles Unicode strings as well? It seems very unlikely that the library would be this primitive, but nothing of the kind appears in the options:

TRegExOption = (roNone, roIgnoreCase, roMultiLine, roExplicitCapture,
roCompiled, roSingleLine, roIgnorePatternSpace);


Recommended answer

According to regular-expressions.info, the Delphi regex library is based on PCRE, and the predefined character class \w in PCRE is ASCII-only by default. Since \b is defined as the position between a \w character and a non-\w character, \b is ASCII-only as well.
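The effect is easy to reproduce outside Delphi. As an illustration only, the sketch below uses Python's re module (not the Delphi library), where the re.ASCII flag restricts \w and \b to ASCII in roughly the way the answer describes for PCRE. The helper name has_match is made up for this example, and nothing here says how Delphi's TRegEx can be configured.

```python
import re

# The pattern from the question: a standalone "M" between word boundaries.
PATTERN = r"\bM\b"


def has_match(text, flags=0):
    """Return True if PATTERN matches anywhere in text (hypothetical helper)."""
    return re.search(PATTERN, text, flags) is not None


# ASCII-only word characters: "ä" is not \w, so a boundary appears
# between "M" and "ä", and the pattern matches inside "Mädchen".
ascii_standalone = has_match("aaaa M bbbbbb", re.ASCII)
ascii_umlaut = has_match("aaaaa Mädchen", re.ASCII)

# Unicode-aware word characters (Python's default for str patterns):
# "ä" is \w, so there is no boundary after "M" and the match fails.
unicode_standalone = has_match("aaaa M bbbbbb")
unicode_umlaut = has_match("aaaaa Mädchen")

print(ascii_standalone, ascii_umlaut, unicode_standalone, unicode_umlaut)
```

With ASCII-only matching, both inputs match. With Unicode-aware matching, only the standalone M matches, which is the behavior the question was hoping for.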
