What's the difference between /\p{Alpha}/i and /\p{L}/i in Ruby?
Question
I'm trying to build a regexp in Ruby to match alphabetic characters in UTF-8, such as ñíóúü. I know /\p{Alpha}/i works and /\p{L}/i works too, but what's the difference?
Recommended answer
They seem to be equivalent (at least sometimes; see the end of this answer).
It seems that Ruby has supported \p{Alpha} since version 1.9. In POSIX, \p{Alpha} is equal to \p{L&} (for regular expressions with Unicode support; see here). This matches all characters that have an upper- and lower-case variant (see here). Unicase letters would not be matched (while they would be matched by \p{L}).
This does not seem to be true for Ruby (I picked a random Arabic character, since Arabic has a unicase alphabet):
- \p{L} (any letter) matches.
- Case-sensitive classes \p{Lu}, \p{Ll}, \p{Lt} don't match. As expected.
- \p{L&} doesn't match. As expected.
- \p{Alpha} matches.
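The checks listed above can be reproduced locally with a short script. This is only a sketch: the original answer does not say which Arabic character was tested, so U+0639 (ARABIC LETTER AIN) is an assumed stand-in, and the names ARABIC_AIN, CLASSES and matching_classes are made up for illustration. \p{L&} is left out, and Regexp#match? needs Ruby 2.4 or later.

```ruby
# Assumed test character: U+0639 ARABIC LETTER AIN, a letter with no case.
ARABIC_AIN = "\u0639"

# The character classes from the list above, keyed by their property name.
CLASSES = {
  "L"     => /\p{L}/,
  "Lu"    => /\p{Lu}/,
  "Ll"    => /\p{Ll}/,
  "Lt"    => /\p{Lt}/,
  "Alpha" => /\p{Alpha}/
}

# Returns the names of all classes whose pattern matches the given character.
def matching_classes(char)
  CLASSES.select { |_name, pattern| pattern.match?(char) }.keys
end

puts matching_classes(ARABIC_AIN).inspect
```

For a cased letter such as "a", the same helper should additionally report Ll.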
This seems to be a very good indication that \p{Alpha} is just an alias for \p{L} in Ruby. On Rubular you can also see that \p{Alpha} was not available in Ruby 1.8.7.
Note that the i modifier is irrelevant in any case, because both \p{Alpha} and \p{L} match upper- and lower-case characters anyway.
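A quick way to convince yourself of this is to drop the i modifier and test both cases of a few accented letters. The sketch below uses made-up names (LETTER, ALPHA, SAMPLES, matches_both?) and writes the sample characters as \u escapes so that it does not depend on the source file's encoding.

```ruby
# Neither pattern carries the i modifier.
LETTER = /\p{L}/
ALPHA  = /\p{Alpha}/

# Lower- and upper-case n-tilde and u-diaeresis.
SAMPLES = ["\u00F1", "\u00D1", "\u00FC", "\u00DC"]

# True when the character is matched by both patterns.
def matches_both?(char)
  LETTER.match?(char) && ALPHA.match?(char)
end

SAMPLES.each do |char|
  puts "U+%04X: %s" % [char.ord, matches_both?(char)]
end
```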
Aha, there is a difference! I just found this PDF about Ruby's new regex engine (in use as of Ruby 1.9, as stated above). \p{Alpha} is available regardless of encoding (and will probably just match [A-Za-z] if there is no Unicode support), while \p{L} is specifically a Unicode property. That means \p{Alpha} behaves as it does in POSIX regexes, with the difference that here it corresponds to \p{L}, whereas in POSIX it corresponds to \p{L&}.