How do you specify a regex character range that will work in European languages other than English?
Problem description
I'm working with Ruby's regex engine. I need to write a regex that does this
WIKI_WORD = /\b([a-z][\w_]+\.)?[A-Z][a-z]+[A-Z]\w*\b/
but will also work in other European languages besides English. I don't think that the character range [a-z] will cover lowercase letters in German, etc.
Recommended answer
WIKI_WORD = /\b(\p{Ll}\w+\.)?\p{Lu}\p{Ll}+\p{Lu}\w*\b/u
should work in Ruby 1.9. \p{Lu} and \p{Ll} are shorthands for uppercase and lowercase Unicode letters. (\w already includes the underscore.)
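As a rough illustration, the sketch below runs the pattern from the question and the pattern from the answer against a few sample words. The constant names, the wiki_word? helper, and the sample words are hypothetical and chosen only for this example. In Ruby 1.9 and later, \w itself matches only ASCII word characters, so the samples keep their non-ASCII letters inside the \p{Ll}+ part of the pattern.

```ruby
# encoding: utf-8

# ASCII-only pattern from the question.
ASCII_WIKI_WORD = /\b([a-z][\w_]+\.)?[A-Z][a-z]+[A-Z]\w*\b/

# Unicode-aware pattern from the answer.
UNICODE_WIKI_WORD = /\b(\p{Ll}\w+\.)?\p{Lu}\p{Ll}+\p{Lu}\w*\b/u

# Hypothetical helper: true when the pattern matches anywhere in the text.
def wiki_word?(pattern, text)
  !(pattern =~ text).nil?
end

# Sample words are assumptions made for this illustration.
samples = ["WikiWord", "GrößeTest", "größe"]
samples.each do |word|
  ascii   = wiki_word?(ASCII_WIKI_WORD, word)
  unicode = wiki_word?(UNICODE_WIKI_WORD, word)
  puts "#{word}: ascii=#{ascii} unicode=#{unicode}"
end
```

With these samples, only the Unicode-aware pattern should accept "GrößeTest", because [a-z]+ stops at the first non-ASCII lowercase letter.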
See also this answer - you might need to run Ruby in UTF-8 mode for this to work, and possibly your script must be encoded in UTF-8, too.
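One way the encoding requirement can show up in practice: text read as raw bytes is not tagged as UTF-8, and a pattern that uses \p{...} with the /u flag expects UTF-8 input. The sketch below is only an assumption about how one might handle that case. The helper name utf8_wiki_word? and the sample bytes are hypothetical, and String#b needs Ruby 2.0 or later.

```ruby
# encoding: utf-8

WIKI_WORD = /\b(\p{Ll}\w+\.)?\p{Lu}\p{Ll}+\p{Lu}\w*\b/u

# Hypothetical helper: tag the incoming bytes as UTF-8 before matching,
# and reject input that is not valid UTF-8 instead of raising.
def utf8_wiki_word?(bytes)
  text = bytes.dup.force_encoding(Encoding::UTF_8)
  return false unless text.valid_encoding?
  !(WIKI_WORD =~ text).nil?
end

# "GrößeTest" as binary-tagged bytes, e.g. as read from a socket or a
# file opened in binary mode (assumed scenario).
raw = "Gr\xC3\xB6\xC3\x9FeTest".b

puts raw.encoding           # the binary encoding, not UTF-8
puts utf8_wiki_word?(raw)
```

Using dup before force_encoding leaves the caller's string untouched, which is a design choice for this sketch rather than a requirement.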