How to use regex for UTF-8 in Ruby
Question
In RoR, how can I validate a Chinese or Japanese word submitted through a form when the page is encoded in UTF-8?
With GBK encoding, [\u4e00-\u9fa5]+ is used to validate Chinese words. In PHP, /^[\x{4e00}-\x{9fa5}]+$/u is used for UTF-8 pages.
Recommended answer
Ruby 1.8 has poor support for UTF-8 strings. You need to write the bytes individually in the regular expression, rather than the full code point:
>> "acentuação".scan(/\xC3\xA7/)
=> ["ç"]
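For contrast, Ruby 1.9 and later attach an encoding to each string, so the code point range from the question can be written directly. The following is a minimal sketch under that assumption, not part of the original answer; the names CJK_RANGE, CJK_OR_KANA and cjk_word? are hypothetical, and the \p{Han}, \p{Hiragana} and \p{Katakana} classes are one possible way to also accept Japanese kana, which lie outside \u4e00-\u9fa5.

```ruby
# encoding: utf-8
# Sketch only: assumes Ruby 1.9 or later and UTF-8 encoded input.

# The range from the question, anchored to the whole string.
CJK_RANGE = /\A[\u4e00-\u9fa5]+\z/

# Assumption: Unicode script properties to cover kanji plus kana.
CJK_OR_KANA = /\A[\p{Han}\p{Hiragana}\p{Katakana}]+\z/

# Hypothetical helper: true when the whole string matches the pattern.
def cjk_word?(text, pattern = CJK_RANGE)
  !(pattern =~ text).nil?
end

puts cjk_word?("中文")                # => true
puts cjk_word?("abc")                 # => false
puts cjk_word?("ひらがな")             # => false (kana are outside the range)
puts cjk_word?("ひらがな", CJK_OR_KANA) # => true
```

The \A and \z anchors are used instead of ^ and $ because in Ruby the latter match at line boundaries, which would let multi-line input slip through a validation.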
In Ruby 1.8, matching the range you specified takes a somewhat more complicated expression:
/([\x4E-\x9E][\x00-\xFF])|(\x9F[\x00-\xA5])/ # (untested)
As noted in the comments, the Unicode characters \u4E00-\u9FA5 map to the expression above only in the UTF-16BE encoding. In UTF-8 the same characters are encoded as three-byte sequences (U+4E00 is E4 B8 80 and U+9FA5 is E9 BE A5), so for Ruby 1.8 you need to work out that mapping carefully and build a byte-matching expression from it.
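One way to carry out that analysis is to split U+4E00..U+9FA5 at the points where the UTF-8 lead byte or second byte changes, which gives four byte-level alternatives. The sketch below is an illustration under that derivation rather than the answer author's expression; CJK_UTF8_BYTES and cjk_utf8_bytes? are hypothetical names, and the force_encoding branch exists only so the same pattern can be tried on Ruby 1.9 or later.

```ruby
# Byte-level pattern for U+4E00..U+9FA5 as encoded in UTF-8.
# The n flag makes the regexp operate on raw bytes; x allows layout and comments.
CJK_UTF8_BYTES = /\A(?:
    \xE4[\xB8-\xBF][\x80-\xBF]        # U+4E00..U+4FFF
  | [\xE5-\xE8][\x80-\xBF][\x80-\xBF] # U+5000..U+8FFF
  | \xE9[\x80-\xBD][\x80-\xBF]        # U+9000..U+9F7F
  | \xE9\xBE[\x80-\xA5]               # U+9F80..U+9FA5
)+\z/nx

# Hypothetical helper: true when every character falls in the range.
def cjk_utf8_bytes?(text)
  # Ruby 1.8 strings are already plain byte sequences. On 1.9+ the string
  # must be viewed as binary, otherwise the byte regexp refuses to match it.
  bytes = if text.respond_to?(:force_encoding)
            text.dup.force_encoding("ASCII-8BIT")
          else
            text
          end
  !(CJK_UTF8_BYTES =~ bytes).nil?
end

puts cjk_utf8_bytes?("中文") # => true
puts cjk_utf8_bytes?("abc")  # => false
```

Each alternative consumes exactly three bytes, so a string that mixes in ASCII or characters outside the range fails the anchored match.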