How to use regex for UTF-8 in Ruby

Views: 77 · Published: 2020/7/13 3:10:11 · Tags: ruby-on-rails, ruby, regex, utf-8


Question

In Rails, how can I validate that a field in a posting form contains a Chinese or Japanese word when the data is UTF-8 encoded?

With GBK encoding, [\u4e00-\u9fa5]+ is used to validate Chinese words. In PHP, /^[\x{4e00}-\x{9fa5}]+$/u is used for UTF-8 pages.

Recommended answer

Ruby 1.8 has poor support for UTF-8 strings. You need to write the bytes individually in the regular expression, rather than the full code point:

>> "acentuação".scan(/\xC3\xA7/)
=> ["ç"]
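To see where \xC3\xA7 comes from, the bytes of a character can be listed with String#unpack. The following is a minimal sketch, assuming the source file is saved as UTF-8; utf8_byte_escapes is a hypothetical helper name, not something from the original answer.

```ruby
# List the UTF-8 bytes of a string as \xNN escapes.
# Assumption: this file is saved as UTF-8.
# utf8_byte_escapes is a hypothetical helper name used for illustration.
def utf8_byte_escapes(str)
  # "C*" unpacks every byte as an unsigned 8-bit integer.
  str.unpack("C*").map { |byte| format("\\x%02X", byte) }.join
end

puts utf8_byte_escapes("ç")   # prints \xC3\xA7
puts utf8_byte_escapes("一")  # U+4E00, prints \xE4\xB8\x80
```

The second call shows that a character from the range in the question takes three bytes in UTF-8, which is why a byte-level pattern for that range needs three-byte alternatives.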

Matching the range you specified makes the expression somewhat more complicated:

/([\x4E-\x9E][\x00-\xFF])|(\x9F[\x00-\xA5])/  # (untested)

As noted in the comments, the Unicode characters \u4E00-\u9FA5 map to the expression above only in the UTF-16BE encoding. The UTF-8 encoding differs: each character in this range is encoded as three bytes. So you need to analyze the mapping carefully and work out a byte-matching expression for Ruby 1.8.
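One possible result of that analysis is sketched below. U+4E00 encodes to E4 B8 80 and U+9FA5 to E9 BE A5 in UTF-8, so the range splits into four three-byte alternatives. The sketch is written for Ruby 2.0 or later so that the byte pattern can be cross-checked against the code-point form that Ruby 1.9+ accepts; the pattern itself has not been tested on Ruby 1.8. The names CJK_UTF8_BYTES, CJK_CODEPOINTS, and cjk_only? are hypothetical.

```ruby
# Byte-level pattern for U+4E00..U+9FA5 as encoded in UTF-8.
# The /n flag makes this a binary (byte-oriented) regex; /x allows the layout.
# All names here are hypothetical and chosen for illustration.
CJK_UTF8_BYTES = /\A(?:
    \xE4[\xB8-\xBF][\x80-\xBF]          # U+4E00..U+4FFF
  | [\xE5-\xE8][\x80-\xBF][\x80-\xBF]   # U+5000..U+8FFF
  | \xE9[\x80-\xBD][\x80-\xBF]          # U+9000..U+9F7F
  | \xE9\xBE[\x80-\xA5]                 # U+9F80..U+9FA5
)+\z/nx

# Code-point form, usable on Ruby 1.9+ where \u escapes work in regexes.
CJK_CODEPOINTS = /\A[\u4e00-\u9fa5]+\z/

def cjk_only?(str)
  # String#b (Ruby 2.0+) returns a binary copy, so the byte regex can be applied
  # without an encoding mismatch against the UTF-8 original.
  !(str.b =~ CJK_UTF8_BYTES).nil?
end

puts cjk_only?("中文")     # true
puts cjk_only?("中文abc")  # false
```

On Ruby 1.9 and later the code-point form alone is enough for a validation; the byte-level pattern is only relevant where the regex engine cannot work with UTF-8 code points directly.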
