Do multibyte characters interfere with the end-of-line character in a regex?


Problem description


With this regex:

regex1 = /\z/

the following strings match:

"hello" =~ regex1 # => 5
"こんにちは" =~ regex1 # => 5
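To make the numbers above concrete: `=~` returns a character offset, not a byte offset. The sketch below assumes a UTF-8 source file (the default since Ruby 2.0). It shows that "こんにちは" has 5 characters but 15 bytes, so the 5 reported for `\z` is the character index of the end of the string.

```ruby
# Compare the character offset returned by =~ with the byte offset of the same position.
str = "こんにちは"            # 5 characters; each takes 3 bytes in UTF-8

char_index = str =~ /\z/              # character offset of end-of-string
byte_index = $~.pre_match.bytesize    # byte offset of that same position

puts "encoding:   #{str.encoding}"
puts "length:     #{str.length}"
puts "bytesize:   #{str.bytesize}"
puts "char index: #{char_index}"
puts "byte index: #{byte_index}"
```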

but with these regexes:

regex2 = /#$/?\z/
regex3 = /\n?\z/
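In a Ruby regex literal, `#$/` is shorthand interpolation of the global `$/` (the input record separator), the same as `#{$/}`. With the default `$/` of "\n", regex2 and regex3 should therefore accept the same strings. The sketch below writes the interpolation with explicit braces for readability. The variable names follow the question.

```ruby
# "#$/" in a regex literal interpolates the global $/, exactly like "#{$/}".
regex2 = /#{$/}?\z/   # equivalent to /#$/?\z/ from the question
regex3 = /\n?\z/

# regex2's source holds a literal newline character; regex3's holds backslash-n.
p regex2.source
p regex3.source

# Both patterns match an optional newline immediately before the end of the string.
["hello", "hello\n"].each do |s|
  p [s, s =~ regex2, s =~ regex3]
end
```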

they show a difference:

"hello" =~ regex2 # => 5
"hello" =~ regex3 # => 5
"こんにちは" =~ regex2 # => nil
"こんにちは" =~ regex3 # => nil
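The following minimal script reproduces the comparison. It is only a sketch. On an affected Ruby, the multibyte rows should print nil for regex2 and regex3. On a release that includes the fix, every row should print 5. The version line and the summary message are illustrative additions.

```ruby
# Run the three patterns from the question against an ASCII and a multibyte string.
patterns = { "regex1" => /\z/, "regex2" => /#{$/}?\z/, "regex3" => /\n?\z/ }
strings  = ["hello", "こんにちは"]

results = {}
strings.each do |s|
  patterns.each do |name, re|
    results[[s, name]] = (s =~ re)
    puts "#{s.inspect} =~ #{name} # => #{results[[s, name]].inspect}"
  end
end

# Every result should be 5 (both strings are 5 characters long) once the bug is fixed.
consistent = results.values.all? { |v| v == 5 }
puts "Ruby #{RUBY_VERSION}: #{consistent ? 'consistent' : 'multibyte rows differ (affected version)'}"
```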

What is interfering? The string encoding is UTF-8, and the OS is Linux (i.e., $/ is "\n"). Are the multibyte characters interfering with $/? How?

Solution

In Ruby trunk, the issue has now been accepted as a bug. Hopefully, it will be fixed.

Update: Two patches have been posted in Ruby trunk.
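A possible workaround on affected releases is to compute the same index without the regex engine. `trailing_newline_index` is a hypothetical helper, not part of Ruby. It assumes a single "\n" separator, which is what `/\n?\z/` looks for.

```ruby
# Hypothetical helper (not a Ruby API): returns the same character index as
# `str =~ /\n?\z/`, i.e. the start of an optional final "\n", else the end of the string.
def trailing_newline_index(str)
  str.end_with?("\n") ? str.length - 1 : str.length
end

p trailing_newline_index("hello")
p trailing_newline_index("こんにちは")
p trailing_newline_index("こんにちは\n")
```

Because `String#length` and `end_with?` work on characters, the multibyte encoding does not affect the result.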
