Regular expression "empty range in char class" error

Problem description

I have a regex in my code that is meant to match URL patterns, and it raised an error:

/^(http|https):\/\/([\w-]+\.)+[\w-]+([\w- .\/?%&=]*)?$/

The error was "empty range in char class". I traced it to the ([\w- .\/?%&=]*)? part: Ruby seems to treat the - in \w- . as a range operator rather than a literal hyphen. Escaping the hyphen with a backslash fixed the problem.

However, the original regular expression ran fine on my co-workers' machines. We all use the same versions: ruby 1.9.3p194, Rails 3.1.6, and OS X 10.7.5. Everything also worked after we deployed the code to our Heroku server. Why does only my environment raise an error for this regex? How does Ruby interpret it?

Recommended answer

I can replicate this error on Ruby 1.9.3p194 (2012-04-20 revision 35410) [i686-linux], installed on Ubuntu 12.04.1 LTS using rvm 1.13.4. However, this should not be a version-specific error. In fact, I'm surprised it worked on the other machines at all.

A simpler demonstration that fails in the same way:

"abcd" =~ /[\w- ]/

This is because [\w- ] is interpreted as "a range beginning with any word character and ending at a space", rather than as a character class containing word characters, a hyphen, and a space, which is what you intended.
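To see the error in isolation, here is a minimal sketch. Since the question itself shows that [\w- ] does not fail on every machine, the sketch uses a plain reversed range, [z-a], which is empty by construction. The helper name compile_error is hypothetical and not part of the original post; the pattern is built from a string with Regexp.new so that the failure happens at run time and can be rescued.

```ruby
# Hypothetical helper (not from the original post): compile a pattern from a
# string and return the error message, or nil if the pattern compiles.
def compile_error(source)
  Regexp.new(source)
  nil
rescue RegexpError => e
  e.message
end

reversed = compile_error("[z-a]")  # range start sorts after its end: empty range
forward  = compile_error("[a-z]")  # ordinary range, compiles without error

puts reversed.inspect
puts forward.inspect
```

The first call returns a message that mentions "empty range in char class", while the second returns nil.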

According to Ruby's regular expression documentation:

Within a character class the hyphen (-) is a metacharacter denoting an inclusive range of characters. [abcd] is equivalent to [a-d]. A range can be followed by another range, so [abcdwxyz] is equivalent to [a-dw-z]. The order in which ranges or individual characters appear inside a character class is irrelevant.
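The equivalences in that passage can be checked with a short sketch. The variable names are illustrative only; each class is run against the lowercase alphabet and the matching letters are compared.

```ruby
letters = ("a".."z").to_a

listed  = letters.grep(/[abcdwxyz]/)  # every character spelled out
ranged  = letters.grep(/[a-dw-z]/)    # the same set written as two ranges
swapped = letters.grep(/[w-za-d]/)    # ranges in the opposite order

puts listed.join
puts listed == ranged
puts ranged == swapped
```

All three classes select the same eight letters, so the order of ranges inside the brackets makes no difference.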

As you saw, prepending a backslash escapes the hyphen, so it is read as a literal character instead of a range operator, which removes the error. However, escaping a hyphen in the middle of a character class is not recommended, since the intended meaning of the hyphen is easy to misread there. As m.buettner pointed out, always place hyphens at either the beginning or the end of a character class:

"abcd" =~ /[-\w ]/
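As a rough sketch of how this applies to the pattern from the question, the hyphen in the last character class can be moved to the front. The variable names and the example.com URLs below are illustrative assumptions, not taken from the original post.

```ruby
# Three ways to get a literal hyphen into the class.
leading  = /[-\w ]/    # hyphen first
trailing = /[\w -]/    # hyphen last
escaped  = /[\w\- ]/   # hyphen escaped in the middle (works, but harder to read)

hyphen_hits = [leading, trailing, escaped].map { |re| !("-" =~ re).nil? }

# The URL pattern from the question with the hyphen moved to the start
# of the final character class.
url_pattern = /^(http|https):\/\/([\w-]+\.)+[\w-]+([-\w .\/?%&=]*)?$/

url_ok  = !("http://example.com/a-b?x=1" =~ url_pattern).nil?
url_bad = !("ftp://example.com" =~ url_pattern).nil?

puts hyphen_hits.inspect
puts url_ok
puts url_bad
```

All three classes match a lone hyphen, and the rewritten URL pattern accepts an http URL containing a hyphen while still rejecting a non-http scheme.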
