How can I match a quote-delimited string with a regex?


Question

If I'm trying to match a quote-delimited string with a regex, which of the following is "better" (where "better" means both more efficient and less likely to do something unexpected)?

/"[^"]+"/ # match quote, then everything that's not a quote, then a quote

/".+?"/   # match quote, then *anything* (non-greedy), then a quote

Assume for this question that empty strings (i.e. "") are not an issue. It seems to me (no regex newbie, but certainly no expert) that these two are equivalent.

Update: Upon reflection, I think changing the + characters to * will handle empty strings correctly anyway.
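As a quick check of the update above, here is a minimal Python sketch (the sample text and variable names are made up for illustration) comparing the + and * forms on input that contains an empty quoted string:

```python
import re

# Hypothetical sample input containing an empty quoted string.
text = 'say "" and "hi"'

plus_class = re.compile(r'"[^"]+"')  # requires at least one character between quotes
star_class = re.compile(r'"[^"]*"')  # allows zero characters between quotes
star_lazy = re.compile(r'".*?"')     # lazy dot, also allows zero characters

print(plus_class.findall(text))  # ['" and "']
print(star_class.findall(text))  # ['""', '"hi"']
print(star_lazy.findall(text))   # ['""', '"hi"']
```

With +, the empty string "" cannot match, so the engine pairs its closing quote with the next opening quote and reports the text between the two strings instead. With *, both forms find the two quoted strings in this input.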

Recommended answer

You should use number one, because number two is bad practice. Suppose the developer who comes after you wants to match strings that are followed by an exclamation point. Should they use:

"[^"]*"!

or:

".*?"!

The difference appears when the subject string is:

"one" "two"!

The first regex matches:

"two"!

while the second regex matches:

"one" "two"!

Always be as specific as you can. Use the negated character class whenever you can.
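The exclamation-point example can be reproduced with a short Python sketch (Python's re module is used here only for illustration; the answer itself does not name a regex flavor):

```python
import re

subject = '"one" "two"!'

# Negated character class: the match can never run past a quote.
negated = re.search(r'"[^"]*"!', subject)

# Lazy dot: the dot may consume quotes while looking for a quote followed by "!".
lazy = re.search(r'".*?"!', subject)

print(negated.group())  # "two"!
print(lazy.group())     # "one" "two"!
```

The lazy version starts at the first quote and keeps expanding until it finds a quote followed by an exclamation point, so it swallows the whole subject.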

Another difference is that [^"]* can span multiple lines, while .* cannot unless you use single-line mode (in which the dot also matches line breaks). [^"\n]* excludes line breaks too.
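A small Python sketch of the line-break behavior, assuming Python's re module, where "single-line mode" corresponds to the re.DOTALL flag (the sample string is made up):

```python
import re

# A quoted string that contains a line break.
text = '"line one\nline two"'

spans_lines = re.search(r'"[^"]*"', text)             # negated class matches "\n"
dot_default = re.search(r'".*?"', text)               # dot does not match "\n" by default
dot_dotall = re.search(r'".*?"', text, re.DOTALL)     # dot matches "\n" with DOTALL
no_linebreaks = re.search(r'"[^"\n]*"', text)         # class explicitly excludes "\n"

print(spans_lines is not None)    # True
print(dot_default is not None)    # False
print(dot_dotall is not None)     # True
print(no_linebreaks is not None)  # False
```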

As for backtracking, the second regex backtracks once for every character in every string that it matches. If the closing quote is missing, both regexes will backtrack through the entire file; only the order in which they backtrack differs. Thus, in theory, the first regex is faster. In practice, you won't notice the difference.
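To see how small the difference is on your own machine, a rough timing sketch along these lines may help. It assumes Python's re and timeit modules; the helper name time_failed_search and the input size are arbitrary choices for illustration, and the numbers will vary by machine and regex engine:

```python
import re
import timeit

def time_failed_search(pattern: str, text: str, repeats: int = 100) -> float:
    """Return the total seconds spent running a search `repeats` times."""
    compiled = re.compile(pattern)
    return timeit.timeit(lambda: compiled.search(text), number=repeats)

# Worst case from the answer: an opening quote with no closing quote.
unterminated = '"' + 'x' * 10_000

for pattern in (r'"[^"]*"', r'".*?"'):
    # Neither pattern can match, so each has to exhaust its alternatives.
    assert re.search(pattern, unterminated) is None
    seconds = time_failed_search(pattern, unterminated)
    print(f'{pattern!r}: {seconds:.4f} s for 100 failed searches')
```

No expected timings are shown because they depend on the environment; the point is only to compare the two patterns on the same input.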
