Understanding regex in Java: split("\t") vs split("\\t") - when do they both work, and when should they be used


Question

I have recently figured out that I haven't been using regex properly in my code. Given the example of a tab-delimited string str, I have been using str.split("\t"). Now I realize that this is wrong, and that to match the tabs properly I should use str.split("\\t").

However, I stumbled upon this fact by pure chance while looking for regex patterns for something else. The supposedly faulty code split("\t") has been working fine in my case, and now I am confused as to why it works if it is the wrong way to declare a regex for matching the tab character. Hence the question: I want to actually understand how regex is handled in Java, instead of just copying the code into Eclipse and not really caring why it works.

In a similar fashion, I have come upon a piece of text which is not only tab-delimited but also comma-delimited. More precisely, the tab-delimited lists I am parsing sometimes include "compound" items which look like item1,item2,item3, and for the sake of simplicity I would like to parse them as separate elements. In that case, should the appropriate regex be line.split("[\\t,]"), or am I mistaken here as well?

Thanks in advance,

Recommended answer

When using "\t", the escape sequence \t is replaced by Java with the character U+0009, so the regex engine receives a literal tab. When using "\\t", the escape sequence \\ in \\t is replaced by Java with a single \, resulting in \t, which is then interpreted by the regular expression parser as the character U+0009.
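The two stages can be seen in a minimal sketch. The class and method names (TabSplitDemo, splitCompilerEscape, splitRegexEscape) and the sample line are made up for illustration; only String.split and Arrays.toString are real library calls.

```java
import java.util.Arrays;

class TabSplitDemo {
    // The Java compiler turns "\t" into a one-character string (U+0009),
    // so the regex engine receives a literal tab.
    static String[] splitCompilerEscape(String line) {
        return line.split("\t");
    }

    // The Java compiler turns "\\t" into two characters, backslash and 't'.
    // The regex engine then reads \t as its own escape for a tab.
    static String[] splitRegexEscape(String line) {
        return line.split("\\t");
    }

    public static void main(String[] args) {
        String line = "a\tb\tc"; // hypothetical tab-delimited input

        System.out.println("\t".length());  // 1
        System.out.println("\\t".length()); // 2

        System.out.println(Arrays.toString(splitCompilerEscape(line))); // [a, b, c]
        System.out.println(Arrays.toString(splitRegexEscape(line)));    // [a, b, c]
    }
}
```

The string lengths show that the two literals differ once compiled, even though both end up matching the same character.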

So both notations will be interpreted correctly. The only difference is when the escape is replaced with the corresponding character: by the Java compiler in the first case, by the regex parser in the second.
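The same reasoning carries over to the tab-or-comma case from the question. The following is only a sketch under assumed input: the class name TabCommaSplitDemo, the method splitFields, and the sample line are hypothetical.

```java
import java.util.Arrays;

class TabCommaSplitDemo {
    // "[\\t,]" reaches the regex engine as [\t,], a character class
    // that matches either a tab or a comma.
    static String[] splitFields(String line) {
        return line.split("[\\t,]");
    }

    public static void main(String[] args) {
        // Hypothetical line: tab-delimited, with one comma-separated compound item.
        String line = "x\titem1,item2,item3\ty";
        System.out.println(Arrays.toString(splitFields(line)));
        // [x, item1, item2, item3, y]
    }
}
```

Note that two adjacent delimiters (for example a comma directly followed by a tab) produce an empty string in the result. If that is not wanted, the pattern "[\\t,]+" treats a run of delimiters as one.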
