Regex to match all words not starting with a digit
Question
Sorry for the noobish question, but I am not very good with regex. I have several sentences like this:
text1.2text: text3,,text4 5. text6=== t7@ text8. T, 9-- T10
I want to use a Scanner to return:
a) single letters, and
b) words containing letters and digits that start with a letter.
Anything else can be treated as a delimiter.
So in the above sentence these should be returned:
text1
text3
text4
text6
t7
text8
T
T10
I could use multiple delimiters in the Scanner, like "\\.|\\:|\\,|\\,,", etc., but there could be anything between the words I want to extract, and I do not think it is a very good way to do it anyway.
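To make the problem with a fixed list of punctuation concrete, here is a small sketch that feeds the example sentence to a Scanner using the alternation from the question. The class name AlternationDelimiter is hypothetical; this is an illustration, not code from the question.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Scanner;

public class AlternationDelimiter {
    // Collects every token the Scanner returns, including empty strings.
    public static List<String> tokens(String text) {
        List<String> result = new ArrayList<>();
        try (Scanner sc = new Scanner(text)) {
            // Only '.', ':' and ',' act as delimiters; spaces, '=', '@' and '-' do not.
            sc.useDelimiter("\\.|\\:|\\,|\\,,");
            while (sc.hasNext()) {
                result.add(sc.next());
            }
        }
        return result;
    }

    public static void main(String[] args) {
        String text = "text1.2text: text3,,text4 5. text6=== t7@ text8. T, 9-- T10";
        for (String token : tokens(text)) {
            // Brackets make leading spaces and empty tokens visible.
            System.out.println("[" + token + "]");
        }
    }
}
```

This yields "text1", "2text", " text3", an empty string, "text4 5", " text6=== t7@ text8", " T" and " 9-- T10": characters missing from the list stay inside tokens, digit-led words such as 2text survive, and because "\\," is tried before "\\,,", the double comma is consumed as two delimiters with an empty token between them.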
Is there a regex I can use as a delimiter, or maybe in scanner.hasNext("regex"), to extract those words?
Thanks in advance.
Recommended answer
I am not sure if that is what you mean, but it seems that you want to use these parts as delimiters:
text1.2text: text3,,text4 5. text6=== t7@ text8. T, 9-- T10
     ^^^^^^^^     ^^     ^^^^     ^^^^  ^^     ^^ ^^^^^^
which means that you want to split this string on every run of non-word characters (plus any word starting with a digit that follows it). If that is the case, you can set up your scanner to use a delimiter like
"([^\\w]+(\\d\\w*)*)+"
and simply iterate over the next() elements. The parts of the delimiter are:
- [^\\w]+ : one or more non-word characters (anything other than a letter, a digit or _),
- (\\d\\w*)* : which can be followed by zero or more words that start with a digit,
- ([^\\w]+(\\d\\w*)*)+ : the whole delimiter can repeat more than once (this way we avoid returning empty strings between delimiters).
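To see what each part of the delimiter contributes, the sketch below runs the same Scanner loop with the first part alone, with the first two parts, and with the full delimiter. The class name DelimiterComparison is hypothetical; only the three regexes come from the answer.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Scanner;

public class DelimiterComparison {
    // Returns every token a Scanner produces for the given delimiter regex,
    // including the empty strings Scanner returns between adjacent delimiter matches.
    public static List<String> tokens(String text, String delimiter) {
        List<String> result = new ArrayList<>();
        try (Scanner sc = new Scanner(text)) {
            sc.useDelimiter(delimiter);
            while (sc.hasNext()) {
                result.add(sc.next());
            }
        }
        return result;
    }

    public static void main(String[] args) {
        String text = "text1.2text: text3,,text4 5. text6=== t7@ text8. T, 9-- T10";
        String[] delimiters = {
            "[^\\w]+",              // non-word runs only: 2text, 5 and 9 survive as tokens
            "[^\\w]+(\\d\\w*)*",    // also swallows digit-led words, but without the outer repetition
            "([^\\w]+(\\d\\w*)*)+"  // the full delimiter from the answer
        };
        for (String d : delimiters) {
            System.out.println(d + " -> " + tokens(text, d));
        }
    }
}
```

With "[^\\w]+" alone, 2text, 5 and 9 come back as tokens. Adding "(\\d\\w*)*" removes them, but without the outer "+" Scanner returns an empty string wherever two delimiter matches touch (".2text" followed by ": ", " 5" followed by ". ", ", 9" followed by "-- "). Only the full delimiter yields exactly the eight wanted words.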
Demo:
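import java.util.Scanner;
String text = "text1.2text: text3,,text4 5. text6=== t7@ text8. T, 9-- T10";
Scanner sc = new Scanner(text);
// delimiter: runs of non-word characters, each optionally followed by a digit-led word
sc.useDelimiter("([^\\w]+(\\d\\w*)*)+");
while (sc.hasNext()) {
    System.out.println(sc.next());
}
sc.close();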
Output:
text1
text3
text4
text6
t7
text8
T
T10
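As an alternative to describing the delimiters, one could describe the wanted words themselves and collect the matches. This is a minimal sketch of that different technique, not the approach of the answer above; the class name WordMatcher is hypothetical, and the pattern "\\b[A-Za-z]\\w*" assumes ASCII letters and, like the answer, treats _ as a word character.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class WordMatcher {
    // A word boundary, then an ASCII letter, then any word characters (letters, digits, _).
    // The boundary stops a match from starting inside a digit-led word such as "2text".
    private static final Pattern WORD = Pattern.compile("\\b[A-Za-z]\\w*");

    public static List<String> words(String text) {
        List<String> result = new ArrayList<>();
        Matcher m = WORD.matcher(text);
        while (m.find()) {
            result.add(m.group());
        }
        return result;
    }

    public static void main(String[] args) {
        String text = "text1.2text: text3,,text4 5. text6=== t7@ text8. T, 9-- T10";
        for (String w : words(text)) {
            System.out.println(w);
        }
    }
}
```

One behavioral difference: because the delimiter in the answer has to begin with "[^\\w]+", a digit-led word at the very start of the input (for example "2abc def") is still returned as a token by the Scanner, whereas this pattern skips it. On Java 9 and later, Scanner.findAll(String) accepts the same pattern and returns a stream of the matches.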