Java - Extract strings with Regex

Question

I have this string

String myString ="A~BC~FGH~~zuzy|XX~ 1234~ ~~ABC~01/01/2010 06:30~BCD~01/01/2011 07:45";

and I need to extract these 3 substrings
1234
06:30
07:45

If I use the regex \\d{2}\\:\\d{2}, I am only able to extract the first time, 06:30

Pattern depArrHours = Pattern.compile("\\d{2}\\:\\d{2}");
Matcher matcher = depArrHours.matcher(myString);
String firstHour = matcher.group(0);
String secondHour = matcher.group(1); // IndexOutOfBoundsException: No group 1

matcher.group(1) throws an exception.
Also, I don't know how to extract 1234. This value can change, but it always comes after 'XX~ '.
Do you have any idea how to match these strings with regular expressions?

UPDATE

Thanks to Adam's suggestion, I now have this regex that matches my string

Pattern p = Pattern.compile(".*XX~ (\\d{3,4}).*(\\d{1,2}:\\d{2}).*(\\d{1,2}:\\d{2})");

I match the number and the two times with matcher.group(1), matcher.group(2) and matcher.group(3).

Recommended answer

The matcher.group() function takes a single integer argument: the capturing group index, starting from 1. Index 0 is special and means "the entire match". A capturing group is created with a pair of parentheses "(...)". Anything within the parentheses is captured. Groups are numbered from left to right (again, starting from 1) by their opening parenthesis, which means that groups can be nested. Since there are no parentheses in your regular expression, there is no group 1.
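
To make the numbering concrete, here is a minimal sketch. The class and method names (GroupNumberingDemo, groups) are made up for illustration, and the nested pattern is only an example, not something taken from your code.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class GroupNumberingDemo {
    // Returns group 0 (the whole match) followed by every capturing group,
    // or an empty array if the pattern is not found anywhere in the input.
    static String[] groups(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        if (!m.find()) {
            return new String[0];
        }
        String[] result = new String[m.groupCount() + 1];
        for (int i = 0; i <= m.groupCount(); i++) {
            result[i] = m.group(i);
        }
        return result;
    }

    public static void main(String[] args) {
        // Nested parentheses: group 1 wraps the whole time,
        // groups 2 and 3 are the hour and minute inside it.
        String[] g = groups("((\\d{2}):(\\d{2}))", "ABC~01/01/2010 06:30~BCD");
        for (int i = 0; i < g.length; i++) {
            System.out.println("group " + i + " = " + g[i]);
        }
    }
}
```

Groups are counted by opening parenthesis, so the outer pair is group 1 even though it closes last.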

The javadoc on the Pattern class covers the regular expression syntax.

If you are looking for a pattern that might recur some number of times, you can call Matcher.find() repeatedly until it returns false. Calling Matcher.group(0) once on each iteration then returns what matched that time.
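
A rough sketch of that loop, applied to the string from the question. The class and method names (FindAllTimes, findTimes) are hypothetical, and the pattern assumes both times are written with two-digit hours.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class FindAllTimes {
    // Two digits, a colon, two digits. The colon needs no escaping.
    private static final Pattern TIME = Pattern.compile("\\d{2}:\\d{2}");

    // Collects every non-overlapping match of TIME, in order of appearance.
    static List<String> findTimes(String input) {
        List<String> times = new ArrayList<>();
        Matcher m = TIME.matcher(input);
        while (m.find()) {
            times.add(m.group(0)); // group(0) is the text matched by this find()
        }
        return times;
    }

    public static void main(String[] args) {
        String myString = "A~BC~FGH~~zuzy|XX~ 1234~ ~~ABC~01/01/2010 06:30~BCD~01/01/2011 07:45";
        System.out.println(findTimes(myString)); // prints [06:30, 07:45]
    }
}
```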

If you want to build one big regular expression that matches everything all at once (which I believe is what you want), then put a set of capturing parentheses around each of the three things that you want to capture, call Matcher.matches(), and then call Matcher.group(n) where n is 1, 2 and 3 respectively. Of course, Matcher.matches() might also return false, in which case the pattern did not match and you can't retrieve any of the groups.
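
As a sketch of the single-expression approach on the string from the question: the pattern below is my guess at the format, not a definitive one. It assumes the number after 'XX~ ' is followed by '~', that both times have two-digit hours, and that the second time ends the string. The names ExtractAllAtOnce and extract are made up.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class ExtractAllAtOnce {
    // Assumed format: anything, "XX~ ", digits, "~", then two HH:mm times,
    // the second of which ends the input. The reluctant ".*?" keeps each
    // time group anchored to the first time that can satisfy the match.
    private static final Pattern RECORD =
            Pattern.compile(".*XX~ (\\d+)~.*?(\\d{2}:\\d{2}).*?(\\d{2}:\\d{2})");

    // Returns {number, firstTime, secondTime}, or null if the input does not match.
    static String[] extract(String input) {
        Matcher m = RECORD.matcher(input);
        if (!m.matches()) {
            return null; // no match, so no groups can be read
        }
        return new String[] { m.group(1), m.group(2), m.group(3) };
    }

    public static void main(String[] args) {
        String myString = "A~BC~FGH~~zuzy|XX~ 1234~ ~~ABC~01/01/2010 06:30~BCD~01/01/2011 07:45";
        String[] parts = extract(myString);
        if (parts != null) {
            System.out.println(String.join(", ", parts)); // prints 1234, 06:30, 07:45
        }
    }
}
```

Be careful with a greedy ".*" directly in front of a group like (\\d{1,2}:\\d{2}): the ".*" takes as much as it can, so the group may be left with only one digit of the hour.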

In your example, what you probably want to do is have it match some preceding text, then start a capturing group, match the digits, end the capturing group, and so on. I don't know enough about your exact input format, but here is an example.

Let's say I had strings of the form:

Eat 12 carrots at 12:30
Take 3 pills at 01:15

And I wanted to extract the quantity and times. My regular expression would look something like:

"\w+ (\d+) [\w ]+ (\d{1,2}:\d{2})"

The code would look like:

Pattern p = Pattern.compile("\\w+ (\\d+) [\\w ]+ (\\d{2}:\\d{2})");
Matcher m = p.matcher(oneline);
if(m.matches()) {
    System.out.println("The quantity is " + m.group(1));
    System.out.println("The time is " + m.group(2));
}

The regular expression means "a string containing a word, a space, one or more digits (captured in group 1), a space, a set of words and spaces ending with a space, followed by a time (captured in group 2)". The time pattern assumes that the hour is always zero-padded to 2 digits. I would give an example closer to what you are looking for, but the description of the possible input is a little vague.
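
To show what the zero-padding assumption means in practice, here is a small self-contained sketch. The class and method names (QuantityAndTime, parse) and the third input line are made up for illustration.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class QuantityAndTime {
    // word, space, digits (group 1), space, words and spaces, space, HH:mm (group 2)
    private static final Pattern LINE =
            Pattern.compile("\\w+ (\\d+) [\\w ]+ (\\d{2}:\\d{2})");

    // Returns "<quantity> at <time>", or null when the whole line does not match.
    static String parse(String oneline) {
        Matcher m = LINE.matcher(oneline);
        return m.matches() ? m.group(1) + " at " + m.group(2) : null;
    }

    public static void main(String[] args) {
        String[] lines = {
            "Eat 12 carrots at 12:30",
            "Take 3 pills at 01:15",
            "Take 3 pills at 1:15" // hour is not zero-padded, so this one is rejected
        };
        for (String line : lines) {
            System.out.println(line + " -> " + parse(line));
        }
    }
}
```

The last line yields null because \\d{2} requires exactly two digits before the colon. Changing it to \\d{1,2} would accept single-digit hours as well.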
