Extract sub-string between two certain words using regex in Java

Question

I would like to extract the sub-string between two certain words using Java.

For example:

This is an important example about regex for my work.

I want to extract everything between "an" and "for".

What I have done so far is:

String sentence = "This is an important example about regex for my work and for me";
Pattern pattern = Pattern.compile("(?<=an).*.(?=for)");
Matcher matcher = pattern.matcher(sentence);

boolean found = false;
while (matcher.find()) {
    System.out.println("I found the text: " + matcher.group().toString());
    found = true;
}
if (!found) {
    System.out.println("I didn't find the text");
}

It works well.

But I want to do two additional things:


  1. If the sentence is: This is an important example about regex for my work and for me.
    I want to extract up to the first "for", i.e. important example about regex

  2. Sometimes I want to limit the number of words between the pattern to 3 words, i.e. important example about

Any ideas?

Recommended answer

For your first question, make it lazy. You can put a question mark after the quantifier, and then the quantifier will match as little as possible.

(?<=an).*?(?=for)
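
To make the difference concrete, here is a minimal sketch that runs the greedy and the lazy version against the sentence from the question. The class name LazyMatchDemo and the helper firstMatch are hypothetical names chosen for this illustration, not part of the original code.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class LazyMatchDemo {
    // Hypothetical helper: returns the first match of regex in input, or null if there is none.
    static String firstMatch(String regex, String input) {
        Matcher matcher = Pattern.compile(regex).matcher(input);
        return matcher.find() ? matcher.group() : null;
    }

    public static void main(String[] args) {
        String sentence = "This is an important example about regex for my work and for me";

        // Greedy: .* runs as far as possible, so the match ends before the LAST "for".
        System.out.println("[" + firstMatch("(?<=an).*(?=for)", sentence) + "]");
        // prints "[ important example about regex for my work and ]"

        // Lazy: .*? stops as early as possible, so the match ends before the FIRST "for".
        System.out.println("[" + firstMatch("(?<=an).*?(?=for)", sentence) + "]");
        // prints "[ important example about regex ]"
    }
}
```

Note that the match keeps the surrounding spaces; call trim() on the result if they are unwanted. Also, because this pattern does not check for whole words, a while (matcher.find()) loop with the lazy version would report a second match, "d ", taken from the "an" inside "and".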

I have no idea what the additional . at the end of .*. is good for; it's unnecessary.

For your second question, you have to define what a "word" is. Here I would say it is probably just a sequence of non-whitespace characters followed by a whitespace character. Something like this:

\S+\s

and repeat it 3 times like this:

(?<=an)\s(\S+\s){3}(?=for)

To ensure that the pattern matches on whole words, use word boundaries:

(?<=\ban\b)\s(\S+\s){1,5}(?=\bfor\b)

See it online on Regexr

{3} will match exactly 3; for a minimum of 1 and a maximum of 3, use {1,3}.
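
As a rough illustration of how the word-count quantifier behaves, the sketch below applies the {3} and {1,5} variants with word boundaries. The class WordLimitDemo, the helper between, and the second, three-word sentence are assumptions made up for this example.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class WordLimitDemo {
    // Hypothetical helper: returns the first match with surrounding whitespace trimmed, or null.
    static String between(String regex, String input) {
        Matcher matcher = Pattern.compile(regex).matcher(input);
        return matcher.find() ? matcher.group().trim() : null;
    }

    public static void main(String[] args) {
        String fourWords = "This is an important example about regex for my work and for me";
        String threeWords = "This is an important regex example for my work"; // made-up sentence

        String exactlyThree = "(?<=\\ban\\b)\\s(\\S+\\s){3}(?=\\bfor\\b)";
        String oneToFive = "(?<=\\ban\\b)\\s(\\S+\\s){1,5}(?=\\bfor\\b)";

        // Four words lie between "an" and "for", so {3} finds nothing here.
        System.out.println(between(exactlyThree, fourWords));  // prints "null"
        // Exactly three words lie between "an" and "for".
        System.out.println(between(exactlyThree, threeWords)); // prints "important regex example"
        // {1,5} accepts anything from one to five words.
        System.out.println(between(oneToFive, fourWords));     // prints "important example about regex"
    }
}
```

The quantifier only decides whether a match is accepted: because of the lookahead, the matched words must still be followed directly by "for", so {3} does not shorten a longer run of words to its first three.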

Alternative:

As dma_k correctly stated, in your case it is not necessary to use lookbehind and lookahead. See the Matcher documentation about groups.

You can use capturing groups instead. Just put the part you want to extract in parentheses and it will be put into a capturing group.

\ban\b(.*?)\bfor\b

See it online on Regexr

You can then access this group like this:

System.out.println("I found the text: " + matcher.group(1).toString());
                                                        ^

You have only one pair of parentheses, so it's simple: just put a 1 into matcher.group(1) to access the first capturing group.
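
Put together, the capturing-group variant might look like the following sketch. The class name CaptureGroupDemo and the method extractBetween are hypothetical names used only for this illustration.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class CaptureGroupDemo {
    // The words "an" and "for" are matched but stay outside the capturing group.
    private static final Pattern BETWEEN = Pattern.compile("\\ban\\b(.*?)\\bfor\\b");

    // Hypothetical helper: returns the trimmed text of group 1, or null if nothing matches.
    static String extractBetween(String input) {
        Matcher matcher = BETWEEN.matcher(input);
        return matcher.find() ? matcher.group(1).trim() : null;
    }

    public static void main(String[] args) {
        String sentence = "This is an important example about regex for my work and for me";
        System.out.println("I found the text: " + extractBetween(sentence));
        // prints "I found the text: important example about regex"
    }
}
```

matcher.group() or matcher.group(0) would return the whole match including "an" and "for", whereas matcher.group(1) returns only the text inside the parentheses.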
