Use Java Regex to find multiple matching words in a sentence

Question

I have a sentence and a set of words, say: Mayweather, undefeated, etc. I want to:

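  1. Check whether the sentence contains any of the above words. (I want it to look only for the matching words, basically ignoring full stops, commas and newlines.)
  2. If it does, display a few words before and after each matched word, perhaps using String.format().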

Here's my code, which seems to work OK but not exactly how I want it to:

import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class Main {
    public static void main(String[] args) {
        String sentence = "Floyd Mayweather Jr is an American professional boxer " +
                "currently undefeated as a professional and is a five-division world champion, " +
                "having won ten world titles and the lineal championship in four different weight classes.";

        // The two groups capture only the search terms themselves.
        Pattern p = Pattern.compile("(Mayweather) .* (undefeated)");
        Matcher m = p.matcher(sentence);

        if (m.find()) {
            String group1 = m.group(1);
            String group2 = m.group(2);

            String newText = String.format("%s ... %s", group1, group2);
            System.out.println(newText);
        }
    }
}

The current output is:

Mayweather ... undefeated

What I want is something like this:

Floyd Mayweather Jr is an American ... currently undefeated as a professional ...

Can you please let me know how to do it, or point me in the right direction? I'm stuck.

Thanks in advance, everyone.

Recommended answer

If you really want to solve this with a regex, your capturing groups need to match everything you want to output. Currently they match only your search terms:

(Mayweather) .* (undefeated)
// "Mayweather", "undefeated"

You could try something like this (using only one group!), but it would match your whole example:

(.*Mayweather.*undefeated.*)
// -whole text-

This can be changed as follows to match the two parts separately again, with at most 12 characters of context before and after each term. Do not put spaces around the "match all" in the middle, and make it non-greedy:

(.{0,12}Mayweather.{0,12}).*?(.{0,12}undefeated.{0,12})
// "Floyd Mayweather Jr is an Am", "r currently undefeated as a profes"

This can be refined further to stop at word boundaries (the results will need to be trimmed):

(\b.{0,12}Mayweather.{0,12}\b).*?(\b.{0,12}undefeated.{0,12}\b)
// "Floyd Mayweather Jr is an ", " currently undefeated as a "
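
For reference, here is a minimal sketch of how the word-boundary pattern above might be wired into the asker's code. The class name BoundaryContextDemo and the helper summarize are made up for illustration; the pattern itself is the one shown above, and the trimming is the step the answer says is needed.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class BoundaryContextDemo {
    // Pattern from the answer: up to 12 characters of context on each side,
    // cut back to the nearest word boundary.
    private static final Pattern CONTEXT = Pattern.compile(
            "(\\b.{0,12}Mayweather.{0,12}\\b).*?(\\b.{0,12}undefeated.{0,12}\\b)");

    // Hypothetical helper: returns "<snippet 1> ... <snippet 2> ...",
    // or null if the pattern does not match.
    static String summarize(String sentence) {
        Matcher m = CONTEXT.matcher(sentence);
        if (!m.find()) {
            return null;
        }
        // The groups carry leading/trailing spaces, so trim before formatting.
        return String.format("%s ... %s ...", m.group(1).trim(), m.group(2).trim());
    }

    public static void main(String[] args) {
        String sentence = "Floyd Mayweather Jr is an American professional boxer "
                + "currently undefeated as a professional and is a five-division world champion, "
                + "having won ten world titles and the lineal championship in four different weight classes.";
        // Prints: Floyd Mayweather Jr is an ... currently undefeated as a ...
        System.out.println(summarize(sentence));
    }
}
```

Note that "." does not match line breaks by default, so for multi-line input this sketch would probably need Pattern.DOTALL.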

Changing this to output a fixed number of words is left as an exercise for the bored reader.
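
One possible take on that exercise, as a sketch only and not part of the original answer: treat \w+\W+ as "one word plus its trailing separators", allow up to a fixed number of those on each side of the search term, and collect every hit with a find() loop. The names WordContextDemo, snippets and radius are invented for this example. Because \W also matches commas, full stops and newlines, punctuation between words does not stop the match; it simply stays inside the snippet.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

class WordContextDemo {
    // Returns each keyword hit together with up to `radius` words on either side.
    static List<String> snippets(String text, List<String> keywords, int radius) {
        // Quote the keywords so regex metacharacters in them are taken literally.
        String alternation = keywords.stream()
                .map(Pattern::quote)
                .collect(Collectors.joining("|"));
        // \w+\W+ = one word followed by its separators (spaces, commas, newlines, ...).
        Pattern p = Pattern.compile(
                "\\b(?:\\w+\\W+){0," + radius + "}"
                + "\\b(?:" + alternation + ")\\b"
                + "(?:\\W+\\w+){0," + radius + "}");
        List<String> result = new ArrayList<>();
        Matcher m = p.matcher(text);
        while (m.find()) {
            result.add(m.group());
        }
        return result;
    }

    public static void main(String[] args) {
        String sentence = "Floyd Mayweather Jr is an American professional boxer "
                + "currently undefeated as a professional and is a five-division world champion, "
                + "having won ten world titles and the lineal championship in four different weight classes.";
        List<String> hits = snippets(sentence, Arrays.asList("Mayweather", "undefeated"), 2);
        // Prints: Floyd Mayweather Jr is ... boxer currently undefeated as a ...
        System.out.println(String.join(" ... ", hits) + " ...");
    }
}
```

One limitation of this approach: matches do not overlap, so when two search terms sit within `radius` words of each other they end up in a single snippet instead of two.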

Edit: fixed the greediness of ".*" in the last two versions (added "?").
