Regular expression not extracting the exact pattern

Question

I am working in Java and reading a string of over 100,000 characters. I have a list of keywords that I search the string for, and if a keyword is present I call a function that does some internal processing.

For example, one of my keywords is "face": I want to get all the matches for "faces", but not for "facebook". A space character before or after the word is fine, so if the string contains a match like " face", " faces", "face " or "faces ", I can accept that too. However, I cannot accept "duckface", "duckface " and so on.

I wrote the following regular expression:

Pattern p = Pattern.compile("\\s+"+keyword+"s\\s+|\\s+");

where keyword is each keyword from my list, but I am not getting the desired results. Can you read my description and suggest what the issue might be and how I can fix it?

Also, if someone could share a link to a really good page on regular expressions in Java, I would appreciate that as well.

Thanks to all contributors.

Edit

I know it is not working because I used the following code:

Pattern p = Pattern.compile("\\s+"+keyword+"s\\s+|\\s+");
Matcher m = p.matcher(myInputDataSting);
if (m.find())
{
    System.out.println("Its a Match: "+m.group());
}

This returns a blank string...

Recommended answer

If keyword is "face", then your current regex is

\s+faces\s+|\s+

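which matches either one or more whitespace characters, followed by faces, followed by one or more whitespace characters, or just one or more whitespace characters. (The pipe | has very low precedence, so it splits the whole expression into those two alternatives.) Because the second alternative matches any run of whitespace, find() succeeds at the first whitespace in your input, and unless that whitespace happens to be followed by "faces", the match is nothing but whitespace, which is why the output looks blank.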
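To see the precedence problem concretely, here is a minimal, self-contained sketch that runs the pattern from the question against a made-up sample sentence. The class name `OriginalRegexDemo`, the helper `firstMatch` and the sample text are illustrative assumptions, not the asker's actual code:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class OriginalRegexDemo {
    // Returns the first match of the question's pattern, or null if nothing matches.
    public static String firstMatch(String keyword, String input) {
        // Same pattern as in the question. The "|" splits it into two alternatives:
        // (\s+faces\s+) OR (\s+), so any run of whitespace is a valid match.
        Pattern p = Pattern.compile("\\s+" + keyword + "s\\s+|\\s+");
        Matcher m = p.matcher(input);
        return m.find() ? m.group() : null;
    }

    public static void main(String[] args) {
        String match = firstMatch("face", "my face and your faces on facebook");
        // Brackets make the whitespace visible: prints "Its a Match: [ ]"
        System.out.println("Its a Match: [" + match + "]");
    }
}
```

Printing the match inside brackets shows that it is the single space after "my"; without the brackets, the output looks exactly like the blank string described in the question.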

What you actually want is

\bfaces?\b

which matches a word boundary, followed by face, optionally followed by s, followed by a word boundary.

So, you can write:

Pattern p = Pattern.compile("\\b"+keyword+"s?\\b");
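A minimal sketch of that pattern in use, assuming the keyword is "face" and using an invented sample sentence; the class `WordBoundaryDemo` and the helper `findAll` are hypothetical names chosen for illustration. It loops over `find()` to collect every whole-word match rather than only the first one:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class WordBoundaryDemo {
    // Collects every whole-word occurrence of keyword, optionally followed by "s".
    public static List<String> findAll(String keyword, String input) {
        // \b on both sides rejects "facebook" (no boundary after "face")
        // and "duckface" (no boundary before "face").
        Pattern p = Pattern.compile("\\b" + keyword + "s?\\b");
        Matcher m = p.matcher(input);
        List<String> matches = new ArrayList<>();
        while (m.find()) {
            matches.add(m.group());
        }
        return matches;
    }

    public static void main(String[] args) {
        String input = "face, faces and facebook, but not duckface.";
        System.out.println(findAll("face", input)); // prints [face, faces]
    }
}
```

Note that the surrounding spaces are not part of the match: `\b` is a zero-width assertion, so `m.group()` returns just the word itself.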

(Though obviously this will only work for words like face that form their plural by simply adding s.)
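Since the question involves a whole list of keywords and a long input string, one possible extension (an assumption, not part of the answer above) is to wrap each keyword in `Pattern.quote` so that any regex metacharacters in it are taken literally, and to combine the list into a single alternation so the input is scanned in one pass. The class `KeywordScanner` and the `keyword@index` output format are hypothetical; in real code you would call your own processing function where the sketch records a hit. The sketch assumes a non-empty list of keywords that start and end with word characters, since `\b` behaves differently next to punctuation:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class KeywordScanner {
    private final Pattern pattern;

    // Builds one pattern such as \b(\Qface\E|\Qbook\E)s?\b for the whole list,
    // so the long input is scanned in a single pass.
    public KeywordScanner(List<String> keywords) {
        List<String> quoted = new ArrayList<>();
        for (String keyword : keywords) {
            // Pattern.quote makes regex metacharacters in a keyword (e.g. ".") literal.
            quoted.add(Pattern.quote(keyword));
        }
        pattern = Pattern.compile("\\b(" + String.join("|", quoted) + ")s?\\b");
    }

    // Records "keyword@startIndex" for each whole-word hit. A real program would
    // call its own (hypothetical here) processing function at this point instead.
    public List<String> scan(String input) {
        List<String> hits = new ArrayList<>();
        Matcher m = pattern.matcher(input);
        while (m.find()) {
            hits.add(m.group(1) + "@" + m.start());
        }
        return hits;
    }

    public static void main(String[] args) {
        KeywordScanner scanner = new KeywordScanner(Arrays.asList("face", "book"));
        // prints [face@0, book@11]
        System.out.println(scanner.scan("faces in a book, not on facebook"));
    }
}
```

Compiling the pattern once in the constructor avoids recompiling it for every input; compiling one pattern per keyword and looping over them also works, but then the 100,000-character string is scanned once per keyword.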

You can find a comprehensive listing of Java's regular-expression support at http://docs.oracle.com/javase/7/docs/api/java/util/regex/Pattern.html, but it's not much of a tutorial. For that, I'd recommend just Googling "regular expression tutorial" and finding one that suits you. (It doesn't have to be Java-specific: most of the tutorials you'll find cover regular-expression flavors that are very similar to Java's.)
