Regular expression not extracting the exact pattern
Problem description
I am working in Java, reading a string of over 100000 characters. I have a list of keywords that I search the string for, and if a keyword is present I call a function that does some internal processing.
For example, one of my keywords is "face". I want all the matches for "face" or "faces", but not "facebook". Whitespace next to the keyword is fine, so matches like " face", " faces", "face " or "faces " are acceptable too. However, I cannot accept "duckface", "duckface ", etc.
I have written this regex:
Pattern p = Pattern.compile("\\s+"+keyword+"s\\s+|\\s+");
where keyword comes from my list of keywords, but I am not getting the desired results. Could you read my description and suggest what the issue might be and how I can fix it?
Also, a pointer to a really good page on regular expressions for Java would be appreciated.
Thanks to everyone who contributes.
EDIT
The reason I know it is not working is that I used the following code:
Pattern p = Pattern.compile("\\s+"+keyword+"s\\s+|\\s+");
Matcher m = p.matcher(myInputDataSting);
if (m.find()) {
    System.out.println("Its a Match: " + m.group());
}
This returns a blank string...
Recommended answer
If keyword is "face", then your current regex is
\s+faces\s+|\s+
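which matches either one or more whitespace characters, followed by faces, followed by one or more whitespace characters, or just one or more whitespace characters. (The pipe | has very low precedence, so everything to its left forms one alternative and the bare \s+ forms the other.) That is most likely why you see a blank string: find() stops at the first match, and the first run of whitespace in your input already satisfies the second alternative.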
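The effect of that low precedence can be reproduced in a few lines. This is only a sketch: the class name AlternationDemo, the helper firstMatch and the sample input are made up for illustration and are not part of the original question.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class AlternationDemo {
    // Builds the pattern exactly as in the question and returns the first
    // match, or null when nothing matches. (Hypothetical helper.)
    public static String firstMatch(String keyword, String input) {
        Pattern p = Pattern.compile("\\s+" + keyword + "s\\s+|\\s+");
        Matcher m = p.matcher(input);
        return m.find() ? m.group() : null;
    }

    public static void main(String[] args) {
        // Hypothetical sample input: the first whitespace run is not
        // followed by "faces", so the bare \s+ alternative matches it.
        String input = "I like faces a lot";
        System.out.println("[" + firstMatch("face", input) + "]"); // prints "[ ]"
    }
}
```

The brackets around the output make the single matched space visible; without them the result looks like an empty string.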
What you actually want is
\bfaces?\b
which matches a word boundary, followed by face, optionally followed by s, followed by a word boundary.
So, you can write:
Pattern p = Pattern.compile("\\b"+keyword+"s?\\b");
(though obviously this will only work for words like face that form their plurals by simply adding s).
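As a rough check of the word-boundary pattern, the sketch below collects every match in a sample string. The class name KeywordMatchDemo, the helper findAll and the sample input are hypothetical. The call to Pattern.quote is an addition that is not in the answer above; it is assumed here so that a keyword containing regex metacharacters is treated literally.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class KeywordMatchDemo {
    // Returns every occurrence of keyword, or keyword + "s", that stands
    // alone as a word. (Hypothetical helper.)
    public static List<String> findAll(String keyword, String input) {
        // Pattern.quote is an assumption beyond the original answer:
        // it escapes any regex metacharacters inside the keyword.
        Pattern p = Pattern.compile("\\b" + Pattern.quote(keyword) + "s?\\b");
        Matcher m = p.matcher(input);
        List<String> hits = new ArrayList<>();
        while (m.find()) {
            hits.add(m.group());
        }
        return hits;
    }

    public static void main(String[] args) {
        // Hypothetical sample input covering the cases from the question.
        String input = "face faces facebook duckface";
        System.out.println(findAll("face", input)); // prints [face, faces]
    }
}
```

Because \b matches without consuming any characters, the surrounding spaces are not part of m.group(), unlike with the original \s+ version.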
You can find a comprehensive listing of Java's regular-expression support at http://docs.oracle.com/javase/7/docs/api/java/util/regex/Pattern.html, but it's not much of a tutorial. For that, I'd recommend just Googling "regular expression tutorial" and finding one that suits you. (It doesn't have to be Java-specific: most of the tutorials you'll find are for regular-expression flavors that are very similar to Java's.)