Match a word using regex that also handles apostrophes


Question

I have to split a line of text into words and am confused about which regex to use. I have looked everywhere for a regex that matches a word and found ones similar to those in this post, but I want it in Java (Java doesn't accept a bare \ in ordinary string literals).

Regex to match words and words with apostrophes

I have tried the regex from each answer and am unsure how to structure a regex for Java for this (I assumed all regexes were the same). If I replace \ with \\ in the regexes I see, the regex doesn't work.
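On the backslash point: in a Java string literal the backslash is itself an escape character, so each backslash meant for the regex engine is written twice in source code. A minimal sketch, assuming nothing beyond the standard java.util.regex package; the class and method names are made up for illustration:

```java
import java.util.regex.Pattern;

public class BackslashEscapeDemo {
    // The source text "\\w+" is a three-character string: \, w, +.
    // That three-character string is what the regex engine receives.
    static final String WORD_REGEX = "\\w+";

    // True when the whole input consists of word characters only.
    static boolean isWord(String s) {
        return Pattern.matches(WORD_REGEX, s);
    }

    public static void main(String[] args) {
        System.out.println(WORD_REGEX.length());
        System.out.println(isWord("food"));
        System.out.println(isWord("don't")); // the apostrophe is not a \w character
    }
}
```

A regex copied from another language or from a regex reference usually only needs its backslashes doubled this way; the pattern itself stays the same.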

I have also tried looking it up myself and came to this page: http://www.regular-expressions.info/reference.html

But I cannot wrap my head around advanced regex techniques.

I am using String.split(regex string here) to separate my string. For example, given the following: "I like to eat but I don't like to eat everyone's food, or they'll starve." I want to match:

I
like
to
eat
but
I
don't
like
to
eat
everyone's
food
or
they'll
starve

I also don't want to match '' or '''' or ' ' or '.'' or other permutations. My delimiter condition should be something like: [match any word character][also match an apostrophe if it is preceded by a word character, and then match the word characters after it, if there are any]

What I have is just a simple regex that matches word characters, [\w], but I am unsure how to use lookahead or lookbehind to match the apostrophe and then the rest of the word.
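For what it's worth, lookahead and lookbehind can express that delimiter rule directly for String.split. The sketch below is one possible reading of the rule, not a tested production pattern: an apostrophe counts as part of a word only when it has a word character on both sides, and everything else that is not a word character is a delimiter. The class, constant, and method names are hypothetical.

```java
import java.util.Arrays;

public class SplitOnNonWords {
    // A delimiter is a run of characters, each of which is either
    //   - neither a word character nor an apostrophe, or
    //   - an apostrophe NOT followed by a word character (negative lookahead), or
    //   - an apostrophe NOT preceded by a word character (negative lookbehind).
    static final String DELIMITER = "(?:[^\\w']|'(?!\\w)|(?<!\\w)')+";

    static String[] splitWords(String line) {
        return line.split(DELIMITER);
    }

    public static void main(String[] args) {
        String line = "I like to eat but I don't like to eat everyone's food, or they'll starve.";
        System.out.println(Arrays.toString(splitWords(line)));
    }
}
```

One caveat with split: if the line starts with a delimiter, the first array element is an empty string, so it may need to be filtered out.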

Recommended answer

Using the answer from WhirlWind on the page mentioned in my comment, you can do the following:

// Needs: import java.util.regex.Matcher; import java.util.regex.Pattern;
String candidate = "I \n"+
    "like \n"+
    "to "+
    "eat "+
    "but "+
    "I "+
    "don't "+
    "like "+
    "to "+
    "eat "+
    "everyone's "+
    "food "+
    "''  ''''  '.' ' "+
    "or "+
    "they'll "+
    "starv'e'";

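// Alternatives, tried left to right: 'word, word'word, word', plain word.
// Each regex backslash is doubled because this is a Java string literal.
String regex = "('\\w+)|(\\w+'\\w+)|(\\w+')|(\\w+)";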
Matcher matcher = Pattern.compile(regex).matcher(candidate);
while (matcher.find()) {
  System.out.println("> matched: `" + matcher.group() + "`");
}

It prints:

> matched: `I`
> matched: `like`
> matched: `to`
> matched: `eat`
> matched: `but`
> matched: `I`
> matched: `don't`
> matched: `like`
> matched: `to`
> matched: `eat`
> matched: `everyone's`
> matched: `food`
> matched: `or`
> matched: `they'll`
> matched: `starv'e`

You can find a running example here: http://ideone.com/pVOmSK
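Note that the four-alternative pattern also accepts a leading apostrophe ('tis) and a trailing one (dogs'). If apostrophes should only count inside a word, a single tighter pattern may be enough. This is a sketch under that assumption rather than part of WhirlWind's answer; the class and method names are invented for the example:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class WordExtractor {
    // One or more word characters, optionally followed by repeated
    // "apostrophe + word characters" groups, so an apostrophe is only
    // matched when it sits between word characters.
    private static final Pattern WORD = Pattern.compile("\\w+(?:'\\w+)*");

    // Collects every match in order instead of splitting on delimiters.
    static List<String> words(String text) {
        List<String> result = new ArrayList<>();
        Matcher m = WORD.matcher(text);
        while (m.find()) {
            result.add(m.group());
        }
        return result;
    }

    public static void main(String[] args) {
        String line = "I like to eat but I don't like to eat everyone's food, or they'll starve.";
        System.out.println(words(line));
    }
}
```

Keep in mind that \w also matches digits and underscores, so "words" here means runs of letters, digits, and underscores.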
