Extract words from text having length more than N characters - RegExp/Java/Android
Question
My first requirement was to extract all words from some text using a regular expression in Java.
The following code does this perfectly for me:
String[] words = text.split("[^\\w']+");
It also removes all punctuation and special characters except the apostrophe (').
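To make the behaviour of that split call concrete, here is a minimal sketch. The class name SplitDemo, the helper splitWords, and the sample strings are assumptions made for illustration and are not part of the original question. It also shows one edge case worth knowing about: when the text starts with a delimiter, String.split returns an empty first element.

```java
import java.util.Arrays;

class SplitDemo {
    // Splits on runs of characters that are neither word characters nor apostrophes.
    static String[] splitWords(String text) {
        return text.split("[^\\w']+");
    }

    public static void main(String[] args) {
        // Punctuation and spaces act as delimiters; the apostrophe stays in the word.
        System.out.println(Arrays.toString(splitWords("Hello, world! It's fine.")));
        // prints [Hello, world, It's, fine]

        // A delimiter at the very start yields an empty leading element.
        System.out.println(Arrays.toString(splitWords("...leading dots")));
        // prints [, leading, dots]
    }
}
```

Trailing empty strings are dropped by split, but the leading one is kept, so callers may want to skip empty elements.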
My next task is to extract words that have more than (say) 3 characters and, importantly, I want to do this within the regular expression mentioned above.
You may come up with some other regular expression that accomplishes both tasks.
Recommended answer
Fun fact: a word is a single distinct element of speech or writing, used to form a sentence and typically shown with a space on either side. \w matches any letter, digit, or underscore.
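The difference between \w and a plain letter class can be checked directly. The sketch below is an illustration only: the class WordCharDemo and its two helpers are hypothetical names. It relies on the documented behaviour that Java's \w covers only ASCII letters, digits, and underscore unless Pattern.UNICODE_CHARACTER_CLASS is set.

```java
import java.util.regex.Pattern;

class WordCharDemo {
    // True if the whole string consists of \w characters (ASCII-only by default).
    static boolean isWordChars(String s) {
        return s.matches("\\w+");
    }

    // Same check, but with Unicode-aware predefined character classes.
    static boolean isWordCharsUnicode(String s) {
        return Pattern.compile("\\w+", Pattern.UNICODE_CHARACTER_CLASS)
                      .matcher(s)
                      .matches();
    }

    public static void main(String[] args) {
        System.out.println(isWordChars("abc_123"));            // true: letters, digits, underscore
        System.out.println(isWordChars("money's"));            // false: apostrophe is not in \w
        System.out.println(isWordChars("caf\u00e9"));          // false: accented letter is not ASCII
        System.out.println(isWordCharsUnicode("caf\u00e9"));   // true with the Unicode flag
    }
}
```

This is why the question's pattern adds the apostrophe explicitly, and why digits and underscores count as part of a "word" when \w is used.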
Without a better explanation of what you are trying to accomplish, it is unclear exactly what you are asking.
If you want to match words that consist of letters and the apostrophe (') and have more than 3 characters:
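import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

List<String> words = new ArrayList<String>();
String s = "I want to have alot of money's when I am older.";
// {4,} means "four or more" of the characters in the class
Pattern p = Pattern.compile("[a-zA-Z']{4,}");
Matcher m = p.matcher(s);
// collect every match instead of splitting on delimiters
while (m.find()) {
    words.add(m.group());
}
System.out.println(words);
// [want, have, alot, money's, when, older]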
Note: This matches words that contain more than 3 characters. If you also want to match words that contain 3 characters (such as foo) or more, you can use the following.
Pattern p = Pattern.compile("[a-zA-Z']{3,}");
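Since the title asks about an arbitrary N, the minimum length in the {n,} quantifier can be passed in as a parameter. The following is a minimal sketch rather than part of the original answer: the class WordExtractor, the method extractWords, and the second sample sentence are hypothetical, chosen only for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class WordExtractor {
    // Returns every run of letters/apostrophes that is at least minLength characters long.
    static List<String> extractWords(String text, int minLength) {
        if (minLength < 1) {
            throw new IllegalArgumentException("minLength must be at least 1");
        }
        // The quantifier {n,} means "n or more" of the preceding character class.
        Pattern p = Pattern.compile("[a-zA-Z']{" + minLength + ",}");
        Matcher m = p.matcher(text);
        List<String> words = new ArrayList<>();
        while (m.find()) {
            words.add(m.group());
        }
        return words;
    }

    public static void main(String[] args) {
        String s = "I want to have alot of money's when I am older.";
        System.out.println(extractWords(s, 4));
        // prints [want, have, alot, money's, when, older]

        String t = "The cat sat on a mat's edge.";
        System.out.println(extractWords(t, 3)); // prints [The, cat, sat, mat's, edge]
        System.out.println(extractWords(t, 4)); // prints [mat's, edge]
    }
}
```

"More than N characters" corresponds to calling extractWords(text, N + 1). Note that the apostrophe counts toward the length, so mat's is treated as a five-character word.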