Extract words longer than N characters from text - RegExp/Java/Android


Problem description

My first requirement was to extract all words from some text using regular expression in Java.

The following code does this perfectly for me:

String[] words = text.split("[^\\w']+");

It also removes all punctuation and special characters except the apostrophe (').
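To see what the question's `split` call actually produces, here is a small sketch (the `SplitDemo` class and `splitWords` method are hypothetical names). One behavior worth knowing: when the text starts with punctuation, `String.split` returns an empty first element, because it keeps a leading empty string after a non-empty match at the start and drops only trailing empty strings.

```java
import java.util.Arrays;

public class SplitDemo {
    // Splits on any run of characters that are neither word characters (\w)
    // nor apostrophes, exactly as in the question.
    static String[] splitWords(String text) {
        return text.split("[^\\w']+");
    }

    public static void main(String[] args) {
        // The leading quote is a delimiter, so the first element is "".
        // The trailing "." is also a delimiter, but trailing empties are dropped.
        System.out.println(Arrays.toString(splitWords("\"Hi,\" she said.")));
        // prints [, Hi, she, said]
    }
}
```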

My next task is to extract words that have more than (say) 3 characters, and importantly, I want to do this in the regular expression mentioned above.

You may come up with some other regular expression that accomplishes both tasks.
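One possible single regular expression, sketched under the assumption that a "word" should keep the question's definition (`\w` plus the apostrophe): instead of splitting on separators, search for the words themselves with `Matcher.find()` and put the length limit into a `{4,}` quantifier. The `LongWords` class and `extract` method are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class LongWords {
    // Runs of 4 or more word characters or apostrophes: the same tokens the
    // question's split produces, but only those longer than 3 characters.
    private static final Pattern LONG_WORD = Pattern.compile("[\\w']{4,}");

    static List<String> extract(String text) {
        List<String> words = new ArrayList<>();
        Matcher m = LONG_WORD.matcher(text);
        while (m.find()) {
            words.add(m.group());
        }
        return words;
    }

    public static void main(String[] args) {
        System.out.println(extract("It's a user_id of 12345, isn't it?"));
        // prints [It's, user_id, 12345, isn't]
    }
}
```

Because `\w` includes digits and the underscore, tokens such as `user_id` and `12345` are kept, just as the original split keeps them.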

Recommended answer

Fun fact: a word is a single distinct element of speech or writing used to form a sentence, typically shown with a space on either side. \w, on the other hand, matches any letter, digit, or underscore (on the standard JDK, [a-zA-Z_0-9] by default).
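To make the point about `\w` concrete, this sketch checks a few single characters against it on the standard JDK, with and without the `Pattern.UNICODE_CHARACTER_CLASS` flag. The class and method names are hypothetical, and other runtimes (Android's regex engine, for example) may treat `\w` differently, so treat the results as JDK-specific.

```java
import java.util.regex.Pattern;

public class WordCharDemo {
    // Default JDK semantics: \w is [a-zA-Z_0-9].
    static boolean isWordChar(String s) {
        return Pattern.matches("\\w", s);
    }

    // With UNICODE_CHARACTER_CLASS, \w also covers non-ASCII letters.
    static boolean isUnicodeWordChar(String s) {
        return Pattern.compile("\\w", Pattern.UNICODE_CHARACTER_CLASS).matcher(s).matches();
    }

    public static void main(String[] args) {
        // "\u00e9" is the letter e with an acute accent.
        for (String s : new String[] {"a", "Z", "7", "_", "'", "-", "\u00e9"}) {
            System.out.println(s + " -> default: " + isWordChar(s)
                    + ", unicode: " + isUnicodeWordChar(s));
        }
    }
}
```

Note that the apostrophe is not a word character either way, which is why the question's pattern adds it explicitly.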

It is unclear exactly what you are asking without a better explanation of what you are trying to accomplish.

If you want to match words made up of letters and apostrophes (') that are more than 3 characters long:

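import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Collect every run of 4 or more ASCII letters/apostrophes, i.e. words longer than 3 characters
List<String> words = new ArrayList<>();
String s = "I want to have alot of money's when I am older.";
Pattern p = Pattern.compile("[a-zA-Z']{4,}");
Matcher m = p.matcher(s);
while (m.find()) {
  words.add(m.group());
}
System.out.println(words);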

// [want, have, alot, money's, when, older]
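Since the title asks about N characters in general, the answer's pattern can also be built from a parameter. A minimal sketch, assuming "longer than N" means at least N + 1 characters; `WordsLongerThan.extract` is a hypothetical helper, not part of any library.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class WordsLongerThan {
    // Returns words made of ASCII letters and apostrophes that are strictly
    // longer than n characters, by building the quantifier {n+1,}.
    static List<String> extract(String text, int n) {
        if (n < 0) {
            throw new IllegalArgumentException("n must be non-negative");
        }
        Pattern p = Pattern.compile("[a-zA-Z']{" + (n + 1) + ",}");
        List<String> words = new ArrayList<>();
        Matcher m = p.matcher(text);
        while (m.find()) {
            words.add(m.group());
        }
        return words;
    }

    public static void main(String[] args) {
        String s = "I want to have alot of money's when I am older.";
        System.out.println(extract(s, 3)); // [want, have, alot, money's, when, older]
        System.out.println(extract(s, 4)); // [money's, older]
    }
}
```

If the same N is used repeatedly, the compiled `Pattern` could be cached instead of recompiled on every call.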

Note: this matches words that contain more than 3 characters. If you also want to match words of exactly 3 characters (e.g. foo), i.e. 3 or more, you can use the following:

Pattern p = Pattern.compile("[a-zA-Z']{3,}");
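A quick way to see the difference between the two quantifiers is to run both on the same input. This sketch, with a hypothetical `findAll` helper, does that on a sentence that contains three-letter words.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class QuantifierCompare {
    // Collects every match of regex in text, left to right.
    static List<String> findAll(String regex, String text) {
        List<String> out = new ArrayList<>();
        Matcher m = Pattern.compile(regex).matcher(text);
        while (m.find()) {
            out.add(m.group());
        }
        return out;
    }

    public static void main(String[] args) {
        String s = "The foo bar isn't a quux.";
        System.out.println(findAll("[a-zA-Z']{3,}", s)); // [The, foo, bar, isn't, quux]
        System.out.println(findAll("[a-zA-Z']{4,}", s)); // [isn't, quux]
    }
}
```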
