Get n Number of words using regex in Java
Question
I have a section of a book, complete with punctuation, line breaks, etc., and I want to extract the first n words from the text and divide them into 5 parts. Regex mystifies me. This is what I am trying; it gives me an array with a single element (index 0) containing all the input text:
public static String getNumberWords2(String s, int nWords){
String[] m = s.split("([a-zA-Z_0-9]+\b.*?)", (nWords / 5));
return "Part One: \n" + m[1] + "\n\n" +
"Part Two: \n" + m[2] + "\n\n" +
"Part Three: \n" + m[3] + "\n\n" +
"Part Four: \n" + m[4] + "\n\n" +
"Part Five: \n" + m[5];
}
Thanks!
Recommended answer
I think the simplest and most efficient way is to repeatedly find a "word":
Pattern p = Pattern.compile("(\\w+)");
Matcher m = p.matcher(chapter);
while (m.find()) {
String word = m.group();
...
}
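To connect the loop above back to the question, here is a minimal sketch of one way to stop after the first n words and then divide them into parts. The class name FirstWords and the helpers firstWords and splitIntoParts are hypothetical names chosen for illustration, not part of the original answer. The sketch also assumes it is acceptable to keep only the matched words, so punctuation and line breaks between words are dropped.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class FirstWords {
    // A "word" here is one or more regex word characters: [a-zA-Z_0-9].
    private static final Pattern WORD = Pattern.compile("\\w+");

    // Hypothetical helper: collect at most nWords matches, in order of appearance.
    static List<String> firstWords(String text, int nWords) {
        List<String> words = new ArrayList<>();
        Matcher m = WORD.matcher(text);
        while (words.size() < nWords && m.find()) {
            words.add(m.group());
        }
        return words;
    }

    // Hypothetical helper: join the words into `parts` chunks of near-equal size.
    // When the count does not divide evenly, the earlier chunks get one extra word.
    static List<String> splitIntoParts(List<String> words, int parts) {
        List<String> result = new ArrayList<>();
        int base = words.size() / parts;
        int extra = words.size() % parts;
        int start = 0;
        for (int i = 0; i < parts; i++) {
            int end = start + base + (i < extra ? 1 : 0);
            result.add(String.join(" ", words.subList(start, end)));
            start = end;
        }
        return result;
    }

    public static void main(String[] args) {
        String chapter = "It was the best of times, it was the worst of times,\n"
                + "it was the age of wisdom.";
        List<String> parts = splitIntoParts(firstWords(chapter, 10), 5);
        for (int i = 0; i < parts.size(); i++) {
            System.out.println("Part " + (i + 1) + ": " + parts.get(i));
        }
    }
}
```

If the original punctuation has to be preserved, one option is to record m.end() for each word and cut the input string at those offsets instead of collecting m.group().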
You can vary the definition of "word" by modifying the regex. What I wrote just uses regex's notion of word characters (\w, which by default in Java means letters, digits, and underscore), and it may be more appropriate than what you're trying to do. But it won't, for instance, include quote characters such as apostrophes, which you may need to allow within a word.
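As an illustration of changing the definition, the sketch below compares the plain \w+ pattern with one that also accepts an apostrophe inside a word. The class name WordPatterns, the constant names, and the findWords helper are hypothetical. The choice to accept only ' and ’ between word characters is an assumption about what "quote characters within a word" should mean for this text.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class WordPatterns {
    // Plain regex word characters only.
    static final Pattern PLAIN_WORD = Pattern.compile("\\w+");

    // Word characters, optionally followed by groups of (apostrophe + word characters),
    // so "don't" and "dog's" stay in one piece. A leading or trailing quote is not included.
    static final Pattern WORD_WITH_APOSTROPHES = Pattern.compile("\\w+(?:['\u2019]\\w+)*");

    // Hypothetical helper: return every match of the pattern in the text.
    static List<String> findWords(Pattern p, String text) {
        List<String> words = new ArrayList<>();
        Matcher m = p.matcher(text);
        while (m.find()) {
            words.add(m.group());
        }
        return words;
    }

    public static void main(String[] args) {
        String text = "Don't stop, it's the dog's bowl";
        System.out.println(findWords(PLAIN_WORD, text));
        System.out.println(findWords(WORD_WITH_APOSTROPHES, text));
    }
}
```

With the plain pattern, "Don't" is split into "Don" and "t"; with the second pattern it is kept as a single word, which also changes the word count used for the n-word cutoff.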