Get n Number of words using regex in Java


Question

I have a section of a book, complete with punctuation, line breaks, etc., and I want to extract the first n words from the text and divide them into 5 parts. Regex mystifies me. This is what I am trying. It creates an array with only index 0, holding all the input text:

public static String getNumberWords2(String s, int nWords){
    String[] m = s.split("([a-zA-Z_0-9]+\b.*?)", (nWords / 5));
    return "Part One: \n" + m[1] + "\n\n" + 
           "Part Two: \n" + m[2] + "\n\n" + 
           "Part Three: \n" + m[3] + "\n\n" +
           "Part Four: \n" + m[4] + "\n\n" + 
           "Part Five: \n" + m[5];
}

Thanks!

Recommended answer

I think the simplest and most efficient way is to repeatedly find a "word":

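import java.util.regex.Matcher;
import java.util.regex.Pattern;

// chapter is the input text; \w+ matches one run of word characters
Pattern p = Pattern.compile("(\\w+)");
Matcher m = p.matcher(chapter);
while (m.find()) {
  // group() returns the word matched by the latest find()
  String word = m.group();
  ...
}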



You can vary the definition of "word" by modifying the regex. What I wrote just uses regex's notion of word characters (\w: letters, digits, and underscore), which may be more appropriate than what you're trying to do. It won't, for instance, include quote characters such as apostrophes, which you may need to allow within a word.
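As a rough sketch of how the find loop could be extended to cover the whole task (the first n words, divided into 5 parts), something like the following might work. The class name WordParts, the method firstWordsInParts, and the WORD pattern are all hypothetical names chosen for illustration. The pattern is just one possible definition of a "word" that allows apostrophes and hyphens inside it. Each part is cut from the original text, so punctuation between words is kept.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class WordParts {
    // Assumed definition of a "word": word characters, optionally joined by
    // a straight apostrophe or hyphen, e.g. "don't" or "well-known".
    private static final Pattern WORD = Pattern.compile("\\w+(?:['-]\\w+)*");

    /**
     * Takes the first nWords words of text and returns them as up to
     * `parts` chunks of nWords / parts words each. Any remainder lands in
     * the last chunk. Fewer chunks come back if the text runs out of words.
     */
    static List<String> firstWordsInParts(String text, int nWords, int parts) {
        List<String> result = new ArrayList<>();
        if (nWords <= 0 || parts <= 0) {
            return result;
        }
        int perPart = Math.max(1, nWords / parts);
        Matcher m = WORD.matcher(text);
        int start = 0;    // offset where the current part begins
        int lastEnd = 0;  // offset just past the most recent word
        int count = 0;    // words consumed so far

        while (count < nWords && m.find()) {
            // Once the current part is full, close it where this word starts,
            // so trailing punctuation stays with the part it belongs to.
            if (count > 0 && count % perPart == 0 && result.size() < parts - 1) {
                result.add(text.substring(start, m.start()).trim());
                start = m.start();
            }
            count++;
            lastEnd = m.end();
        }
        if (count > 0) {
            result.add(text.substring(start, lastEnd).trim());
        }
        return result;
    }

    public static void main(String[] args) {
        String text = "It's a test. One two, three four! Five six seven eight nine ten eleven";
        List<String> parts = firstWordsInParts(text, 10, 5);
        for (int i = 0; i < parts.size(); i++) {
            System.out.println("Part " + (i + 1) + ": " + parts.get(i));
        }
    }
}
```

With the sample text in main, the five parts are "It's a", "test. One", "two, three", "four! Five", and "six seven". Swapping WORD back to plain \w+ would split "It's" into two words, which is the trade-off described above.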
