Word count with Java 8
Problem description
I am trying to implement a word count program in Java 8, but I am unable to make it work. The method must take a string as a parameter and return a Map<String, Integer>.
When I do it the old Java way, everything works fine. But when I try to do it in Java 8 style, it returns a map whose keys are empty, with the correct number of occurrences.
Here is my code in Java 8 style:
public Map<String, Integer> countJava8(String input) {
    return Pattern.compile("(\\w+)")
            .splitAsStream(input)
            .collect(Collectors.groupingBy(e -> e.toLowerCase(),
                    Collectors.reducing(0, e -> 1, Integer::sum)));
}
Here is the code I would normally use:
public Map<String, Integer> count(String input) {
    Map<String, Integer> wordcount = new HashMap<>();
    Pattern compile = Pattern.compile("(\\w+)");
    Matcher matcher = compile.matcher(input);
    while (matcher.find()) {
        String word = matcher.group().toLowerCase();
        if (wordcount.containsKey(word)) {
            Integer count = wordcount.get(word);
            wordcount.put(word, ++count);
        } else {
            wordcount.put(word.toLowerCase(), 1);
        }
    }
    return wordcount;
}
The main program:
public static void main(String[] args) {
    WordCount wordCount = new WordCount();
    Map<String, Integer> phrase = wordCount.countJava8("one fish two fish red fish blue fish");
    Map<String, Integer> count = wordCount.count("one fish two fish red fish blue fish");
    System.out.println(phrase);
    System.out.println();
    System.out.println(count);
}
When I run this program, I get the following output:
{ =7, =1}
{red=1, blue=1, one=1, fish=4, two=1}
I thought that the splitAsStream method would stream the elements matched by the regex as a Stream. How can I correct that?
The problem seems to be that you are in fact splitting by words, i.e. you are streaming over everything that is not a word, or that is in between words. Unfortunately, there seems to be no equivalent method for streaming the actual match results (hard to believe, but I did not find any; feel free to comment if you know one).
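To make the splitting behaviour visible, here is a minimal sketch that collects what splitAsStream yields when the pattern matches the words themselves. The class name SplitAsStreamDemo and the helper pieces are made up for this illustration and are not part of the question's code.

```java
import java.util.List;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

// Hypothetical demo class, not from the original question.
class SplitAsStreamDemo {

    // Returns the pieces left over when the words (\w+) act as the delimiter.
    static List<String> pieces(String input) {
        return Pattern.compile("\\w+")
                .splitAsStream(input)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<String> pieces = pieces("one fish two fish red fish blue fish");
        // Brackets make the empty and single-space strings visible.
        pieces.forEach(p -> System.out.println("[" + p + "]"));
    }
}
```

For the sample sentence this yields one empty string (the piece before the first word) followed by seven single spaces, which is where the two odd keys and the counts 1 and 7 in the question's output come from.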
Instead, you could just split by non-words, using \W instead of \w. Also, as noted in the comments, you can make it a bit more readable by using String::toLowerCase instead of a lambda, and Collectors.summingInt instead of Collectors.reducing.
public static Map<String, Integer> countJava8(String input) {
    return Pattern.compile("\\W+")
            .splitAsStream(input)
            .collect(Collectors.groupingBy(String::toLowerCase,
                    Collectors.summingInt(s -> 1)));
}
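One caveat with the split-based version: if the input starts with a non-word character (a leading space, say), the split produces an empty leading string, which would then be counted as a word. The following is a sketch of one way to guard against that, assuming empty tokens should simply be dropped; the class name WordCountStream is invented for the example.

```java
import java.util.Map;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

// Hypothetical wrapper class, not from the original answer.
class WordCountStream {

    // Compile once; one or more non-word characters act as the delimiter.
    private static final Pattern NON_WORD = Pattern.compile("\\W+");

    static Map<String, Integer> count(String input) {
        return NON_WORD.splitAsStream(input)
                // Assumption: empty tokens (e.g. from leading punctuation) are not words.
                .filter(s -> !s.isEmpty())
                .collect(Collectors.groupingBy(String::toLowerCase,
                        Collectors.summingInt(s -> 1)));
    }

    public static void main(String[] args) {
        System.out.println(count(" Hello, hello world"));
    }
}
```

Without the filter, the same input would also produce an entry for the empty string with a count of 1.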
But IMHO this is still very hard to comprehend, not least because of the "inverse" lookup, and it is also difficult to generalize to other, more complex patterns. Personally, I would just go with the "old school" solution, maybe making it a bit more compact using the new getOrDefault.
public static Map<String, Integer> countOldschool(String input) {
    Map<String, Integer> wordcount = new HashMap<>();
    Matcher matcher = Pattern.compile("\\w+").matcher(input);
    while (matcher.find()) {
        String word = matcher.group().toLowerCase();
        wordcount.put(word, wordcount.getOrDefault(word, 0) + 1);
    }
    return wordcount;
}
The result seems to be the same in both cases.
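For readers who are not restricted to Java 8: Java 9 added Matcher.results(), which returns a Stream<MatchResult> of the actual matches, so the "inverse" split is no longer needed there. The sketch below assumes Java 9 or later and will not compile on Java 8; the class name WordCountResults is made up for the example.

```java
import java.util.Map;
import java.util.regex.MatchResult;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

// Hypothetical class; requires Java 9+ because of Matcher.results().
class WordCountResults {

    private static final Pattern WORD = Pattern.compile("\\w+");

    static Map<String, Integer> count(String input) {
        return WORD.matcher(input)
                .results()                 // stream of the matches themselves (Java 9+)
                .map(MatchResult::group)   // the matched text of each result
                .collect(Collectors.groupingBy(String::toLowerCase,
                        Collectors.summingInt(s -> 1)));
    }

    public static void main(String[] args) {
        System.out.println(count("one fish two fish red fish blue fish"));
    }
}
```

Because it streams matches rather than the gaps between them, this form should carry over to more complex patterns in the same way the Matcher loop does.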