Grouping words by first character
Question
What I have: a text file that is read line by line. Each String contains one line.
What I want: to group ALL words by their first character using Java Streams.
What I have so far:
public static Map<Character, List<String>> groupByFirstChar(String fileName)
        throws IOException {
    return Files.lines(Paths.get(PATH)).
            flatMap(s -> Stream.of(s.split("[^a-zA-Z]"))).
            map(s -> s.toLowerCase()).
            sorted((s1, s2) -> s1.compareTo(s2)).
            collect(Collectors.groupingBy(s -> s.charAt(0)));
}
Problem: I get an exception
Exception in thread "main" java.lang.StringIndexOutOfBoundsException: String index out of range: 0
at java.lang.String.charAt(String.java:646)
at textana.TextAnalysisFns.lambda$16(TextAnalysisFns.java:110)
at textana.TextAnalysisFns$$Lambda$36/159413332.apply(Unknown Source)
at java.util.stream.Collectors.lambda$groupingBy$196(Collectors.java:907)
at java.util.stream.Collectors$$Lambda$23/189568618.accept(Unknown Source)
at java.util.stream.ReduceOps$3ReducingSink.accept(ReduceOps.java:169)
at java.util.stream.SortedOps$RefSortingSink$$Lambda$37/186370029.accept(Unknown Source)
at java.util.ArrayList.forEach(ArrayList.java:1249)
at java.util.stream.SortedOps$RefSortingSink.end(SortedOps.java:390)
at java.util.stream.Sink$ChainedReference.end(Sink.java:258)
at java.util.stream.Sink$ChainedReference.end(Sink.java:258)
at java.util.stream.AbstractPipeline.copyInto(AbstractPipeline.java:513)
at java.util.stream.AbstractPipeline.wrapAndCopyInto(AbstractPipeline.java:502)
at java.util.stream.ReduceOps$ReduceOp.evaluateSequential(ReduceOps.java:708)
at java.util.stream.AbstractPipeline.evaluate(AbstractPipeline.java:234)
at java.util.stream.ReferencePipeline.collect(ReferencePipeline.java:499)
at textana.TextAnalysisFns.groupByFirstChar(TextAnalysisFns.java:110)
at textana.SampleTextAnalysisApp.main(SampleTextAnalysisApp.java:95)
Question: Why do I get a StringIndexOutOfBoundsException?
Solution based on hints from the comments:
public static Map<Character, List<String>> groupByFirstChar(String fileName)
        throws IOException {
    return Files.lines(Paths.get(PATH)).
            flatMap(s -> Stream.of(s.split("[^a-zA-Z]"))).
            filter(s -> s.length() > 0).
            map(s -> s.toLowerCase()).
            collect(Collectors.groupingBy(s -> s.charAt(0)));
}
The solution from user Eran would have given me the empty Strings at the beginning, which I didn't want.
Recommended answer
Try filtering out empty strings (""), since they have no first character, which is what causes charAt(0) to throw this exception. split produces an empty string whenever two non-letter characters are adjacent (for example ", "), and also for every blank line.
You can use
flatMap(s -> Stream.of(s.split("[^a-zA-Z]"))).
filter(s -> !s.trim().isEmpty()). //add this line
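To see where the empty strings come from, here is a minimal sketch. The class name SplitDemo and its two helper methods are hypothetical and exist only for illustration; the regex is the one from the question. Because every token consists solely of letters, a plain isEmpty() check is assumed to be sufficient here.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

public class SplitDemo {
    // Splits a line on every non-letter character, exactly as in the question.
    static List<String> rawTokens(String line) {
        return Arrays.asList(line.split("[^a-zA-Z]"));
    }

    // Same split, but with the empty tokens filtered out before further use.
    static List<String> words(String line) {
        return Stream.of(line.split("[^a-zA-Z]"))
                .filter(s -> !s.isEmpty())
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        // The comma and the space are adjacent delimiters, so an empty token appears between them.
        System.out.println(rawTokens("Hello, world")); // prints [Hello, , world]
        System.out.println(words("Hello, world"));     // prints [Hello, world]
        // A blank line yields a single empty token.
        System.out.println(rawTokens("").size());      // prints 1
    }
}
```

Calling charAt(0) on the empty token in the first list is what raises the exception from the question.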
By the way, your method should probably use its fileName argument. So consider changing Paths.get(PATH) into something more like
Paths.get(fileName).
or
Paths.get(PATH).resolve(fileName)
Also, as already mentioned in the comments, since you are not changing the default comparison order, you don't need to explicitly write
sorted((s1, s2) -> s1.compareTo(s2))
A simple
sorted()
will work as well, since the default (natural) order is applied here.
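As a quick check of that claim, the following sketch sorts the same words both ways and compares the results. The class SortedDemo and its method names are made up for this illustration.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

public class SortedDemo {
    // Sorts with the explicit comparator from the question.
    static List<String> withComparator(List<String> words) {
        return words.stream()
                .sorted((s1, s2) -> s1.compareTo(s2))
                .collect(Collectors.toList());
    }

    // Sorts with the natural order of String, which is also compareTo.
    static List<String> naturalOrder(List<String> words) {
        return words.stream()
                .sorted()
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<String> words = Arrays.asList("pear", "apple", "fig");
        System.out.println(withComparator(words)); // prints [apple, fig, pear]
        System.out.println(naturalOrder(words));   // prints [apple, fig, pear]
    }
}
```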
As mentioned by @Alexis C., groupingBy returns a HashMap in practice (its documentation makes no guarantee about the map type), which means that your keys will not be ordered. If you would also like to preserve their order, you can use groupingBy with a LinkedHashMap, like
.collect(Collectors.groupingBy(s -> s.charAt(0), LinkedHashMap::new, Collectors.toList()));
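Putting the suggestions together, one possible version of the method could look like the sketch below. The class name WordGrouping and the helper groupWords are hypothetical, and the try-with-resources block is an addition that the thread does not discuss; it is there because the stream returned by Files.lines holds the file open until it is closed.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;
import java.util.stream.Stream;

public class WordGrouping {
    // Groups all words found in the given lines by their first character.
    static Map<Character, List<String>> groupWords(Stream<String> lines) {
        return lines
                .flatMap(line -> Stream.of(line.split("[^a-zA-Z]")))
                .filter(word -> !word.isEmpty())   // drop the empty tokens produced by split
                .map(String::toLowerCase)
                .sorted()                          // natural order, no comparator needed
                .collect(Collectors.groupingBy(
                        word -> word.charAt(0),
                        LinkedHashMap::new,        // keeps keys in encounter (here: sorted) order
                        Collectors.toList()));
    }

    // Reads the file named by the fileName argument and closes it afterwards.
    public static Map<Character, List<String>> groupByFirstChar(String fileName)
            throws IOException {
        try (Stream<String> lines = Files.lines(Paths.get(fileName))) {
            return groupWords(lines);
        }
    }

    public static void main(String[] args) {
        System.out.println(groupWords(Stream.of("banana apple", "", "Cherry, avocado")));
        // prints {a=[apple, avocado], b=[banana], c=[cherry]}
    }
}
```

Because the words are sorted before they are collected, the LinkedHashMap sees the keys in alphabetical order. If sorted keys are wanted regardless of encounter order, passing TreeMap::new instead would be an alternative.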