Grouping words by first character


Question

What I have: A text file that is read line by line. Each String contains one line.

What I want: Group ALL words by their first character using Java Streams.

What I have so far:

public static Map<Character, List<String>> groupByFirstChar(String fileName)
        throws IOException {

    return Files.lines(Paths.get(PATH)).
            flatMap(s -> Stream.of(s.split("[^a-zA-Z]"))).
            map(s -> s.toLowerCase()).
            sorted((s1, s2) -> s1.compareTo(s2)).
            collect(Collectors.groupingBy(s -> s.charAt(0)));
}

Problem: I get an exception:

Exception in thread "main" java.lang.StringIndexOutOfBoundsException: String index out of range: 0
at java.lang.String.charAt(String.java:646)
at textana.TextAnalysisFns.lambda$16(TextAnalysisFns.java:110)
at textana.TextAnalysisFns$$Lambda$36/159413332.apply(Unknown Source)
at java.util.stream.Collectors.lambda$groupingBy$196(Collectors.java:907)
at java.util.stream.Collectors$$Lambda$23/189568618.accept(Unknown Source)
at java.util.stream.ReduceOps$3ReducingSink.accept(ReduceOps.java:169)
at java.util.stream.SortedOps$RefSortingSink$$Lambda$37/186370029.accept(Unknown Source)
at java.util.ArrayList.forEach(ArrayList.java:1249)
at java.util.stream.SortedOps$RefSortingSink.end(SortedOps.java:390)
at java.util.stream.Sink$ChainedReference.end(Sink.java:258)
at java.util.stream.Sink$ChainedReference.end(Sink.java:258)
at java.util.stream.AbstractPipeline.copyInto(AbstractPipeline.java:513)
at java.util.stream.AbstractPipeline.wrapAndCopyInto(AbstractPipeline.java:502)
at java.util.stream.ReduceOps$ReduceOp.evaluateSequential(ReduceOps.java:708)
at java.util.stream.AbstractPipeline.evaluate(AbstractPipeline.java:234)
at java.util.stream.ReferencePipeline.collect(ReferencePipeline.java:499)
at textana.TextAnalysisFns.groupByFirstChar(TextAnalysisFns.java:110)
at textana.SampleTextAnalysisApp.main(SampleTextAnalysisApp.java:95)

Question: Why do I get a StringIndexOutOfBoundsException?

Solution based on the hints in the comments:

public static Map<Character, List<String>> groupByFirstChar(String fileName)
        throws IOException {

    return Files.lines(Paths.get(PATH)).
            flatMap(s -> Stream.of(s.split("[^a-zA-Z]"))).
            filter(s -> s.length() > 0).
            map(s -> s.toLowerCase()).
            collect(Collectors.groupingBy(s -> s.charAt(0)));
}

User Eran's solution would have given me the empty strings at the beginning, which I didn't want.

Recommended answer

Try filtering out empty strings (""). They have no first character, so calling charAt(0) on them throws this exception.
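To see where the empty strings come from, here is a minimal sketch that applies the question's split pattern to a few sample lines. The class name SplitDemo, the helper tokens, and the sample inputs are made up for illustration.

```java
import java.util.Arrays;
import java.util.List;

class SplitDemo {
    // Hypothetical helper: splits one line on every non-letter character,
    // using the same regex as the question.
    static List<String> tokens(String line) {
        return Arrays.asList(line.split("[^a-zA-Z]"));
    }

    public static void main(String[] args) {
        // ", " is two delimiters in a row, so an empty token sits between them.
        System.out.println(tokens("Hello, world"));   // [Hello, , world]
        // A leading delimiter produces a leading empty token.
        System.out.println(tokens(" hi"));            // [, hi]
        // An empty line yields exactly one token: the empty string itself.
        System.out.println(tokens("").size());        // 1
        // Trailing empty tokens are dropped by String.split.
        System.out.println(tokens("hi!"));            // [hi]
    }
}
```

Any of these empty tokens reaching charAt(0) is enough to trigger the exception.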

You can use

flatMap(s -> Stream.of(s.split("[^a-zA-Z]"))).
filter(s -> !s.trim().isEmpty()). //add this line

By the way, your method should probably use its fileName argument. Consider changing Paths.get(PATH) into something more like

Paths.get(fileName)

or

Paths.get(PATH).resolve(fileName)
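As a rough sketch of that suggestion, the method below reads from the path it is given. The class name ReadWords, the helper readWords, and the temporary file in main are assumptions for illustration only. It also wraps Files.lines in try-with-resources, which the Files.lines documentation recommends so the underlying file is closed; that goes slightly beyond what the answer itself proposes.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

class ReadWords {
    // Hypothetical helper: returns all lower-cased words of the given file.
    static List<String> readWords(String fileName) throws IOException {
        // Use the argument instead of a constant, and close the stream when done.
        try (Stream<String> lines = Files.lines(Paths.get(fileName))) {
            return lines
                    .flatMap(s -> Stream.of(s.split("[^a-zA-Z]")))
                    .filter(s -> !s.isEmpty()) // drop the empty tokens
                    .map(String::toLowerCase)
                    .collect(Collectors.toList());
        }
    }

    public static void main(String[] args) throws IOException {
        // Sample input written to a temporary file for the demonstration.
        Path tmp = Files.createTempFile("words", ".txt");
        Files.write(tmp, Arrays.asList("Hello, world", "", "Hi there!"));
        System.out.println(readWords(tmp.toString())); // [hello, world, hi, there]
        Files.delete(tmp);
    }
}
```

Because the tokens produced by this regex contain only letters, isEmpty() is enough here; the trim() call in the snippet above does no harm but is not needed.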

Also, as already mentioned in the comments, since you are not changing the default comparison order, you don't need to explicitly write

sorted((s1, s2) -> s1.compareTo(s2))

A plain

sorted()

will work as well, since the default (natural) ordering is applied here.
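A small sketch, with made-up sample words and the hypothetical class name SortedDemo, showing that both forms produce the same order for strings:

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

class SortedDemo {
    // Sorts with the explicit comparator from the question.
    static List<String> explicit(List<String> words) {
        return words.stream()
                .sorted((s1, s2) -> s1.compareTo(s2))
                .collect(Collectors.toList());
    }

    // Sorts with the natural ordering of String, which is also compareTo.
    static List<String> natural(List<String> words) {
        return words.stream()
                .sorted()
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<String> words = Arrays.asList("pear", "apple", "fig");
        System.out.println(explicit(words)); // [apple, fig, pear]
        System.out.println(natural(words));  // [apple, fig, pear]
    }
}
```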

As mentioned by @Alexis C., the single-argument Collectors.groupingBy returns a HashMap in the current implementation (the documentation makes no guarantee about the map type), which means your keys will not be ordered. If you also want to preserve the order in which the keys are encountered, you can use the groupingBy overload that takes a map factory and pass LinkedHashMap, like

.collect(Collectors.groupingBy(s -> s.charAt(0), LinkedHashMap::new, Collectors.toList()));
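Putting the pieces of this answer together, the following sketch runs the whole pipeline on in-memory lines so it can be tried without a file. The class name GroupDemo, the Stream<String> parameter, and the sample lines are assumptions for illustration; the original method takes a file name instead.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;
import java.util.stream.Stream;

class GroupDemo {
    // Hypothetical variant of groupByFirstChar that accepts the lines directly.
    static Map<Character, List<String>> groupByFirstChar(Stream<String> lines) {
        return lines
                .flatMap(s -> Stream.of(s.split("[^a-zA-Z]")))
                .filter(s -> !s.isEmpty())   // avoid charAt(0) on empty tokens
                .map(String::toLowerCase)
                .sorted()                    // natural order, no comparator needed
                // LinkedHashMap keeps keys in encounter order, which is
                // alphabetical here because the stream is already sorted.
                .collect(Collectors.groupingBy(
                        s -> s.charAt(0), LinkedHashMap::new, Collectors.toList()));
    }

    public static void main(String[] args) {
        Map<Character, List<String>> groups =
                groupByFirstChar(Stream.of("Banana, apple", "avocado cherry!"));
        System.out.println(groups); // {a=[apple, avocado], b=[banana], c=[cherry]}
    }
}
```

The key order relies on the sorted() step coming before the collector; without it, LinkedHashMap would list the keys in the order the words appear in the text.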
