Producing histogram Map for IntStream raises compile-time-error


Problem description


I'm interested in building a Huffman Coding prototype. To that end, I want to begin by producing a histogram of the characters that make up an input Java String. I've seen many solutions on SO and elsewhere (e.g. here) that depend on the collect() method of Streams, together with static imports of Function.identity() and Collectors.counting(), in a very specific and intuitive way.

However, when using a piece of code eerily similar to the one I linked to above:

private List<HuffmanTrieNode> getCharsAndFreqs(String s){
        Map<Character, Long> freqs = s.chars().collect(Collectors.groupingBy(Function.identity(), Collectors.counting()));
        return null;
}

I receive a compile-time error from IntelliJ, which essentially tells me that none of the arguments to collect conforms to the Supplier type required by its signature.

Unfortunately, I'm new to the Java 8 Stream hierarchy and I'm not entirely sure what the best course of action for me should be. In fact, going the Map way might be too much boilerplate for what I'm trying to do; please advise if so.

Solution

The problem is that s.chars() returns an IntStream, a primitive specialization of Stream, and IntStream has no collect that takes a single argument; its collect takes three arguments (a supplier, an accumulator and a combiner). You can call boxed(), which transforms the IntStream into a Stream<Integer>:

Map<Integer, Long> map = yourString.codePoints()
          .boxed()
          .collect(Collectors.groupingBy(
                      Function.identity(), 
                      Collectors.counting()));
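For comparison, here is a minimal, self-contained sketch of both routes: boxing the stream first, and staying on the IntStream with its three-argument collect. The class and method names (CodePointHistogram, viaBoxed, viaThreeArgCollect) are made up for illustration and are not part of the original answer.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;
import java.util.stream.Collectors;

// Hypothetical helper class, named only for this illustration.
class CodePointHistogram {

    // Route 1: box the IntStream, then use the one-argument collect(Collector).
    static Map<Integer, Long> viaBoxed(String s) {
        return s.codePoints()
                .boxed()
                .collect(Collectors.groupingBy(Function.identity(), Collectors.counting()));
    }

    // Route 2: stay on IntStream and use its three-argument collect.
    static Map<Integer, Long> viaThreeArgCollect(String s) {
        return s.codePoints().collect(
                HashMap::new,                               // supplier: creates the result map
                (map, cp) -> map.merge(cp, 1L, Long::sum),  // accumulator: count one code point
                (left, right) ->                            // combiner: merge two partial maps
                        right.forEach((k, v) -> left.merge(k, v, Long::sum)));
    }

    public static void main(String[] args) {
        System.out.println(viaBoxed("hello"));
        System.out.println(viaThreeArgCollect("hello"));
    }
}
```

Both methods return the same counts; the boxed version is shorter, while the three-argument version avoids the intermediate Stream<Integer>.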

But now the problem is that you have counted code points and not chars. If you know for certain that your String consists only of characters in the BMP, you can safely cast to char as shown in the other answer. If you are not sure, things get trickier.
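A minimal sketch of the cast-to-char variant mentioned above, assuming the input contains only BMP characters. The class name BmpCharHistogram is hypothetical.

```java
import java.util.Map;
import java.util.function.Function;
import java.util.stream.Collectors;

// Hypothetical helper class, named only for this illustration.
class BmpCharHistogram {

    // Assumes every character of s is in the BMP, so each int from chars()
    // is a complete character and the cast to char loses nothing.
    static Map<Character, Long> histogram(String s) {
        return s.chars()
                .mapToObj(c -> (char) c)  // int -> char, auto-boxed to Character
                .collect(Collectors.groupingBy(Function.identity(), Collectors.counting()));
    }

    public static void main(String[] args) {
        System.out.println(histogram("mississippi"));
    }
}
```

With input outside the BMP this would count the two halves of a surrogate pair as separate keys, which is why that case needs different handling.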

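If the String may contain characters outside the BMP, you need to treat each single Unicode code point as one character. Such a code point may not fit into a Java char, which is 2 bytes (one UTF-16 code unit); a code point outside the BMP is stored as a surrogate pair, that is, two chars or 4 bytes.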

In that case your map should be Map<String, Long> and not Map<Character, Long>.
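One way to get such a Map<String, Long> on Java 8 is to turn each code point into its own String. This is a sketch under that assumption, with a hypothetical class name; note that it counts code points, so a base letter followed by a combining mark would still be counted as two separate keys.

```java
import java.util.Map;
import java.util.function.Function;
import java.util.stream.Collectors;

// Hypothetical helper class, named only for this illustration.
class CodePointStringHistogram {

    static Map<String, Long> histogram(String s) {
        return s.codePoints()
                // Character.toChars yields one char for BMP code points
                // and a surrogate pair (two chars) for the rest.
                .mapToObj(cp -> new String(Character.toChars(cp)))
                .collect(Collectors.groupingBy(Function.identity(), Collectors.counting()));
    }

    public static void main(String[] args) {
        System.out.println(histogram("A" + "\uD835\uDD0A" + "B" + "C"));
    }
}
```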

Java 9 added support for \X in regular expressions (it matches an extended grapheme cluster) as well as Scanner#findAll, which makes this fairly easy to do:

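String sample = "A" + "\uD835\uDD0A" + "B" + "C";
// scan was not declared in the original snippet; a Scanner over sample is assumed
Scanner scan = new Scanner(sample);
Map<String, Long> map = scan.findAll("\\X")
        .map(MatchResult::group)
        .collect(Collectors.groupingBy(Function.identity(), Collectors.counting()));

System.out.println(map); // {A=1, B=1, C=1, 𝔊=1}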
                        
