Generating a frequency map for a string in Scala
Problem description
Let's say I have a string, "hello", and I want to generate a character frequency map:
Map[Char,Int] = Map(h -> 1, e -> 1, o -> 1, l -> 2)
I could do this iteratively:
val str = "hello"
var counts = new scala.collection.mutable.HashMap[Char,Int]
for (i <- str) {
if (counts.contains(i))
counts.put(i, counts(i) + 1)
else
counts.put(i, 1)
}
By messing around in the REPL, I've found I can do something a bit more concise and not using a mutable collection:
> str.groupBy(_.toChar).map{ p => (p._1, p._2.length)}
scala.collection.immutable.Map[Char,Int] = Map(h -> 1, e -> 1, o -> 1, l -> 2)
But I don't know about the performance characteristics of groupBy() nor what is going on in the block passed to map (like what, exactly, p is).
How do I do this idiomatically using the functional paradigms in Scala?
For background, I'm just coming to Scala for the first time from Ruby. In Ruby, I would use inject, but I'm not sure what the parallel way to do it in Scala is:
counts = str.each_byte.inject(Hash.new(0)){ |h, c| h[c] += 1; h}
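The closest Scala parallel to Ruby's inject is foldLeft, threading an immutable map through the string; this sketch mirrors the Ruby one-liner above (withDefaultValue(0) plays the role of Hash.new(0)):

```scala
// foldLeft plays the role of Ruby's inject: the accumulator is an
// immutable Map[Char, Int] whose missing keys default to 0.
val str = "hello"
val counts = str.foldLeft(Map[Char, Int]().withDefaultValue(0)) {
  (h, c) => h.updated(c, h(c) + 1)
}
// counts: Map(h -> 1, e -> 1, l -> 2, o -> 1)
```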
1) What does p mean?
groupBy takes a function which maps an element to a key of type K. When invoked on some collection Coll, it returns a Map[K, Coll] which contains mappings from each key K to all the elements which mapped to that key.
So, in your case, str.groupBy(_.toChar) yields a map from a key k (which is a character) to a string containing all the elements (characters) c such that k == c.toChar.
You get this:
Map(e -> "e", h -> "h", l -> "ll", o -> "o")
A Map is an iterable of pairs of keys and values. In this case, each pair is a character and a string of elements. Calling the map operation on a Map means mapping over these pairs: p is a pair where p._1 is a character, and p._2 is the associated string (on which you can call length, as you did above).
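If the _1/_2 accessors feel opaque, a pattern match on each pair is equivalent and names the parts explicitly (a minimal sketch; the names char and group are illustrative):

```scala
val str = "hello"
// groupBy produces Map(e -> "e", h -> "h", l -> "ll", o -> "o");
// the case clause destructures each (key, group) pair by name.
val freqs = str.groupBy(_.toChar).map { case (char, group) => (char, group.length) }
// freqs: Map(h -> 1, e -> 1, l -> 2, o -> 1)
```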
2) How to do this idiomatically
The above is the idiomatic way: use groupBy and map. Alternatively, you can use an immutable map and recursion on the string length to compute the frequencies, or an immutable map and a foldLeft.
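A sketch of the recursive alternative mentioned above, walking the string by index with a tail-recursive helper (the names frequencies and loop are illustrative, not from the original answer):

```scala
import scala.annotation.tailrec

// Tail-recursive frequency count: an immutable map is threaded through
// each call as the accumulator, so no mutable state is needed.
def frequencies(s: String): Map[Char, Int] = {
  @tailrec
  def loop(i: Int, acc: Map[Char, Int]): Map[Char, Int] =
    if (i == s.length) acc
    else loop(i + 1, acc.updated(s(i), acc.getOrElse(s(i), 0) + 1))
  loop(0, Map.empty)
}
// frequencies("hello") == Map(h -> 1, e -> 1, l -> 2, o -> 1)
```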
3) Performance characteristics
Best to benchmark to see the differences. Here are a couple of microbenchmarks for a highly repetitive string (~3GHz iMac, JDK7, Scala 2.10.0 nightly):
object Imperative extends testing.Benchmark {
val str = "abc" * 750000
def run() {
val counts = new scala.collection.mutable.HashMap[Char,Int]
var i = 0
val until = str.length
while (i < until) {
val c = str(i)
if (counts.contains(c))
counts.put(c, counts(c) + 1)
else
counts.put(c, 1)
i += 1
}
}
}
object Combinators extends testing.Benchmark {
val str = "abc" * 750000
def run() {
val f = str.groupBy(_.toChar).map(p => (p._1, p._2.length))
}
}
object Fold extends testing.Benchmark {
val str = "abc" * 750000
def run() {
val f = str.foldLeft(Map[Char, Int]() withDefaultValue 0){(h, c) => h.updated(c, h(c)+1)}
}
}
Results:
Imperative:
$ 103 57 53 58 53 53 53 53 53 53
Combinators:
$ 72 51 63 56 53 52 52 54 53 53
Fold:
$ 163 62 71 62 57 57 57 58 57 57
Note that changing the imperative version to use withDefaultValue:
val counts = new scala.collection.mutable.HashMap[Char,Int].withDefaultValue(0)
var i = 0
val until = str.length
while (i < until) {
val c = str(i)
counts.put(c, counts(c) + 1)
i += 1
}
is apparently terribly slow, due to forwarding each put call:
withDefaultValue:
$ 133 87 109 106 101 100 101 100 101 101
Conclusion: the boxing and unboxing of characters in this case is high-enough so that the differences in performance between these approaches are hard to observe.
EDIT: You may want to use ScalaMeter inline benchmarking in place of the Benchmark trait.
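Assuming ScalaMeter's inline measurement API (org.scalameter.measure with a warmer; check the current ScalaMeter docs, as the API has changed across versions and the library must be on the classpath), an inline benchmark looks roughly like this:

```scala
import org.scalameter._

// Measures a single expression inline; withWarmer runs JIT warm-up
// iterations before timing. Requires the ScalaMeter dependency.
val time = withWarmer(new Warmer.Default) measure {
  ("abc" * 750000).groupBy(_.toChar).map(p => (p._1, p._2.length))
}
println(s"groupBy/map took: $time")
```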