Why Spark doesn't allow map-side combining with array keys?


Question

I'm using Spark 1.3.1 and I'm curious why Spark doesn't allow using array keys in map-side combining. A piece of the combineByKey function:

if (keyClass.isArray) {
  if (mapSideCombine) {
    throw new SparkException("Cannot use map-side combining with array keys.")
  }
}

Answer

Basically for the same reason the default partitioner cannot partition array keys.

A Scala Array is just a wrapper around a Java array, and its hashCode doesn't depend on its content:

scala> val x = Array(1, 2, 3)
x: Array[Int] = Array(1, 2, 3)

scala> val h = x.hashCode
h: Int = 630226932

scala> x(0) = -1

scala> x.hashCode() == h
res3: Boolean = true

It means that two arrays with exactly the same content are not equal:

scala> x
res4: Array[Int] = Array(-1, 2, 3)

scala> val y = Array(-1, 2, 3)
y: Array[Int] = Array(-1, 2, 3)

scala> y == x
res5: Boolean = false

As a result, Arrays cannot be used as meaningful keys. If you're not convinced, check what happens when you use an Array as a key for a Scala Map:

scala> Map(Array(1) -> 1, Array(1) -> 2)
res7: scala.collection.immutable.Map[Array[Int],Int] = Map(Array(1) -> 1, Array(1) -> 2)
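Since each Array instance hashes by identity, even lookups fail: a freshly built array with the same contents matches neither of the two entries. A quick sketch continuing the example above (variable names are my own):

```scala
// Two Array(1) literals are distinct objects, so both survive as keys.
val m = Map(Array(1) -> 1, Array(1) -> 2)

// Looking up with a third, freshly built Array(1) finds neither entry.
val lookup = m.get(Array(1))  // None

// Converting the keys to Vector restores value-based equality,
// so the duplicate keys collapse into one entry.
val mv = m.map { case (k, v) => (k.toVector, v) }
val vlookup = mv.get(Vector(1))  // a value-based hit now
```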

If you want to use a collection as a key, you should use an immutable data structure like a Vector or a List.

scala> Map(Array(1).toVector -> 1, Array(1).toVector -> 2)
res15: scala.collection.immutable.Map[Vector[Int],Int] = Map(Vector(1) -> 2)
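In Spark the same conversion should happen before any byKey operation. A minimal sketch of the idea using plain Scala's `groupBy` (the sample pairs are made up for illustration; the commented RDD line uses the standard `map`/`reduceByKey` API but assumes an existing `rdd`):

```scala
// Hypothetical pairs whose keys are arrays with identical contents.
val pairs = Seq(Array(1, 2) -> 10, Array(1, 2) -> 5)

// Grouping by the raw arrays keeps the keys apart (identity hashing):
// two groups instead of one.
val byArray = pairs.groupBy(_._1)

// Converting each key to a Vector lets equal contents merge into one group.
val byVector = pairs.groupBy(_._1.toVector)

// The same trick applies to an RDD before reduceByKey, e.g.:
//   rdd.map { case (k, v) => (k.toVector, v) }.reduceByKey(_ + _)
```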


