Case class equality in Apache Spark


Problem description


Why does pattern matching in Spark not work the same as in Scala? See the example below... function f() tries to pattern match on a class, which works in the Scala REPL but fails in Spark and results in all "???". f2() is a workaround that gets the desired result in Spark using isInstanceOf, but I understand that to be bad form in Scala.


Any help on pattern matching the correct way in this scenario in Spark would be greatly appreciated.

abstract class a extends Serializable {val a: Int}
case class b(a: Int) extends a 
case class bNull(a: Int=0) extends a 

val x: List[a] = List(b(0), b(1), bNull())
val xRdd = sc.parallelize(x)


Attempt at pattern matching, which works in the Scala REPL but fails in Spark:

def f(x: a) = x match {
    case b(n) => "b"
    case bNull(n) => "bnull"
    case _ => "???"
}


Workaround that functions in Spark, but is bad form (I think):

def f2(x: a) = {
    if (x.isInstanceOf[b]) {
        "b"
    } else if (x.isInstanceOf[bNull]) {
        "bnull"
    } else {
        "???"
    }
}

Results

xRdd.map(f).collect                   //does not work in Spark
                                      // result: Array("???", "???", "???")
xRdd.map(f2).collect                  // works in Spark
                                      // result: Array("b", "b", "bnull")
x.map(f(_))                           // works in Scala REPL    
                                      // result: List("b", "b", "bnull")


Versions used: Spark results run in spark-shell (Spark 1.6 on AWS EMR-4.3); Scala REPL in SBT 0.13.9 (Scala 2.10.5).

Recommended answer


This is a known issue with the Spark REPL. You can find more details in SPARK-2620. It affects multiple operations in the Spark REPL, including most transformations on PairwiseRDDs. For example:

case class Foo(x: Int)

val foos = Seq(Foo(1), Foo(1), Foo(2), Foo(2))
foos.distinct.size
// Int = 2

val foosRdd = sc.parallelize(foos)
foosRdd.distinct.count
// Long = 4  

foosRdd.map((_, 1)).reduceByKey(_ + _).collect
// Array[(Foo, Int)] = Array((Foo(1),1), (Foo(1),1), (Foo(2),1), (Foo(2),1))

foosRdd.first == foos.head
// Boolean = false

Foo.unapply(foosRdd.first) == Foo.unapply(foos.head)
// Boolean = true

The simplest thing you can do is define and package the required case classes outside the REPL. Any code submitted directly with spark-submit should work as expected.
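For instance, a minimal sketch of that approach might look like the following (the package name demo, the object Main, and the file layout are illustrative assumptions, not part of the original question; the classes are the same ones defined above):

// src/main/scala/demo/Models.scala  (hypothetical file layout and package name)
package demo

// Same hierarchy as in the question, compiled into the application jar
// so the driver and the executors load one and the same class definition.
abstract class a extends Serializable { val a: Int }
case class b(a: Int) extends a
case class bNull(a: Int = 0) extends a

// src/main/scala/demo/Main.scala
package demo

import org.apache.spark.{SparkConf, SparkContext}

object Main {
  // Pattern matching on case classes compiled outside the REPL behaves
  // exactly as in plain Scala.
  def f(x: a): String = x match {
    case b(n)     => "b"
    case bNull(n) => "bnull"
    case _        => "???"
  }

  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("case-class-equality"))
    val xRdd = sc.parallelize(List[a](b(0), b(1), bNull()))
    println(xRdd.map(f).collect.mkString(", "))   // expected: b, b, bnull
    sc.stop()
  }
}

Packaged with sbt and submitted with something like spark-submit --class demo.Main path/to/your.jar (jar name illustrative), the job should print b, b, bnull, matching the plain Scala result.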


In Scala 2.11+ you can create a package directly in the REPL with :paste -raw.

scala> :paste -raw
// Entering paste mode (ctrl-D to finish)

package bar

case class Bar(x: Int)


// Exiting paste mode, now interpreting.

scala> import bar.Bar
import bar.Bar

scala> sc.parallelize(Seq(Bar(1), Bar(1), Bar(2), Bar(2))).distinct.collect
res1: Array[bar.Bar] = Array(Bar(1), Bar(2))

