Task not serializable exception

Problem description

For some reason I am getting a Task not serializable exception with the following code. I am running this on Spark in local mode using sbt test.

@RunWith(classOf[JUnitRunner])
class NQTest extends FeatureSpec with Matchers with Serializable {
  val conf = new SparkConf().setAppName("NQ Market Makers Test").setMaster("local")
  val sc = new SparkContext(conf)
  ...

  val testData : RDD[(String, String)] = sc.textFile("testcases/NQIntervalsTestData").map { line => (line.split(":", 2)(0), line.split(":", 2)(1)) }
  testData.persist()
  def testDatasets(input : Int) = {
    testData.filter(_ match {
      case (s, _) => (s == "Test Case " + input)
      case _      => false
    }).map(x => x match {
      case (_, line) => line
    })
  }

  ...

  feature("NQIntervals") {
    scenario("Test data sanity check") {
      (testDatasets(1).collect()) should not be null
    }
  }
}

And the exception:

org.apache.spark.SparkException: Task not serializable
        at org.apache.spark.util.ClosureCleaner$.ensureSerializable(ClosureCleaner.scala:166)
        at org.apache.spark.util.ClosureCleaner$.clean(ClosureCleaner.scala:158)
        at org.apache.spark.SparkContext.clean(SparkContext.scala:1623)
        at org.apache.spark.rdd.RDD.filter(RDD.scala:303)
        at test.scala.org.<redacted>.NQTest$.testDatasets(NQTest.scala:31)

Unlike the other Stack Overflow questions I've seen regarding this exception, this seems to concern the RDD itself rather than the function I've passed to filter.

For example, we can remove the filter and map entirely and we still end up with an exception during the collect. From my googling I've only been able to find answers to problems involving non-serializable objects inside a filter or a map, not problems with the RDD itself.

Things I've tried so far:


  • Removed the filter and map inside the testDatasets method and just returned the testData set. This caused the exception to happen when collect was called.
  • Removed the unit testing framework entirely, made NQTest extend Serializable directly and wrote a one line main method consisting of testDatasets(1).collect(): still the same exception
  • Removed testData.persist(): still the same exception

Any insight would be welcome!

Recommended answer

Turns out I was a huge idiot and was stopping the spark context before the actual tests were being run. Disregard
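
For anyone hitting the same thing: the answer does not say where the stop call was, but one common way this happens is that a ScalaTest spec body runs when the class is constructed (it only registers the scenarios), so an sc.stop() written in the class body executes before any scenario does. Below is a minimal sketch of one way to avoid that, assuming the ScalaTest 2.x/3.0-style imports that match the question's code. The suite name NQLifecycleSketch and the inline test data are hypothetical, not from the original post.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.scalatest.{BeforeAndAfterAll, FeatureSpec, Matchers}

// Hypothetical suite name; the layout mirrors the question's NQTest.
class NQLifecycleSketch extends FeatureSpec with Matchers with BeforeAndAfterAll {
  // Created in beforeAll and stopped in afterAll, never in the class body.
  private var sc: SparkContext = _

  override def beforeAll(): Unit = {
    super.beforeAll()
    val conf = new SparkConf().setAppName("NQ Market Makers Test").setMaster("local")
    sc = new SparkContext(conf)
  }

  override def afterAll(): Unit = {
    // Runs only after every scenario has finished, so no job sees a stopped context.
    try {
      if (sc != null) sc.stop()
    } finally {
      super.afterAll()
    }
  }

  feature("NQIntervals") {
    scenario("Test data sanity check") {
      // Inline data (an assumption) stands in for testcases/NQIntervalsTestData.
      val lines = sc.parallelize(Seq("Test Case 1:a", "Test Case 2:b"))
      lines.filter(_.startsWith("Test Case 1")).collect().length should be (1)
    }
  }
}
```

Tying the context's lifetime to beforeAll/afterAll keeps the stop call out of the constructor, which is the only part of the spec that runs before the scenarios.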
