Task not serializable in Spark


Question


I have a transformation like this:

JavaRDD<Tuple2<String, Long>> mappedRdd = myRDD.values().map(
    new Function<Pageview, Tuple2<String, Long>>() {
      @Override
      public Tuple2<String, Long> call(Pageview pageview) throws Exception {
        String key = pageview.getUrl().toString();
        Long value = getDay(pageview.getTimestamp());
        return new Tuple2<>(key, value);
      }
    });

Pageview is this type: Pageview.java

and I register that class with Spark like this:

Class[] c = new Class[1];
c[0] = Pageview.class;
sparkConf.registerKryoClasses(c);
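For context: registerKryoClasses only controls how RDD data is serialized; the closure passed to map is serialized separately, with Java serialization, which is why JavaSerializer appears in the stack trace below. A typical Kryo setup looks like the following sketch (a fragment, assuming the Spark 1.x Java API used in the question; it cannot by itself fix the exception):

```java
SparkConf sparkConf = new SparkConf()
    .setAppName("ExampleSpark")
    // Kryo applies to shuffled/cached RDD *data* only; task closures
    // always go through Java serialization regardless of this setting.
    .set("spark.serializer", "org.apache.spark.serializer.KryoSerializer");
sparkConf.registerKryoClasses(new Class[]{Pageview.class});
```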

Exception in thread "main" org.apache.spark.SparkException: Task not serializable
    at org.apache.spark.util.ClosureCleaner$.ensureSerializable(ClosureCleaner.scala:166)
    at org.apache.spark.util.ClosureCleaner$.clean(ClosureCleaner.scala:158)
    at org.apache.spark.SparkContext.clean(SparkContext.scala:1623)
    at org.apache.spark.rdd.RDD.map(RDD.scala:286)
    at org.apache.spark.api.java.JavaRDDLike$class.map(JavaRDDLike.scala:89)
    at org.apache.spark.api.java.AbstractJavaRDDLike.map(JavaRDDLike.scala:46)
    at org.apache.gora.tutorial.log.ExampleSpark.run(ExampleSpark.java:100)
    at org.apache.gora.tutorial.log.ExampleSpark.main(ExampleSpark.java:53)
Caused by: java.io.NotSerializableException: org.apache.gora.tutorial.log.ExampleSpark
Serialization stack:
    - object not serializable (class: org.apache.gora.tutorial.log.ExampleSpark, value: org.apache.gora.tutorial.log.ExampleSpark@1a2b4497)
    - field (class: org.apache.gora.tutorial.log.ExampleSpark$1, name: this$0, type: class org.apache.gora.tutorial.log.ExampleSpark)
    - object (class org.apache.gora.tutorial.log.ExampleSpark$1, org.apache.gora.tutorial.log.ExampleSpark$1@4ab2775d)
    - field (class: org.apache.spark.api.java.JavaPairRDD$$anonfun$toScalaFunction$1, name: fun$1, type: interface org.apache.spark.api.java.function.Function)
    - object (class org.apache.spark.api.java.JavaPairRDD$$anonfun$toScalaFunction$1, )
    at org.apache.spark.serializer.SerializationDebugger$.improveException(SerializationDebugger.scala:38)
    at org.apache.spark.serializer.JavaSerializationStream.writeObject(JavaSerializer.scala:47)
    at org.apache.spark.serializer.JavaSerializerInstance.serialize(JavaSerializer.scala:80)
    at org.apache.spark.util.ClosureCleaner$.ensureSerializable(ClosureCleaner.scala:164)
    ... 7 more

When I debug the code I see that JavaSerializer.scala is called even though there is a class named KryoSerializer.

PS 1: I don't want to use the Java serializer, but implementing Serializable on Pageview does not solve the problem.

PS 2: This does not make the problem go away:

...
//String key = pageview.getUrl().toString();
//Long value = getDay(pageview.getTimestamp());
String key = "Dummy";
Long value = 1L;
return new Tuple2<>(key, value);
...
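The behaviour in PS 2 can be reproduced without Spark. An anonymous inner class whose method touches the enclosing instance keeps a hidden this$0 reference to it, so serializing the anonymous object drags the (non-Serializable) outer class along no matter what the method body returns. A minimal plain-Java sketch of this, with hypothetical names (SerializableTask and getDay stand in for the Spark Function and the question's getDay; they are not from the original code):

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectOutputStream;
import java.io.Serializable;

// Plain-Java reproduction of the failure mode: CaptureDemo plays the
// role of the non-Serializable ExampleSpark from the stack trace.
public class CaptureDemo {

    // Stand-in for org.apache.spark.api.java.function.Function.
    interface SerializableTask extends Serializable {
        String call();
    }

    // Instance method, like getDay(...) in the question.
    String getDay() {
        return "day";
    }

    // Anonymous class, as in the question: calling getDay() makes it
    // capture this$0, a reference to the enclosing CaptureDemo instance.
    SerializableTask anonymousTask() {
        return new SerializableTask() {
            @Override
            public String call() {
                return getDay();
            }
        };
    }

    // True iff o survives plain Java serialization.
    static boolean serializes(Object o) {
        try (ObjectOutputStream out =
                 new ObjectOutputStream(new ByteArrayOutputStream())) {
            out.writeObject(o);
            return true;
        } catch (IOException e) { // NotSerializableException is an IOException
            return false;
        }
    }

    public static void main(String[] args) {
        // Fails because of the captured outer instance, not the body,
        // mirroring the this$0 field in the Serialization stack above.
        System.out.println(serializes(new CaptureDemo().anonymousTask())); // prints false
    }
}
```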

Solution

I've run into this issue multiple times with Java code. Even though I was using Java serialization, I would either make the class that contains the code Serializable or, if you don't want to do that, make the Function a static member of the class.

Here is a code snippet of a solution.

public class Test {
  // Note: getDay(...) must also be static (or otherwise reachable without
  // an enclosing instance), since this field is initialized in a static context.
  private static Function<Pageview, Tuple2<String, Long>> s =
      new Function<Pageview, Tuple2<String, Long>>() {

        @Override
        public Tuple2<String, Long> call(Pageview pageview) throws Exception {
          String key = pageview.getUrl().toString();
          Long value = getDay(pageview.getTimestamp());
          return new Tuple2<>(key, value);
        }
      };
}
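Why the static member helps can also be checked without Spark: an anonymous class created in a static context gets no this$0 field, so there is no outer instance to drag along and plain Java serialization succeeds. A small sketch along the same lines as the snippet above (names are illustrative, not from the original code):

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectOutputStream;
import java.io.Serializable;

public class StaticMemberDemo {

    // Stand-in for org.apache.spark.api.java.function.Function.
    interface SerializableTask extends Serializable {
        String call();
    }

    // Static helper, as required when the function is a static member.
    static String getDay() {
        return "day";
    }

    // Static member, as in the answer's Test class: created in a static
    // context, the anonymous class has no this$0 field to serialize.
    static final SerializableTask TASK = new SerializableTask() {
        @Override
        public String call() {
            return getDay();
        }
    };

    // True iff o survives plain Java serialization.
    static boolean serializes(Object o) {
        try (ObjectOutputStream out =
                 new ObjectOutputStream(new ByteArrayOutputStream())) {
            out.writeObject(o);
            return true;
        } catch (IOException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println(serializes(TASK)); // prints true
    }
}
```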

