What is the difference between Spark Serialization and Java Serialization?


Problem description

I'm using Spark + YARN and I have a service that I want to call on distributed nodes.

When I serialize this service object "by hand" in a JUnit test using Java serialization, all inner collections of the service are serialized and deserialized correctly:

  @Test
  public void testSerialization() {  

    try (
        ConfigurableApplicationContext contextBusiness = new ClassPathXmlApplicationContext("spring-context.xml");
        FileOutputStream fileOutputStream = new FileOutputStream("myService.ser");
        ObjectOutputStream objectOutputStream = new ObjectOutputStream(fileOutputStream);
        ) {

      final MyService service = (MyService) contextBusiness.getBean("myServiceImpl");

      objectOutputStream.writeObject(service);
      objectOutputStream.flush();

    } catch (final java.io.IOException e) {
      logger.error(e.getMessage(), e);
    }
  }

  @Test
  public void testDeSerialization() throws ClassNotFoundException {  

    try (
        FileInputStream fileInputStream = new FileInputStream("myService.ser");
        ObjectInputStream objectInputStream = new ObjectInputStream(fileInputStream);
        ) {

      final MyService myService = (MyService) objectInputStream.readObject();

      // HERE: a functional test that proves the service has been fully serialized and deserialized.

    } catch (final java.io.IOException e) {
      logger.error(e.getMessage(), e);
    }
  }  

But when I try to call this service via my Spark launcher, whether I broadcast the service object or not, one inner collection (a HashMap) disappears (is not serialized), as if it were marked "transient" (but it is neither transient nor static):

JavaRDD<InputOjbect> listeInputsRDD = sprkCtx.parallelize(listeInputs, 10);
JavaRDD<OutputObject> listeOutputsRDD = listeInputsRDD.map(new Function<InputOjbect, OutputObject>() {
  private static final long serialVersionUID = 1L;

  public OutputObject call(InputOjbect input) throws TarificationXmlException { // Exception

    MyOutput output = service.evaluate(input);
    return (new OutputObject(output));
  }
});

I get the same result if I broadcast the service:

final Broadcast<MyService> broadcastedService = sprkCtx.broadcast(service);      
JavaRDD<InputOjbect> listeInputsRDD = sprkCtx.parallelize(listeInputs, 10);
JavaRDD<OutputObject> listeOutputsRDD = listeInputsRDD.map(new Function<InputOjbect, OutputObject>() {
  private static final long serialVersionUID = 1L;

  public OutputObject call(InputOjbect input) throws TarificationXmlException { // Exception

    MyOutput output = broadcastedService.getValue().evaluate(input);
    return (new OutputObject(output));
  }
});

If I launch this same Spark code in local mode instead of yarn cluster mode, it works perfectly.

So my question is: what is the difference between Spark serialization and Java serialization? (I'm not using Kryo or any custom serialization.)

EDIT: when I try with the Kryo serializer (without explicitly registering any class), I have the same problem.

Recommended answer

OK, I figured it out thanks to one of our experienced data analysts.

So, what was this mystery about?


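  • It's not about serialization (Java or Kryo).

  • It's not about some pre-processing or post-processing that Spark would do before or after serialization.

  • It's not about the HashMap field, which is perfectly serializable (this is obvious if you read the first example I gave, but not to everyone ;)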

So...

The whole problem was this:

"If I launch this same Spark code in local mode instead of yarn cluster mode, it works perfectly."

In "yarn cluster" mode the collection could not be initialized, because the application was launched on an arbitrary node that could not access the initial reference data on disk. In local mode, there was a clear exception when the initial data was not found on disk, but in cluster mode the failure was completely silent, and it looked like a serialization problem.
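To illustrate how this kind of failure can hide, here is a minimal sketch. It is not our actual service code: the class name `ReferenceDataLoader`, the `key=value` file format, and both method names are assumptions made up for this example. The silent variant swallows the I/O error and returns an empty map, which then serializes perfectly well as an empty map. The fail-fast variant raises an error on the node where the file is missing, so the real cause is visible.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper, not part of Spark or of the original service.
class ReferenceDataLoader {

    // Silent variant: a missing file yields an empty map and no error.
    // The map is still serializable, so the data only *looks* lost in transit.
    public static Map<String, String> loadSilently(Path file) {
        Map<String, String> data = new HashMap<>();
        try {
            for (String line : Files.readAllLines(file)) {
                String[] kv = line.split("=", 2);
                if (kv.length == 2) {
                    data.put(kv[0].trim(), kv[1].trim());
                }
            }
        } catch (IOException e) {
            // Swallowed on purpose to reproduce the silent behaviour.
        }
        return data;
    }

    // Fail-fast variant: refuses to continue when the file is not readable
    // on the current node, and reports the absolute path it looked for.
    public static Map<String, String> loadOrFail(Path file) {
        if (!Files.isReadable(file)) {
            throw new IllegalStateException(
                "Reference data not readable on this node: " + file.toAbsolutePath());
        }
        Map<String, String> data = new HashMap<>();
        try {
            for (String line : Files.readAllLines(file)) {
                String[] kv = line.split("=", 2);
                if (kv.length == 2) {
                    data.put(kv[0].trim(), kv[1].trim());
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return data;
    }

    public static void main(String[] args) {
        // Assumed not to exist in the working directory.
        Path missing = Paths.get("no-such-reference-data.properties");

        System.out.println("silent load size = " + loadSilently(missing).size());

        try {
            loadOrFail(missing);
        } catch (IllegalStateException e) {
            System.out.println("fail-fast load: " + e.getMessage());
        }
    }
}
```

With a check like the second variant in the service initialization, the missing file would probably have shown up as an explicit error in the logs of the node concerned, instead of as an empty HashMap.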

Using "yarn client" mode solved this for us.
