Kryo vs Encoder vs Java Serialization in Spark?


Question

Which serializer is used in which case?
The Spark documentation says:
It provides two serialization libraries:
1. Java (default) and
2. Kryo
Now, where do Encoders come from, and why are they not mentioned in the documentation?
Databricks also says that Encoders perform faster for Datasets. What about RDDs, and how do all of these fit together? In which case should we use which serializer?

Answer

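  • Encoders are used only with Dataset.
  • Kryo is used internally by Spark (since Spark 2.0.0, for example, when shuffling RDDs of simple types, arrays of simple types, or strings).
  • Both Kryo and Java serialization are available for serializing your own data when it is shuffled; you choose between them with the `spark.serializer` setting.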
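
For context on the Java option: according to the Spark tuning guide, Spark's default serializer uses Java's `ObjectOutputStream` framework, which works with any class implementing `java.io.Serializable` but is relatively slow and produces large output. The following stdlib-only sketch (no Spark required; the `JavaSerializationDemo` and `Point` names are hypothetical) round-trips one small object through those JDK streams so you can see how big the result is.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

public class JavaSerializationDemo {

    // Hypothetical record type; Java serialization only needs the Serializable marker.
    static final class Point implements Serializable {
        private static final long serialVersionUID = 1L;
        final int x;
        final int y;

        Point(int x, int y) {
            this.x = x;
            this.y = y;
        }
    }

    // Write an object to a byte array with the JDK's ObjectOutputStream.
    static byte[] serialize(Object obj) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(obj);
        }
        return bytes.toByteArray();
    }

    // Read the object back with ObjectInputStream.
    static Object deserialize(byte[] data) throws IOException, ClassNotFoundException {
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(data))) {
            return in.readObject();
        }
    }

    public static void main(String[] args) throws Exception {
        Point original = new Point(3, 4);
        byte[] data = serialize(original);
        Point copy = (Point) deserialize(data);
        // The payload is two ints (8 bytes); the stream also carries the class descriptor.
        System.out.println("serialized size: " + data.length + " bytes");
        System.out.println("round trip: (" + copy.x + ", " + copy.y + ")");
    }
}
```

The printed size is several times the 8-byte payload because the stream also records the class descriptor (class name, `serialVersionUID`, field names and types). Kryo, especially with classes registered up front, avoids much of that per-object overhead, which is the main reason it is usually preferred for RDD workloads.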

As to which one you should use: Kryo is your best option if you don't use Dataset. Otherwise you don't really have a choice, because a Dataset is always serialized through its Encoder.
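
A minimal Java sketch of how the pieces map together, assuming Spark 2.x or later with `spark-sql` on the classpath (the `SerializerChoice` class, the `Person` bean, and the app name are hypothetical). The `spark.serializer` setting and `registerKryoClasses` govern how RDD data is serialized for shuffles and serialized caching, while a Dataset is serialized by whichever `Encoder` it was created with: `Encoders.bean` maps bean properties to typed columns, and `Encoders.kryo` stores each object as one opaque binary value.

```java
import java.io.Serializable;
import java.util.Arrays;

import org.apache.spark.SparkConf;
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Encoder;
import org.apache.spark.sql.Encoders;
import org.apache.spark.sql.SparkSession;

public class SerializerChoice {

    // Hypothetical JavaBean: Encoders.bean needs a public no-arg constructor plus getters/setters.
    public static class Person implements Serializable {
        private String name;
        private int age;

        public Person() {}

        public Person(String name, int age) {
            this.name = name;
            this.age = age;
        }

        public String getName() { return name; }
        public void setName(String name) { this.name = name; }
        public int getAge() { return age; }
        public void setAge(int age) { this.age = age; }
    }

    public static void main(String[] args) {
        // RDD path: replace the default Java serializer with Kryo and register
        // classes up front so Kryo does not have to write full class names.
        SparkConf conf = new SparkConf()
            .setAppName("serializer-choice")
            .setMaster("local[*]")
            .set("spark.serializer", "org.apache.spark.serializer.KryoSerializer")
            .registerKryoClasses(new Class<?>[] {Person.class});

        SparkSession spark = SparkSession.builder().config(conf).getOrCreate();

        // Dataset path: the Encoder, not spark.serializer, decides how rows are stored.
        Encoder<Person> beanEncoder = Encoders.bean(Person.class);
        Encoder<Person> kryoEncoder = Encoders.kryo(Person.class);

        Dataset<Person> people = spark.createDataset(
            Arrays.asList(new Person("Ann", 31), new Person("Bob", 27)), beanEncoder);
        people.printSchema(); // one typed column per bean property (age, name)

        Dataset<Person> blobs = spark.createDataset(
            Arrays.asList(new Person("Ann", 31)), kryoEncoder);
        blobs.printSchema(); // a single binary column named "value"

        spark.stop();
    }
}
```

The Kryo-backed encoder is mainly a fallback for types that have no built-in encoder. Because the whole object becomes one binary column, Spark cannot see its individual fields, so it tends to give up much of the performance advantage that Encoders bring to Datasets.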
