Do you benefit from the Kryo serializer when you use Pyspark?
Problem description
I read that the Kryo serializer can provide faster serialization when used in Apache Spark. However, I'm using Spark through Python.
Do I still get notable benefits from switching to the Kryo serializer?
Recommended answer
Kryo won't make a major impact on PySpark because it just stores data as byte[] objects, which are fast to serialize even with Java.
But it may be worth a try: you would just set the spark.serializer configuration and not try to register any classes.
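If you want to try it, a minimal sketch of switching the JVM-side serializer from PySpark might look like the following (the app name is just a placeholder; no class registration is needed):

```python
# Sketch: enable Kryo for the JVM side of a PySpark job via SparkConf.
from pyspark import SparkConf, SparkContext

conf = (
    SparkConf()
    .setAppName("kryo-test")  # hypothetical app name
    .set("spark.serializer", "org.apache.spark.serializer.KryoSerializer")
)
sc = SparkContext(conf=conf)
```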
What might make more of an impact is storing your data as MEMORY_ONLY_SER and enabling spark.rdd.compress, which will compress your cached data.
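A sketch of that setup in PySpark, assuming a fresh SparkContext (app name is a placeholder; note that PySpark records are already pickled to bytes before they reach the JVM, so depending on your Spark version the Python StorageLevel may only expose MEMORY_ONLY rather than a separate MEMORY_ONLY_SER, which is why MEMORY_ONLY is used here):

```python
# Sketch: compress cached RDD partitions and persist data in serialized form.
from pyspark import SparkConf, SparkContext, StorageLevel

conf = (
    SparkConf()
    .setAppName("compressed-cache")     # hypothetical app name
    .set("spark.rdd.compress", "true")  # compress serialized cached partitions
)
sc = SparkContext(conf=conf)

rdd = sc.parallelize(range(1_000_000))
# PySpark data is stored as pickled bytes on the JVM side, so MEMORY_ONLY
# already caches serialized records; spark.rdd.compress then compresses them.
rdd.persist(StorageLevel.MEMORY_ONLY)
rdd.count()  # materialize the cache
```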
In Java this can add some CPU overhead, but Python runs quite a bit slower, so it might not matter. It might also speed up computation by reducing GC or letting you cache more data.
Reference: Matei Zaharia's answer on the mailing list.