Do you benefit from the Kryo serializer when you use Pyspark?


Question

I read that the Kryo serializer can provide faster serialization when used in Apache Spark. However, I'm using Spark through Python.

Do I still get notable benefits from switching to the Kryo serializer?

Answer

Kryo won't make a major impact on PySpark, because PySpark just stores data as byte[] objects, which are fast to serialize even with Java.

But it may still be worth a try: you would just set the spark.serializer configuration and not try to register any classes.
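As an illustration (a sketch only, assuming a local test setup; the app name and master are placeholders), this is roughly what that configuration looks like in PySpark:

from pyspark import SparkConf, SparkContext

# Point spark.serializer at Kryo without registering any classes;
# PySpark ships records to the JVM as opaque pickled byte arrays anyway.
conf = (SparkConf()
        .setAppName("kryo-test")   # placeholder app name
        .setMaster("local[*]")     # placeholder master for local testing
        .set("spark.serializer", "org.apache.spark.serializer.KryoSerializer"))

sc = SparkContext(conf=conf)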

What might make a bigger impact is storing your data as MEMORY_ONLY_SER and enabling spark.rdd.compress, which will compress your data.

In Java this can add some CPU overhead, but Python runs quite a bit slower, so it might not matter. It might also speed up computation by reducing GC pressure or letting you cache more data.
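For reference, a minimal sketch of the storage suggestion above (not part of the original answer). Note that PySpark always stores cached RDD partitions as serialized (pickled) bytes, so its MEMORY_ONLY constant already corresponds to Scala's MEMORY_ONLY_SER; the exact constant names can vary between Spark versions.

from pyspark import SparkConf, SparkContext, StorageLevel

# Compress serialized RDD partitions in memory.
conf = SparkConf().set("spark.rdd.compress", "true")
sc = SparkContext(conf=conf)

rdd = sc.parallelize(range(1000000))

# In PySpark, cached data is already serialized, so MEMORY_ONLY here
# behaves like the serialized storage level described above.
rdd.persist(StorageLevel.MEMORY_ONLY)
rdd.count()  # materialize the cache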

Reference: Matei Zaharia's answer on the Spark mailing list.

