How can I cache DataFrame with Kryo Serializer in Spark?


Question

I am trying to use Spark with the Kryo serializer to store some data at a lower memory cost. Now I have run into a problem: I cannot save a DataFrame (whose type is Dataset[Row]) in memory with the Kryo serializer. I thought all I needed to do was add org.apache.spark.sql.Row to classesToRegister, but the error still occurs:

spark-shell --conf spark.kryo.classesToRegister=org.apache.spark.sql.Row --conf spark.serializer=org.apache.spark.serializer.KryoSerializer --conf spark.kryo.registrationRequired=true

import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.types.StructType
import org.apache.spark.sql.types.StructField
import org.apache.spark.sql.types._
import org.apache.spark.sql.Row
import org.apache.spark.storage.StorageLevel

val schema = StructType(StructField("name", StringType, true) :: StructField("id", IntegerType, false) :: Nil)
val seq = Seq(("hello", 1), ("world", 2))
val df = spark.createDataFrame(sc.emptyRDD[Row], schema).persist(StorageLevel.MEMORY_ONLY_SER)
df.count()

The error is as follows:

I don't think adding byte[][] to classesToRegister is a good idea. So what should I do to store a DataFrame in memory with Kryo?

Answer

Datasets don't use standard serialization methods. When you persist a Dataset or DataFrame, Spark SQL encodes the rows into its own compact columnar binary format with its own compression, so spark.serializer and Kryo registration do not apply, and you don't need to store your Dataset with the Kryo serializer.
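In other words, a DataFrame can be cached in serialized form without any Kryo flags at all. A minimal sketch for a plain spark-shell session (the column names and sample values are illustrative, mirroring the question's seq):

```scala
import org.apache.spark.storage.StorageLevel
import spark.implicits._

// Build a small DataFrame; Spark SQL's own encoders handle the rows.
val df = Seq(("hello", 1), ("world", 2)).toDF("name", "id")

// MEMORY_ONLY_SER keeps the encoded, compressed binary form in memory.
// No spark.kryo.classesToRegister entry is needed for this.
df.persist(StorageLevel.MEMORY_ONLY_SER)
df.count()  // an action materializes the cache
```

Kryo settings still matter for plain RDDs and for shuffled non-SQL data; if you keep spark.kryo.registrationRequired=true for those paths, only the classes you actually send through spark.serializer need registering, not the internals of a cached DataFrame.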
