Avoid losing the data type of partitioned data when writing from Spark
Question
I have a dataframe like the following:
itemName, itemCategory
Name1, C0
Name2, C1
Name3, C0
I would like to save this dataframe as a partitioned Parquet file:
df.write.mode("overwrite").partitionBy("itemCategory").parquet(path)
For this dataframe, when I read the data back, itemCategory will have the String data type.
However, at times I have dataframes from other tenants, like the following:
itemName, itemCategory
Name1, 0
Name2, 1
Name3, 0
In this case, after the data is written out partitioned and read back, the resulting dataframe will have Int as the data type of itemCategory.
A Parquet file has metadata that describes the data types. How can I specify the data type for the partition column so it is read back as String instead of Int?
Answer
If you set "spark.sql.sources.partitionColumnTypeInference.enabled" to "false", Spark will infer all partition columns as Strings.
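The reason the type flips is that partition values live only in the directory names (e.g. `itemCategory=0`), not in the Parquet footer, so the reader re-infers each value's type from its string form. Below is a toy Python sketch of that idea, not Spark's actual implementation; the function name and the two-type simplification (Int vs. String only) are illustrative assumptions:

```python
def infer_partition_type(values, inference_enabled=True):
    """Mimic typing of partition values parsed from directory names.

    Partition values arrive as strings (from paths like 'itemCategory=0').
    With inference enabled, all-integer-looking values are converted to
    ints; otherwise everything stays a string. (Toy model, not Spark code.)
    """
    if inference_enabled and all(v.lstrip("-").isdigit() for v in values):
        return [int(v) for v in values], "IntegerType"
    return list(values), "StringType"


# Values from the second tenant's directories: all numeric, so they
# become ints unless inference is turned off.
raw = ["0", "1", "0"]
print(infer_partition_type(raw))         # ([0, 1, 0], 'IntegerType')
print(infer_partition_type(raw, False))  # (['0', '1', '0'], 'StringType')

# Values like the first tenant's ("C0", "C1") are never all-numeric,
# so they stay strings either way.
print(infer_partition_type(["C0", "C1", "C0"]))
```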
In Spark 2.0 or later, you can set it like this:
spark.conf.set("spark.sql.sources.partitionColumnTypeInference.enabled", "false")
In 1.6, like this:
sqlContext.setConf("spark.sql.sources.partitionColumnTypeInference.enabled", "false")
The downside is that you have to do this each time you read the data, but at least it works.