Prevent DataFrame.partitionBy() from removing partitioned columns from schema


Question

I am partitioning a DataFrame as follows:

df.write.partitionBy("type", "category").parquet(config.outpath)

The code gives the expected results (i.e. data partitioned by type & category). However, the "type" and "category" columns are removed from the data / schema. Is there a way to prevent this behaviour?

Answer

I can think of one workaround, which is rather lame, but works.

import spark.implicits._

val duplicated = df.withColumn("_type", $"type").withColumn("_category", $"category")
duplicated.write.partitionBy("_type", "_category").parquet(config.outpath)
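As a sketch of why this works: `partitionBy` encodes the partition columns into the directory layout (e.g. `_type=a/_category=x/`) and drops them from the data files, but the duplicated `type`/`category` columns are written into the Parquet files themselves. On read, Spark's partition discovery reconstructs `_type`/`_category` from the paths, so all columns are available. The following self-contained example illustrates this; `outPath` and the sample data are hypothetical stand-ins (the original uses `config.outpath`):

```scala
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder()
  .master("local[*]")
  .appName("partitionBy-demo")
  .getOrCreate()
import spark.implicits._

val df = Seq(
  ("a", "x", 1),
  ("b", "y", 2)
).toDF("type", "category", "value")

val outPath = "/tmp/partitionby-demo" // stand-in for config.outpath

// Duplicate the partition columns so the originals survive in the data files.
val duplicated = df
  .withColumn("_type", $"type")
  .withColumn("_category", $"category")

duplicated.write.mode("overwrite").partitionBy("_type", "_category").parquet(outPath)

// Partition discovery restores _type/_category from the directory names,
// while type/category were stored in the Parquet files themselves.
val restored = spark.read.parquet(outPath)
restored.printSchema() // type, category, value, _type, _category
```

The cost of this workaround is that each partition value is stored twice: once in the directory name and once in every row of the data files.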

I'm answering this question since I have the same problem, in the hope that someone will post a better answer or explanation than mine (or that the OP has found a better solution).

