Prevent DataFrame.partitionBy() from removing partitioned columns from schema


Problem description

I am partitioning a DataFrame as follows:

df.write.partitionBy("type", "category").parquet(config.outpath)

The code gives the expected results (i.e. data partitioned by type & category). However, the "type" and "category" columns are removed from the data / schema. Is there a way to prevent this behaviour?
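A point worth noting, though it is not stated in the question itself: the columns are only removed from the data files, not lost. They are encoded in the directory names (`type=.../category=...`), and Spark's partition discovery restores them to the schema when the output is read back. A minimal self-contained sketch (the `/tmp/partition_demo` path and the sample data are illustrative, not from the question):

```scala
import org.apache.spark.sql.SparkSession

object PartitionReadBack {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("partition-read-back")
      .getOrCreate()
    import spark.implicits._

    val df = Seq(("a", "x", 1), ("b", "y", 2)).toDF("type", "category", "value")

    // The partition columns are dropped from the Parquet files themselves
    // and encoded in directory names like .../type=a/category=x/
    df.write.mode("overwrite")
      .partitionBy("type", "category")
      .parquet("/tmp/partition_demo")

    // On read, partition discovery parses those directory names and adds
    // "type" and "category" back to the schema.
    val restored = spark.read.parquet("/tmp/partition_demo")
    restored.printSchema() // value, type, category are all present

    spark.stop()
  }
}
```

So if the data is consumed via `spark.read.parquet`, the schema is effectively unchanged; the missing columns only matter when the Parquet files are read by a tool that does not perform Hive-style partition discovery.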

Recommended answer

I can think of one workaround, which is rather lame, but works.

import spark.implicits._

val duplicated = df.withColumn("_type", $"type").withColumn("_category", $"category")
duplicated.write.partitionBy("_type", "_category").parquet(config.outpath)
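With this layout, the data files keep the original "type" and "category" columns, while the prefixed copies live only in the directory names. A read then yields both sets of columns (the originals from the files, the prefixed ones from partition discovery), so the extras can simply be dropped. A sketch, assuming the `spark` session and `config.outpath` from the question:

```scala
// Hypothetical read-back of the workaround's output: "type"/"category"
// come from the data files, "_type"/"_category" from partition discovery.
val readBack = spark.read.parquet(config.outpath)
  .drop("_type", "_category") // keep only the original columns
```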

I'm answering this question because I have the same one, in the hope that someone will post a better answer or explanation than mine (or that the OP will, if a better solution has since been found).
