How to pass multiple columns to the partitionBy method in Spark


Problem Description

I am a newbie in Spark. I want to write DataFrame data into a Hive table. The Hive table is partitioned on multiple columns. Through the Hive metastore client I am getting the partition columns and passing them as a variable to the partitionBy clause in the DataFrame's write method.

var1 = "country","state"   // getting the partition column names of the Hive table
dataframe1.write.partitionBy(s"$var1").mode("overwrite").save(s"$hive_warehouse/$dbname.db/$temp_table/")

When I execute the above code, it gives me the error that partition "country","state" does not exist. I think it is taking "country","state" as a single string.
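For illustration, here is a minimal sketch of what that call actually passes to Spark (the value assigned to var1 below is only an assumption about how it was built; the point is that string interpolation yields a single argument):

val var1 = "\"country\",\"state\""   // assumed value: one String containing "country","state"
// s"$var1" is still that single String, so partitionBy receives ONE column name
// and Spark looks for a column literally named "country","state":
// dataframe1.write.partitionBy(s"$var1")   // fails: that partition column does not exist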

Can you help me?

Recommended Answer

The partitionBy function takes varargs, not a list. You can use it like this:

dataframe1.write.partitionBy("country","state").mode("overwrite").save(s"$hive_warehouse/$dbname.db/$temp_table/")

Or, in Scala, you can convert a list into varargs like this:

val columns = Seq("country","state")
dataframe1.write.partitionBy(columns:_*).mode("overwrite").save(s"$hive_warehouse/$dbname.db/$temp_table/")
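If the partition column names come back from the Hive metastore client as a single comma-separated string (as var1 appears to in the question), a minimal sketch of the same idea, assuming that format, would be:

// Assumption: var1 holds something like "country,state" (or "\"country\",\"state\"")
val partitionCols: Seq[String] =
  var1.split(",").map(_.trim.stripPrefix("\"").stripSuffix("\"")).toSeq
dataframe1.write
  .partitionBy(partitionCols: _*)   // expand the Seq into the varargs partitionBy expects
  .mode("overwrite")
  .save(s"$hive_warehouse/$dbname.db/$temp_table/")

This keeps the column list dynamic while still passing each column name as a separate argument.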
