Pivot Spark Dataframe


Problem description

I am starting to use Spark DataFrames and I need to be able to pivot the data to create multiple columns out of one column with multiple rows. There is built-in functionality for that in Scalding, and I believe in pandas in Python, but I can't find anything for the new Spark DataFrame.

I assume I can write a custom function of some sort that will do this, but I'm not even sure how to start, especially since I am a novice with Spark. If anyone knows how to do this with built-in functionality, or has suggestions for how to write something in Scala, it would be greatly appreciated.

Solution

I overcame this by writing a for loop to dynamically create a SQL query. Say I have:

id  tag  value
1   US    50
1   UK    100
1   Can   125
2   US    75
2   UK    150
2   Can   175

and I want:

id  US  UK   Can
1   50  100  125
2   75  150  175
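
To make the target shape concrete, here is a minimal sketch of the same reshaping on a plain Scala collection rather than a Spark DataFrame. The object name `PivotSketch` and the `pivot` helper are hypothetical and only illustrate the transformation between the two tables above; they are not part of the Spark approach described in this answer. Summing the values per tag is an assumption about how repeated (id, tag) pairs should be combined.

```scala
object PivotSketch {
  // Hypothetical helper: turns (id, tag, value) rows into one row per id,
  // with one value per requested tag (0 when an id has no row for that tag).
  def pivot(rows: Seq[(Int, String, Int)], tags: Seq[String]): Seq[(Int, Seq[Int])] =
    rows
      .groupBy(_._1)          // collect all rows that share an id
      .toSeq
      .sortBy(_._1)           // stable output order by id
      .map { case (id, group) =>
        // Sum the values per tag, so repeated (id, tag) pairs are aggregated.
        val byTag = group.groupBy(_._2).map { case (tag, g) => tag -> g.map(_._3).sum }
        (id, tags.map(tag => byTag.getOrElse(tag, 0)))
      }

  def main(args: Array[String]): Unit = {
    val rows = Seq(
      (1, "US", 50), (1, "UK", 100), (1, "Can", 125),
      (2, "US", 75), (2, "UK", 150), (2, "Can", 175)
    )
    // Print one line per id, with the values in US, UK, Can order.
    pivot(rows, Seq("US", "UK", "Can")).foreach { case (id, values) =>
      println((id +: values).mkString("  "))
    }
  }
}
```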

I can create a list of the values I want to pivot on and then build a string containing the SQL query I need.

val countries = List("US", "UK", "Can")

// Build one CASE expression per pivot value, e.g.
//   case when tag = 'US' then value else 0 end as US
val caseColumns = countries
  .map(c => s"case when tag = '$c' then value else 0 end as $c")
  .mkString(", ")
val query = s"select *, $caseColumns from myTable"

// Register the DataFrame so the SQL query can refer to it by name.
// (registerTempTable is the Spark 1.x API; Spark 2.0+ uses createOrReplaceTempView.)
myDataFrame.registerTempTable("myTable")
val myDF1 = sqlContext.sql(query)

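This query only spreads the values into columns: each input row still produces its own output row, with 0 in the columns for the other tags. I can create a similar query to then do the aggregation (for example, summing each new column grouped by id) to get one row per id. It's not a very elegant solution, but it works and is flexible for any list of values, which can also be passed in as an argument when your code is called.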
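
The aggregation step could be folded into the generated query itself. The following is a minimal sketch under a few assumptions: `PivotQuery.build` and its parameter names are hypothetical (they do not appear in the original answer), `sum` is assumed to be the desired aggregate, and the pivot values are assumed to be trusted strings that are also valid column names, since nothing is escaped here.

```scala
object PivotQuery {
  // Hypothetical helper: builds a SQL string that pivots `pivotCol` into one
  // column per entry in `values`, summing `valueCol` and grouping by `keyCol`.
  // Assumption: `values` are trusted and valid as column names (no escaping is done).
  def build(table: String, keyCol: String, pivotCol: String,
            valueCol: String, values: Seq[String]): String = {
    val cols = values
      .map(v => s"sum(case when $pivotCol = '$v' then $valueCol else 0 end) as $v")
      .mkString(", ")
    s"select $keyCol, $cols from $table group by $keyCol"
  }

  def main(args: Array[String]): Unit =
    // Print the generated query for the example table in this answer.
    println(build("myTable", "id", "tag", "value", Seq("US", "UK", "Can")))
}
```

The resulting string would be passed to sqlContext.sql in the same way as the query above, after the DataFrame has been registered as myTable. Because the list of values is an ordinary argument, the same helper covers the "passed in as an argument" case.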
