How to convert the group by function to a data frame
Question
Hi, I am new to Scala and Spark. I am trying to do a group by through Spark SQL. When I try to save or view the output, it throws the following error:
value coalesce is not a member of org.apache.spark.sql.RelationalGroupedDataset
Here is my code:
val fp = filtertable.select($"_1", $"_2", $"_3",$"_4").groupBy("_1", "_2","_3")
fp.show() // throws error
fp.coalesce(1).write.format("csv").save("file://" + test.toString()) //throws error.
Any help would be appreciated.
Answer
The question suggests that you want to write the grouped data to a text file in CSV format. If my analysis is correct, then groupBy on an rdd should be the solution you want, because groupBy on a dataframe returns a RelationalGroupedDataset that must be followed by an aggregation before it can be shown or written. So you will have to convert the dataframe to an rdd, apply groupBy, and finally write the output to the CSV file, as in:
val fp = df.select($"_1", $"_2", $"_3",$"_4")
.rdd
.groupBy(row => (row(0), row(1), row(2))) // similar to groupBy("_1", "_2","_3") on dataframe
.flatMap(kv => kv._2) // taking the grouped data
.map(_.mkString(",")) // making data in csv format
fp.coalesce(1).saveAsTextFile("file://" + test.toString())
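The grouping logic above can be sketched without a Spark cluster, since RDD's groupBy has the same shape as groupBy on a plain Scala collection. This is a minimal sketch with hypothetical sample rows (the tuple fields stand in for _1 through _4); it is not the questioner's data, just an illustration of group-then-flatten-then-mkString:

```scala
// Hypothetical rows standing in for the dataframe columns _1, _2, _3, _4
val rows = Seq(
  ("a", "x", 1, 10),
  ("a", "x", 1, 20),
  ("b", "y", 2, 30)
)

// Group by the first three fields, mirroring
// .groupBy(row => (row(0), row(1), row(2))) on the RDD
val grouped = rows.groupBy(r => (r._1, r._2, r._3))

// Flatten each group back to its rows and render them as CSV lines,
// mirroring .flatMap(kv => kv._2).map(_.mkString(","))
val csvLines = grouped.values.flatten
  .map(r => s"${r._1},${r._2},${r._3},${r._4}")
  .toSeq
```

Note that grouping only reorders the rows here; every input row survives in the output, which is why flatMap over the group values recovers all of them.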
I hope the answer helps.