How to convert the group by function to a data frame
Question
Hi, I am new to Scala and Spark. I am trying to do a group by through Spark SQL. When I try to save or view the output, it throws the following error:
value coalesce is not a member of org.apache.spark.sql.RelationalGroupedDataset
Here is my code:
val fp = filtertable.select($"_1", $"_2", $"_3",$"_4").groupBy("_1", "_2","_3")
fp.show() // throws error
fp.coalesce(1).write.format("csv").save("file://" + test.toString()) //throws error.
Any help would be appreciated.
Answer
The question suggests that you want to write the grouped data to a text file in CSV format. If my analysis is correct, then groupBy on an rdd should be the solution you want, because groupBy on a dataframe must be followed by an aggregation. So you will have to convert the dataframe to an rdd, apply groupBy, and finally write the output to the csv file as:
val fp = df.select($"_1", $"_2", $"_3",$"_4")
.rdd
.groupBy(row => (row(0), row(1), row(2))) // similar to groupBy("_1", "_2","_3") on dataframe
.flatMap(kv => kv._2) // taking the grouped data
.map(_.mkString(",")) // making data in csv format
fp.coalesce(1).saveAsTextFile("file://" + test.toString())
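Alternatively, if an aggregated result is acceptable, you can stay in the DataFrame API: applying any aggregation to the RelationalGroupedDataset returns a DataFrame again, on which show and write work. A minimal sketch, assuming the same four-column filtertable from the question and using count as a purely illustrative aggregate (any aggregate such as sum or max would do; note that array-valued aggregates like collect_list cannot be written to CSV):

```scala
import org.apache.spark.sql.functions.count

val grouped = filtertable
  .select($"_1", $"_2", $"_3", $"_4")
  .groupBy("_1", "_2", "_3")               // RelationalGroupedDataset
  .agg(count($"_4").alias("cnt"))          // aggregation turns it back into a DataFrame

grouped.show() // now works
grouped.coalesce(1).write.format("csv").save("file://" + test.toString())
```

The trade-off: the RDD route above preserves every original row (just reordered by group), while this route collapses each group to one aggregated row.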
Hope the answer is helpful.