How to convert the group by function to data frame


Question

Hi, I am new to Scala and Spark. I am trying to do a group by through Spark SQL. When I try to save or view the output, it throws the following error:

value coalesce is not a member of org.apache.spark.sql.RelationalGroupedDataset

Here is my code:

val fp = filtertable.select($"_1", $"_2", $"_3", $"_4").groupBy("_1", "_2", "_3")
fp.show() // throws error
fp.coalesce(1).write.format("csv").save("file://" + test.toString()) // throws error

Any help would be appreciated.

Answer

The question suggests that you want to write the grouped data to a text file in CSV format. If my analysis is correct, then groupBy on an rdd should be the solution you want, since groupBy on a dataframe must be followed by an aggregation. So you will have to convert the dataframe to an rdd, apply groupBy, and finally write the output to a csv file, as follows:

val fp = df.select($"_1", $"_2", $"_3", $"_4")
  .rdd
  .groupBy(row => (row(0), row(1), row(2)))  // similar to groupBy("_1", "_2", "_3") on a dataframe
  .flatMap(kv => kv._2)                      // take the grouped rows
  .map(_.mkString(","))                      // format each row as csv

fp.coalesce(1).saveAsTextFile("file://" + test.toString())
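If what you actually need is one result per group rather than the raw grouped rows, the dataframe route also works: the original error occurs because `groupBy` on a dataframe returns a `RelationalGroupedDataset`, which has no `show`, `coalesce`, or `write`; applying an aggregation turns it back into a dataframe. A minimal sketch, assuming a count per group is an acceptable aggregation (the `cnt` column name is illustrative):

```scala
import org.apache.spark.sql.functions.count

// groupBy yields a RelationalGroupedDataset; agg(...) converts it back
// to a DataFrame, after which show()/write work as usual.
val aggregated = filtertable
  .groupBy("_1", "_2", "_3")
  .agg(count($"_4").as("cnt"))   // swap in sum/avg/etc. as needed

aggregated.coalesce(1).write.format("csv").save("file://" + test.toString())
```

This keeps everything in the dataframe API, but it only fits if an aggregate (count, sum, etc.) is what you want per group; to write out every original row grouped together, use the rdd approach above.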

Hope the answer is helpful.

