How to convert the group by function to a data frame


Problem Description

Hi, I am new to Scala and Spark. I am trying to do a group by through Spark SQL. When I try to save or view the output, it throws the following error:

value coalesce is not a member of org.apache.spark.sql.RelationalGroupedDataset

Here is my code:

val fp = filtertable.select($"_1", $"_2", $"_3", $"_4").groupBy("_1", "_2", "_3")
fp.show() // throws error
fp.coalesce(1).write.format("csv").save("file://" + test.toString()) // throws error

Any help would be appreciated.

Recommended Answer

The question suggests that you want to write the grouped data to a text file in CSV format. If my analysis is correct, then groupBy on an rdd should be the solution you want, since groupBy on a dataframe must be followed by an aggregation. So you will have to convert the dataframe to an rdd, apply groupBy, and finally write the output to the CSV file:

val fp = df.select($"_1", $"_2", $"_3", $"_4")
  .rdd
  .groupBy(row => (row(0), row(1), row(2))) // similar to groupBy("_1", "_2", "_3") on a dataframe
  .flatMap(kv => kv._2)                     // taking the grouped rows
  .map(_.mkString(","))                     // formatting each row as CSV

fp.coalesce(1).saveAsTextFile("file://" + test.toString())
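The same groupBy → flatMap → mkString pipeline can be sketched with plain Scala collections, which may help clarify what each step does before running it on an RDD. The sample rows below are hypothetical stand-ins for the four columns selected from filtertable:

```scala
object GroupByDemo {
  // Hypothetical sample rows standing in for the four selected columns.
  val rows = List(
    ("a", 1, "x", 10),
    ("a", 1, "x", 20),
    ("b", 2, "y", 30)
  )

  // Mirrors the RDD pipeline on local collections.
  def csvLines: List[String] =
    rows.groupBy { case (c1, c2, c3, _) => (c1, c2, c3) } // like .groupBy(row => (row(0), row(1), row(2)))
      .values.toList
      .flatten                                            // like .flatMap(kv => kv._2): back to individual rows
      .map { case (c1, c2, c3, c4) => s"$c1,$c2,$c3,$c4" } // like .map(_.mkString(","))

  def main(args: Array[String]): Unit = csvLines.foreach(println)
}
```

Note that grouping followed by flattening only reorders the rows so that rows with the same key sit together; no rows are aggregated or dropped.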

I hope the answer helps.
