How to do custom operations on GroupedData in Spark?

Views: 259 Published: 2016/5/22 15:17:54 Tags: scala apache-spark grouping

Question

I want to rewrite some of my code written with RDDs to use DataFrames. It was working quite smoothly until I found this:

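 events
  // keyBy pairs each row with the key the function returns, so return only the key
  .keyBy(row => (row.getServiceId, row.getClientCreateTimestamp, row.getClientId))
  // within each key, keep the event with the earliest send timestamp
  .reduceByKey((e1, e2) => if (e1.getClientSendTimestamp <= e2.getClientSendTimestamp) e1 else e2)
  .values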

It is simple to start:

 events
  .groupBy(events("service_id"), events("client_create_timestamp"), events("client_id"))

but what's next? What if I'd like to iterate over every element in the current group? Is it even possible? Thanks in advance.

Recommended answer

GroupedData cannot be used directly. The data is not physically grouped; groupBy is only a logical operation. You have to apply some variant of the agg method, for example:
events
 .groupBy($"service_id", $"client_create_timestamp", $"client_id")
 .min("client_send_timestamp")

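or

// min below is the column function from org.apache.spark.sql.functions
import org.apache.spark.sql.functions.min

events
 .groupBy($"service_id", $"client_create_timestamp", $"client_id")
 .agg(min($"client_send_timestamp"))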

where client_send_timestamp is the column you want to aggregate.

If you want to keep more information than the aggregate alone, join the aggregated result back to the original data or use window functions - see Find maximum row per group in Spark DataFrame
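
To make the join and window options concrete, here is a minimal sketch, not a definitive implementation. It assumes a Spark 2.x or later SparkSession; the object name EarliestEventPerGroup, the payload column, and the sample rows are made up for illustration, while the other column names come from the question.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.expressions.Window
import org.apache.spark.sql.functions.{col, min, row_number}

object EarliestEventPerGroup {
  // Grouping columns taken from the question.
  private val keys = Seq("service_id", "client_create_timestamp", "client_id")

  // Window variant: number the rows inside each group by send time and keep the first.
  // If several rows tie on the minimum, exactly one of them is kept (which one is arbitrary).
  def viaWindow(events: DataFrame): DataFrame = {
    val w = Window.partitionBy(keys.map(col): _*).orderBy(col("client_send_timestamp"))
    events.withColumn("rn", row_number().over(w)).where(col("rn") === 1).drop("rn")
  }

  // Join variant: compute the minimum per group, then join it back to the full rows.
  // Unlike the window variant, every row that ties on the minimum is kept.
  def viaJoin(events: DataFrame): DataFrame = {
    val earliest = events
      .groupBy(keys.map(col): _*)
      .agg(min(col("client_send_timestamp")).as("client_send_timestamp"))
    events.join(earliest, keys :+ "client_send_timestamp")
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().master("local[*]").appName("earliest-event").getOrCreate()
    import spark.implicits._

    // Hypothetical sample data; "payload" stands in for the rest of the row.
    val events = Seq(
      ("s1", 100L, "c1", 5L, "late"),
      ("s1", 100L, "c1", 3L, "early"),
      ("s2", 200L, "c2", 7L, "only")
    ).toDF("service_id", "client_create_timestamp", "client_id", "client_send_timestamp", "payload")

    viaWindow(events).show()
    viaJoin(events).show()
    spark.stop()
  }
}
```

The window variant is the closer match to the reduceByKey in the question, because it returns one whole row per group. The join variant is an equi-join on the key columns, so rows with null keys would not match and would be dropped.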
Spark also supports User Defined Aggregate Functions - see How can I define and use a User-Defined Aggregate Function in Spark SQL?
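
As a rough illustration of a user-defined aggregate, the sketch below uses the typed Aggregator API on a Dataset rather than the untyped UserDefinedAggregateFunction class, which is a substitution on my part and not necessarily what the linked answer shows. The Event case class, its payload field, and the names EarliestEvent and EarliestEventDemo are hypothetical; the buffer is encoded with Kryo purely to keep the example short.

```scala
import org.apache.spark.sql.{Dataset, Encoder, Encoders, SparkSession}
import org.apache.spark.sql.expressions.Aggregator

// Hypothetical event shape; field names mirror the columns in the question.
case class Event(service_id: String, client_create_timestamp: Long, client_id: String,
                 client_send_timestamp: Long, payload: String)

// Keeps the event with the smallest client_send_timestamp,
// mirroring the reduceByKey function from the question.
object EarliestEvent extends Aggregator[Event, Option[Event], Event] {
  // Empty buffer: no event seen yet.
  def zero: Option[Event] = None

  // Fold one more event into the buffer.
  def reduce(buf: Option[Event], e: Event): Option[Event] = merge(buf, Some(e))

  // Combine two partial buffers, preferring the earlier event.
  def merge(a: Option[Event], b: Option[Event]): Option[Event] = (a, b) match {
    case (Some(x), Some(y)) => if (x.client_send_timestamp <= y.client_send_timestamp) a else b
    case _ => a.orElse(b)
  }

  // A group always holds at least one row, so the buffer is non-empty here.
  // Applied to an empty Dataset without grouping, this would throw.
  def finish(buf: Option[Event]): Event = buf.get

  def bufferEncoder: Encoder[Option[Event]] = Encoders.kryo[Option[Event]]
  def outputEncoder: Encoder[Event] = Encoders.product[Event]
}

object EarliestEventDemo {
  // Group by the three key fields and apply the aggregator to each group.
  def earliestPerGroup(events: Dataset[Event]): Dataset[Event] = {
    val spark = events.sparkSession
    import spark.implicits._
    events
      .groupByKey(e => (e.service_id, e.client_create_timestamp, e.client_id))
      .agg(EarliestEvent.toColumn)
      .map(_._2) // drop the key, keep the winning event
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().master("local[*]").appName("earliest-udaf").getOrCreate()
    import spark.implicits._

    // Hypothetical sample data.
    val events = Seq(
      Event("s1", 100L, "c1", 5L, "late"),
      Event("s1", 100L, "c1", 3L, "early"),
      Event("s2", 200L, "c2", 7L, "only")
    ).toDS()

    earliestPerGroup(events).show()
    spark.stop()
  }
}
```

Because reduce and merge are applied to partial results on each partition before the shuffle, this keeps the combiner-style behaviour of reduceByKey instead of materializing whole groups.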
