Order by Value in Spark pairRDD from (Key,Value) where the value is from spark-sql
Question
I created a pair RDD with a map like this:
val b = a.map(x => (x(0), x))
Here b is of type
org.apache.spark.rdd.RDD[(Any, org.apache.spark.sql.Row)]
- How can I sort the PairRDD within each key using a field from the value row?
- After that, I want to run a function that processes all the values for each key in isolation, in the previously sorted order. Is that possible? If so, could you please give an example?
- Are there any partitioning considerations for the pair RDD?
Solution
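Answering only your first question:
val indexToSelect: Int = ??? // index of a column with a sortable type (has an Ordering or is Ordered)
val sorted = rdd.sortBy(pair => pair._2.getAs[Int](indexToSelect)) // Int is an assumption; use the column's actual type
This selects the second element of the pair (pair._2) and, from that row, reads the value at indexToSelect. The untyped accessor pair._2(indexToSelect) (more verbosely, pair._2.apply(indexToSelect)) returns Any, for which no Ordering exists, so a typed getter such as getAs[T] is needed for sortBy to compile. Note that sortBy orders the whole RDD by that value; it does not sort separately within each key.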
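For the second question (processing each key's values in sorted order), one possible approach is to group by key and sort each group before handing it to the processing function. The following is only a minimal sketch, not the answerer's method: the sample rows, the column positions, the String key type, and the processSorted function are all hypothetical stand-ins, and it assumes each key's values fit in memory on a single executor.

```scala
import org.apache.spark.sql.{Row, SparkSession}

object SortWithinKey {
  // Hypothetical per-key step: receives one key's rows, already sorted,
  // and here simply joins column 1 into a comma-separated string.
  def processSorted(rows: Seq[Row]): String =
    rows.map(_.getString(1)).mkString(",")

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("sort-within-key")
      .getOrCreate()
    val sc = spark.sparkContext

    // Stand-in for the question's `a`: column 0 is the key,
    // column 2 is the (assumed Int) field to sort by.
    val a = sc.parallelize(Seq(
      Row("k1", "b", 2),
      Row("k1", "a", 1),
      Row("k2", "c", 3)
    ))
    val sortIndex = 2

    // A typed key (String, an assumption) instead of Any.
    val b = a.map(x => (x.getString(0), x))

    // Group all rows of a key together, sort them locally,
    // then run the processing function on the sorted sequence.
    val result = b.groupByKey().mapValues { rows =>
      processSorted(rows.toSeq.sortBy(_.getInt(sortIndex)))
    }

    result.collect().foreach(println)
    spark.stop()
  }
}
```

With these sample rows, key k1 is processed as "a,b" because its rows are sorted by column 2 before processSorted runs. On partitioning: groupByKey shuffles every row of a key to one partition and materializes the group in memory, so for very large groups a secondary sort with repartitionAndSortWithinPartitions on a composite key may be a better fit.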