如何在PySpark中有效地按价值排序? [英] How to sort by value efficiently in PySpark?

查看:107
本文介绍了如何在PySpark中有效地按价值排序?的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我想按V(即值)对我的K,V元组进行排序.我知道TakeOrdered对此很有用,如果您知道需要多少:

b = sc.parallelize([('t',3),('b',4),('c',1)])

使用TakeOrdered:

b.takeOrdered(3,lambda atuple: atuple[1])

使用Lambda

b.map(lambda aTuple: (aTuple[1], aTuple[0])).sortByKey().map(
    lambda aTuple: (aTuple[0], aTuple[1])).collect()

我已经在此处签出了问题,这表明了后者.我发现很难相信takeOrdered如此简洁,但是它需要与Lambda解决方案相同的操作次数.

有人知道火花更简单,更简洁地按值排序吗?

解决方案

我认为sortBy()更简洁:

b = sc.parallelize([('t', 3),('b', 4),('c', 1)])
bSorted = b.sortBy(lambda a: a[1])
bSorted.collect()
...
[('c', 1),('t', 3),('b', 4)]

实际上是更多完全有效,因为它涉及按值进行键控,按键进行排序,然后获取值,但它比后面的解决方案漂亮.在效率方面,我认为您不会找到更有效的解决方案,因为您将需要一种方法来转换数据,使值成为您的键(然后最终将数据转换回原始模式)./p>

I want to sort my K,V tuples by V, i.e. by the value. I know that TakeOrdered is good for this if you know how many you need:

b = sc.parallelize([('t',3),('b',4),('c',1)])

Using TakeOrdered:

b.takeOrdered(3,lambda atuple: atuple[1])

Using Lambda

b.map(lambda aTuple: (aTuple[1], aTuple[0])).sortByKey().map(
    lambda aTuple: (aTuple[0], aTuple[1])).collect()

I've checked out the question here, which suggests the latter. I find it hard to believe that takeOrdered is so succinct and yet it requires the same number of operations as the Lambda solution.

Does anyone know of a simpler, more concise Transformation in spark to sort by value?

解决方案

I think sortBy() is more concise:

b = sc.parallelize([('t', 3),('b', 4),('c', 1)])
bSorted = b.sortBy(lambda a: a[1])
bSorted.collect()
...
[('c', 1),('t', 3),('b', 4)]

It's actually not more efficient at all as it involves keying by the values, sorting by the keys, and then grabbing the values but it looks prettier than your latter solution. In terms of efficiency, I don't think you'll find a more efficient solution as you would need a way to transform your data such that values will be your keys (and then eventually transform that data back to the original schema).

这篇关于如何在PySpark中有效地按价值排序?的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆