Creating combination of value list with existing key - Pyspark
Problem description
So my rdd consists of data looking like:
(k, [v1,v2,v3...])
I want to create all two-element combinations of the value part.
So the end map should look like:
(k1, (v1,v2))
(k1, (v1,v3))
(k1, (v2,v3))
I know that to get the value part, I would use something like
rdd.cartesian(rdd).filter(case (a,b) => a < b)
但是,这要求传递整个rdd(对吗?),而不仅仅是价值部分.我不确定如何到达自己想要的终点,我怀疑它是一群人.
However, that requires the entire rdd to be passed (right?) not just the value part. I am unsure how to arrive at my desired end, I suspect its a groupby.
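The full cartesian product is indeed unnecessary here: since each row already carries its own value list, the pairs can be generated per row. A minimal sketch of the pairing logic (the `rdd.flatMapValues(value_pairs)` usage shown in the comment is a hypothetical application, assuming the rdd holds `(k, [v1, v2, ...])` rows):

```python
import itertools

def value_pairs(values):
    # all unordered two-element combinations of one value list,
    # preserving the original order within each pair
    return list(itertools.combinations(values, 2))

# With Spark, this could be applied per key without a cartesian product:
# rdd.flatMapValues(value_pairs)   # yields (k, (v1, v2)) records
```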
Also, ultimately, I want to get to a k,v pair looking like
((k1,v1,v2),1)
I know how to get from what I am looking for to that, but maybe it's easier to go straight there?
Thanks.
Recommended answer
I think Israel's answer is incomplete, so I went a step further.
import itertools

a = sc.parallelize([
    (1, [1, 2, 3, 4]),
    (2, [3, 4, 5, 6]),
    (3, [-1, 2, 3, 4])
])

def combinations(row):
    # expand one (key, value_list) row into (key, (v_i, v_j)) pairs
    k = row[0]
    l = row[1]
    return [(k, v) for v in itertools.combinations(l, 2)]

a.map(combinations).flatMap(lambda x: x).take(3)
# [(1, (1, 2)), (1, (1, 3)), (1, (1, 4))]
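The answer stops at (k, (v1, v2)) records, while the question ultimately wants ((k, v1, v2), 1). One more mapping step gets there; a minimal sketch with a pure-Python helper (the `flatMap` usage in the comment is a hypothetical application to the `a` rdd above):

```python
import itertools

def pair_counts(row):
    # expand one (key, value_list) row directly into ((key, v_i, v_j), 1)
    # records, ready for e.g. reduceByKey-style counting
    k, values = row
    return [((k, v1, v2), 1) for v1, v2 in itertools.combinations(values, 2)]

# Applied to the rdd from the answer:
# a.flatMap(pair_counts).take(3)
```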