Creating combination of value list with existing key - Pyspark


Question

So my rdd consists of data looking like:

(k, [v1,v2,v3...])

I want to create all combinations of two elements from the value part.

So the end result should look like:

(k1, (v1,v2))
(k1, (v1,v3))
(k1, (v2,v3))

I know that to get the combinations for the value part, I would use something like

rdd.cartesian(rdd).filter { case (a, b) => a < b }

However, that requires the entire rdd to be passed (right?), not just the value part. I am unsure how to arrive at my desired end; I suspect it's a groupby.

Also, ultimately, I want to get to key-value pairs looking like

((k1,v1,v2),1)

I know how to get from what I am looking for to that, but maybe it's easier to go straight there?

Thanks.

Recommended answer

I think Israel's answer is incomplete, so I'll go a step further.

import itertools

a = sc.parallelize([
    (1, [1,2,3,4]),
    (2, [3,4,5,6]),
    (3, [-1,2,3,4])
  ])

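def combinations(row):
    # row is (key, list_of_values); pair the key with every 2-element combination
    k, values = row
    return [(k, pair) for pair in itertools.combinations(values, 2)]

# flatMap flattens the per-row lists into a single RDD of (key, (v1, v2)) pairs
a.flatMap(combinations).take(3)
# [(1, (1, 2)), (1, (1, 3)), (1, (1, 4))]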
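
For the shape the question ultimately asks for, ((k1,v1,v2),1), one option is to build the flattened key inside the function itself rather than in a later step. The following is only a minimal sketch and not part of the original answer: keyed_pairs is a hypothetical helper name, and the sample rows are a plain Python list so the snippet runs without Spark.

```python
import itertools

def keyed_pairs(row):
    """Turn (k, [v1, v2, ...]) into [((k, v1, v2), 1), ...] for every 2-combination."""
    k, values = row
    return [((k, v1, v2), 1) for v1, v2 in itertools.combinations(values, 2)]

# Plain-Python stand-in for the RDD contents (hypothetical sample data).
rows = [(1, [1, 2, 3]), (2, [3, 4])]

# Local equivalent of flattening the per-row lists, as flatMap would do on an RDD.
result = [pair for row in rows for pair in keyed_pairs(row)]
print(result)
```

On the RDD from the answer, the same function could be passed as a.flatMap(keyed_pairs). Note that itertools.combinations pairs elements by position, so a value list containing duplicates would yield duplicate pairs.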
