Pyspark RDD: find index of an element
Question
I am new to PySpark and I am trying to convert a Python list to an RDD, after which I need to find an element's index using that RDD. For the first part I am doing:
data = [[1, 2], [1, 4]]  # avoid naming this `list`, which would shadow the built-in
rdd = sc.parallelize(data).cache()
So now the RDD actually holds my list. The thing is that I want to find the index of any arbitrary element, something like the "index" function that works for Python lists. I am aware of a function called zipWithIndex which assigns an index to each element, but I could not find a proper example in Python (there are examples with Java and Scala).

Thanks.
Answer
Use filter and zipWithIndex:

(rdd.zipWithIndex()                       # pair each element with its index
    .filter(lambda kv: kv[0] == [1, 2])  # kv is the (item, index) tuple
    .map(lambda kv: kv[1])               # keep only the index
    .collect())
Note that [1, 2] here can easily be changed to a variable name, and this whole expression can be wrapped within a function.
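A minimal sketch of that wrapping; the helper name find_index is illustrative, not a PySpark API:

def find_index(rdd, element):
    # Return the indices of all occurrences of `element` in `rdd`,
    # using the zipWithIndex/filter/map pattern shown above.
    return (rdd.zipWithIndex()
               .filter(lambda kv: kv[0] == element)
               .map(lambda kv: kv[1])
               .collect())

# find_index(rdd, [1, 4])  ->  [1]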
zipWithIndex simply returns (item, index) pairs, like so:
rdd.zipWithIndex().collect()
> [([1, 2], 0), ([1, 4], 1)]
filter finds only those entries that match a particular criterion (in this case, that the key equals a specific sublist):
rdd.zipWithIndex().filter(lambda kv: kv[0] == [1, 2]).collect()
> [([1, 2], 0)]
map is fairly obvious; we can just get back the index:
(rdd.zipWithIndex()
    .filter(lambda kv: kv[0] == [1, 2])
    .map(lambda kv: kv[1])
    .collect())
> [0]
and then we can simply get the first element by indexing with [0] if you want.
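Putting it all together, a self-contained sketch; the local[*] master and the app name are assumptions for running this outside a shell where sc already exists:

from pyspark import SparkContext

sc = SparkContext("local[*]", "find-index-example")  # assumed local setup

data = [[1, 2], [1, 4]]
rdd = sc.parallelize(data).cache()

indices = (rdd.zipWithIndex()
              .filter(lambda kv: kv[0] == [1, 2])
              .map(lambda kv: kv[1])
              .collect())

print(indices[0])  # prints 0
sc.stop()

Using .first() instead of .collect()[0] also works and avoids pulling back every matching index.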