MapValues and Explode in RDD
Question
I have the sample RDD below (called rdd from here on). Each record is a tuple of (String, Int):
(some | random | value, 10)
(some | random | value, 11)
(some | random | value, 12)
I want to get this output:
(some, 10)
(random, 10)
(value, 10)
(some, 11)
(random, 11)
(value, 11)
(some, 12)
(random, 12)
(value, 12)
I have this Scala code to attempt the above transformation:
rdd.map(tuple => tuple._1.split("|").foreach(elemInArray => (elemInArray, tuple._2)))
In this code I iterate through the entire dataset and split the first part of each tuple on |. Then I iterate through each element of the array returned by split and create a tuple from that element and the count that I get from tuple._2.
For some reason I keep getting this result:
()
()
()
()
()
()
()
()
()
Does anyone know what the issue is? I can't seem to find where I went wrong.
Recommended answer
You actually need to use flatMap for this:
val lt = List(
  ("some | random | value", 10),
  ("some | random | value", 11),
  ("some | random | value", 12))

val convert: ((String, Int)) => List[(String, Int)] =
  tuple => tuple._1.split('|').map(str => (str, tuple._2)).toList

val t = lt.flatMap(convert)
As we can see, defining the convert function is useful because we can check that each element is handled correctly by passing the function a single element. We can then pass the same function to flatMap, which concatenates the lists that convert produces into a single list.
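To make the difference between map and flatMap concrete, here is a minimal sketch on a plain Scala List. The object name MapVsFlatMap and the two-row rows sample are made up for illustration, and convert is the answer's function with trim already applied so the results are easy to compare.

```scala
object MapVsFlatMap {
  // Hypothetical two-row sample, shaped like the data in the question.
  val rows: List[(String, Int)] = List(("some | random", 10), ("value", 11))

  // Same shape as the answer's convert function, with trim applied.
  val convert: ((String, Int)) => List[(String, Int)] =
    tuple => tuple._1.split('|').map(str => (str.trim, tuple._2)).toList

  // map keeps one result per input row, so the per-row lists stay nested.
  val nested: List[List[(String, Int)]] = rows.map(convert)

  // flatMap concatenates the per-row lists into one flat list.
  val flat: List[(String, Int)] = rows.flatMap(convert)

  def main(args: Array[String]): Unit = {
    println(nested)
    println(flat)
  }
}
```

With map the result type is List[List[(String, Int)]], while flatMap yields List[(String, Int)], which is the shape the question asks for.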
Evaluating lt.flatMap(convert) produces:
t: List[(String, Int)] = List((some ,10),
( random ,10),
( value,10),
(some ,11),
( random ,11),
( value,11),
(some ,12),
( random ,12),
( value,12))
I didn't bother to deal with the extra whitespace characters in the result, but that is easily handled by adding trim to the convert function:
val convert: ((String, Int)) => List[(String, Int)] =
  tuple => tuple._1.split('|').map(str => (str.trim, tuple._2)).toList
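For completeness, the original attempt appears to fail for two separate reasons that the answer sidesteps without naming: foreach returns Unit, which is why every record printed as (), and split("|") treats its String argument as a regular expression, where a bare | does not mean a literal pipe. The sketch below illustrates both on a plain String. The object name OriginalAttemptPitfalls and the value names are invented for this example, and the count 10 is just a stand-in.

```scala
object OriginalAttemptPitfalls {
  val key = "some | random | value"

  // split(String) takes a regex. A bare "|" is an alternation of two empty
  // patterns, so on current JVMs it breaks the string into tiny pieces
  // rather than splitting on the pipe character.
  val byRegex: Array[String] = key.split("|")

  // Either pass a Char (as the answer does) or escape the pipe in the regex.
  val byChar: Array[String] = key.split('|')
  val byEscapedRegex: Array[String] = key.split("\\|")

  // foreach is meant for side effects: the tuples built inside are discarded
  // and the expression itself evaluates to Unit, printed as ().
  val fromForeach: Unit = byChar.foreach(s => (s.trim, 10))

  // map keeps the tuples that the function builds.
  val fromMap: List[(String, Int)] = byChar.map(s => (s.trim, 10)).toList

  def main(args: Array[String]): Unit = {
    println(byRegex.length)
    println(byChar.toList)
    println(fromForeach)
    println(fromMap)
  }
}
```

On an actual RDD, a function of the same shape as convert should be usable with rdd.flatMap, since RDD.flatMap expects a function that returns a collection per record. That call is not exercised here because the sketch deliberately avoids a Spark dependency.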