MapValues and Explode in RDD

Question

I have the sample RDD below (called rdd). Each record is a (String, Int) tuple:

(some | random | value, 10)
(some | random | value, 11)
(some | random | value, 12)

I want to get this output:

(some, 10)
(random, 10)
(value, 10)
(some, 11)
(random, 11)
(value, 11)
(some, 12)
(random, 12)
(value, 12)

I have this Scala code to attempt the above transformation:

rdd.map(tuple => tuple._1.split("|").foreach(elemInArray => (elemInArray, tuple._2)))

In this code I iterate through the entire dataset and split the first part of each tuple on |. Then I iterate through each element of the array returned by split and create a tuple from that element and the count, which I get from tuple._2.

For some reason I keep getting this result:

()
()
()
()
()
()
()
()
()

Does anyone know the issue? I can't seem to find where I went wrong.
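The sketch below reproduces the question's result on a plain Scala List, which stands in for the RDD here (an assumption; the closure behaves the same way). It also shows a second pitfall in the original line: String.split with a String argument treats that argument as a regular expression, and the regex "|" matches the empty string, so the input is split between every character. Passing the Char '|' splits on the literal character instead.

```scala
// A plain List stands in for the RDD (assumption for this sketch).
val lt = List(("some | random | value", 10),
              ("some | random | value", 11),
              ("some | random | value", 12))

// foreach runs the lambda only for its side effects and returns Unit,
// so map turns every record into (), which is printed as ().
val broken: List[Unit] =
  lt.map(tuple => tuple._1.split("|").foreach(elem => (elem, tuple._2)))

// split(String) takes a regex; "|" matches the empty string, so the
// string is split between every character.
val pieces: List[String] = "ab|c".split("|").toList   // List(a, b, |, c)

// split(Char) splits on the literal character.
val literal: List[String] = "ab|c".split('|').toList  // List(ab, c)

println(broken)
println(pieces)
println(literal)
```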

Answer

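You actually need to use flatMap for this. foreach returns Unit, so your map turns every record into (); flatMap instead lets each input record produce several output records: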

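val lt = List(("some | random | value", 10),
              ("some | random | value", 11),
              ("some | random | value", 12))

// Turn one (text, count) pair into one pair per '|'-separated piece.
// split('|') takes a Char, so '|' is matched literally rather than as a regex.
val convert: ((String, Int)) => List[(String, Int)] = tuple =>
  tuple._1.split('|').map(str => (str, tuple._2)).toList

// flatMap concatenates the lists returned by convert into a single list.
val t = lt.flatMap(convert)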

As we can see, defining the convert function separately is useful: we can check that it handles a single element correctly by calling it on that element directly. We can then pass the same function to flatMap, which flattens the lists that convert produces into a single list.
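A minimal sketch of both points, using made-up inputs ("a | b" and "c" are illustrative only): convert is first checked on a single element, then map and flatMap are compared. map keeps one output per input, so it yields a nested List[List[...]]; flatMap yields the same elements concatenated into one list.

```scala
// Same convert function as in the answer.
val convert: ((String, Int)) => List[(String, Int)] = tuple =>
  tuple._1.split('|').map(str => (str, tuple._2)).toList

// Check a single element first (note the double parentheses for a tuple argument).
val one: List[(String, Int)] = convert(("a | b", 1))  // List((a ,1), ( b,1))

val input = List(("a | b", 1), ("c", 2))

// map: one list per input element, so the result is nested.
val nested: List[List[(String, Int)]] = input.map(convert)

// flatMap: the per-element lists are concatenated into one flat list.
val flat: List[(String, Int)] = input.flatMap(convert)

println(nested)
println(flat)
```

In other words, flatMap(convert) gives the same result as map(convert) followed by flatten.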

The above produces:

t: List[(String, Int)] = List((some ,10), 
                              ( random ,10), 
                              ( value,10), 
                              (some ,11), 
                              ( random ,11), 
                              ( value,11), 
                              (some ,12), 
                              ( random ,12),
                              ( value,12))

Obviously, I didn't bother to deal with the extra whitespace in the result, but this is easily handled by calling trim inside your convert function:

val convert: ((String, Int)) => List[(String, Int)] = tuple =>
  tuple._1.split('|').map(str => (str.trim, tuple._2)).toList
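Putting it together, here is a sketch of the trimmed version written with a pattern-matching function literal instead of tuple._1/tuple._2 (a stylistic alternative, not part of the original answer). On the question's sample data it produces exactly the nine pairs the question asks for. A List again stands in for the RDD.

```scala
val lt = List(("some | random | value", 10),
              ("some | random | value", 11),
              ("some | random | value", 12))

// Destructure the (text, count) pair, split on the literal '|',
// and trim the surrounding spaces from each piece.
val convert: ((String, Int)) => List[(String, Int)] = {
  case (words, count) => words.split('|').toList.map(word => (word.trim, count))
}

val t: List[(String, Int)] = lt.flatMap(convert)
t.foreach(println)
```

Assuming rdd is the RDD[(String, Int)] from the question, the same function should be usable as rdd.flatMap(convert), since RDD.flatMap also takes a function that returns a collection per record and flattens the results.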
