spark - filter within map
Question
I am trying to filter inside a map function. Basically, the way I'd do this in classic MapReduce is that the mapper doesn't write anything to the context when the filter criteria are met. How can I achieve something similar with Spark? I can't return null from the map function, as it fails in the shuffle step. I could use the filter function, but that seems like an unnecessary iteration over the data set when I could perform the same task during the map. I could also try to output null with a dummy key, but that's a bad workaround.
Answer
A few options:
rdd.flatMap: flattens a Traversable collection into the RDD. To pick elements, you'll typically return an Option as the result of the transformation.
rdd.flatMap(elem => if (filter(elem)) Some(f(elem)) else None)
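As a minimal sketch, the same Option-based pattern can be shown on a plain Scala collection, since an RDD's flatMap behaves analogously; isEven and double below are illustrative stand-ins for the filter predicate and the mapping function f:

```scala
// Hypothetical predicate and transformation standing in for filter/f above.
def isEven(n: Int): Boolean = n % 2 == 0
def double(n: Int): Int = n * 2

val nums = List(1, 2, 3, 4, 5)

// flatMap flattens the Option results: None drops the element, Some keeps it.
val result = nums.flatMap(n => if (isEven(n)) Some(double(n)) else None)
// result == List(4, 8)
```

Only the even elements survive, already transformed, in a single pass.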
rdd.collect(pf: PartialFunction): lets you provide a partial function that both filters and transforms elements of the original RDD. You can use the full power of pattern matching with this method.
rdd.collect{case t if (cond(t)) => f(t)}
rdd.collect{case t:GivenType => f(t)}
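collect with a partial function also exists on plain Scala collections, so the pattern can be sketched without a SparkContext (the sample data and transformations below are made up for illustration). Note that on an RDD, collect(pf) is a transformation, distinct from the no-argument collect action that pulls the RDD's contents to the driver:

```scala
val mixed: List[Any] = List(1, "two", 3, "four")

// Type-based match: keep only the Ints, transform them, drop the rest.
val ints = mixed.collect { case n: Int => n * 10 }
// ints == List(10, 30)

// Guarded pattern: filter by a condition and transform in one pass.
val bigs = List(1, 5, 10, 20).collect { case n if n > 5 => n + 1 }
// bigs == List(11, 21)
```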
As Dean Wampler mentions in the comments, rdd.map(f(_)).filter(cond(_)) may be just as good, and even faster, than the more 'terse' options mentioned above.
where f is the transformation (or map) function.
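To address the "unnecessary iteration" worry from the question: Spark pipelines narrow transformations like map and filter within a single stage, so chaining them does not mean a second pass over the data. A sketch with hypothetical f and cond, shown on a plain Scala list since the combinators are the same:

```scala
// Hypothetical transformation and predicate for illustration.
def f(n: Int): Int = n * 2
def cond(n: Int): Boolean = n > 4

// On an RDD this would be rdd.map(f).filter(cond); Spark fuses both
// steps into one pass over each partition within the stage.
val out = List(1, 2, 3, 4).map(f).filter(cond)
// out == List(6, 8)
```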