Scala: Can there be any reason to prefer `filter+map` over `collect`?

Question

Is there any reason to prefer filter+map:

list.filter(i => aCondition(i)).map(i => fun(i))

over collect?

list.collect { case i if aCondition(i) => fun(i) }

The one with collect (a single pass) looks faster and cleaner to me, so I would always go for collect.
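To make the question concrete, here is a minimal sketch in which aCondition and fun are hypothetical stand-ins (an evenness test and a string formatter, not taken from the question). Both pipelines yield the same List; they differ only in how many passes and intermediate collections are involved.

```scala
// Hypothetical stand-ins for the question's aCondition and fun
def aCondition(i: Int): Boolean = i % 2 == 0
def fun(i: Int): String = s"item-$i"

val list = List(1, 2, 3, 4, 5, 6)

// filter builds an intermediate List, then map builds the final one
val viaFilterMap: List[String] = list.filter(i => aCondition(i)).map(i => fun(i))

// collect tests and transforms each element with a single partial function
val viaCollect: List[String] = list.collect { case i if aCondition(i) => fun(i) }
```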

Answer

Most of Scala's collections eagerly apply operations and (unless you're using a macro library that does this for you) will not fuse operations. So filter followed by map will usually create two collections (and even if you use Iterator or somesuch, the intermediate form will be transiently created, albeit only an element at a time), whereas collect will not.
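The following sketch illustrates the difference between strict and iterator-based pipelines by logging calls from a hypothetical predicate p and mapping function f. On a strict List, every filter call happens before any map call, because filter materializes a whole intermediate List first; on an Iterator, the calls interleave element by element.

```scala
import scala.collection.mutable.ListBuffer

// Records the order in which the predicate and the mapping function run
val log = ListBuffer.empty[String]

// Hypothetical predicate and mapping function that log each call
def p(i: Int): Boolean = { log += s"filter $i"; i % 2 == 1 }
def f(i: Int): Int = { log += s"map $i"; i * 10 }

// Strict List: filter runs over every element and builds an intermediate
// List before map starts
val strictResult = List(1, 2, 3).filter(p).map(f)
val strictLog = log.toList

log.clear()

// Iterator: each element goes through filter and then map before the next
// element is examined, so no whole intermediate collection is built
val lazyResult = List(1, 2, 3).iterator.filter(p).map(f).toList
val lazyLog = log.toList
```

With this input, the strict log reads filter 1, filter 2, filter 3, map 1, map 3, while the iterator log interleaves the calls as filter 1, map 1, filter 2, filter 3, map 3.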

On the other hand, collect uses a partial function to implement the joint test, and partial functions are slower than predicates (A => Boolean) at testing whether something is in the collection.
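One way to see where partial-function overhead can come from: a pattern-matching literal such as { case x if p(x) => ... } has to run its match both to answer isDefinedAt and to compute the result. The sketch below uses a hypothetical counting guard p to show that calling isDefinedAt followed by apply evaluates the guard twice, whereas applyOrElse tests and applies in a single match. It illustrates a cost model only; it is not a measurement of collect itself.

```scala
// Counts how many times the guard inside the partial function is evaluated
var guardCalls = 0
def p(x: Int): Boolean = { guardCalls += 1; x > 0 }

val pf: PartialFunction[Int, Int] = { case x if p(x) => x * 2 }

// Two-step use: isDefinedAt runs the match, then apply runs it again
val twoStep = if (pf.isDefinedAt(5)) Some(pf(5)) else None
val twoStepCalls = guardCalls

guardCalls = 0

// applyOrElse tests and applies in one match; the fallback is used only
// when no case matches
val oneStep = pf.applyOrElse(5, (_: Int) => -1)
val oneStepCalls = guardCalls
```

Even in the single-match case, the guard sits behind a pattern match and a virtual call on the PartialFunction object, which is extra work compared with calling a plain A => Boolean predicate directly.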

Additionally, there can be cases where one version is simply clearer to read than the other, and you don't care about performance or memory-usage differences of a factor of 2 or so. In that case, use whichever is clearer. Generally, if you already have the functions named, the first of these reads more clearly than the second:

xs.filter(p).map(f)
xs.collect{ case x if p(x) => f(x) }

But if you are supplying the closures inline, the collect version generally looks cleaner:

xs.filter(x => foo(x, x)).map(x => bar(x, x))
xs.collect{ case x if foo(x, x) => bar(x, x) }

even though it's not necessarily shorter, because you only refer to the variable once.
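A small sketch of the readability point, with hypothetical helpers foo and bar standing in for the ones in the answer. All three expressions return the same Vector; the difference is that the inline filter/map version binds x in two separate lambdas, while collect binds it once for both the guard and the result.

```scala
// Hypothetical helpers standing in for foo and bar in the answer
def foo(a: Int, b: Int): Boolean = (a + b) % 3 == 0
def bar(a: Int, b: Int): Int = a * b

val xs = Vector(1, 2, 3, 4, 5, 6)

// Named functions: filter(p).map(f) reads naturally
val p: Int => Boolean = x => foo(x, x)
val f: Int => Int = x => bar(x, x)
val named = xs.filter(p).map(f)

// Inline closures with filter/map: x is bound twice, once per lambda
val twoLambdas = xs.filter(x => foo(x, x)).map(x => bar(x, x))

// Inline with collect: x is bound once and shared by guard and result
val onePattern = xs.collect { case x if foo(x, x) => bar(x, x) }
```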

Now, how big is the difference in performance? That varies, but if we consider a collection like this:

val v = Vector.tabulate(10000)(i => ((i%100).toString, (i%7).toString))

and you want to pick out the second entry based on filtering the first (so the filter and map operations are both really easy), then we get the following table.
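The benchmark setup can be sketched as follows. The answer does not state the exact predicates, so the three below (keep everything, keep first entries below 50, keep nothing) are assumptions chosen to reproduce the "filter out none / half / all" conditions on this particular Vector.

```scala
// The collection from the answer: pairs of (i % 100, i % 7) as strings
val v = Vector.tabulate(10000)(i => ((i % 100).toString, (i % 7).toString))

// Hypothetical predicates on the first entry, one per test condition
val keepEverything: String => Boolean = _ => true          // "filter out none"
val keepBelow50: String => Boolean    = s => s.toInt < 50  // "filter out half"
val keepNothing: String => Boolean    = _ => false         // "filter out all"

// Keep the second entry of each pair whose first entry passes p
def viaFilterMap(p: String => Boolean): Vector[String] =
  v.filter(t => p(t._1)).map(t => t._2)

def viaCollect(p: String => Boolean): Vector[String] =
  v.collect { case t if p(t._1) => t._2 }
```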

Note: one can take lazy views of collections and chain operations on them. You don't always get your original type back, but you can always use a `to` conversion (such as toVector) to get the right collection type. So xs.view.filter(p).map(f).toVector would, because of the view, not create an intermediate collection. That is also tested below. It has also been suggested that one can use xs.flatMap(x => if (p(x)) Some(f(x)) else None) and that this is efficient. That is not so; it is also tested below. Finally, one can avoid the partial function by explicitly creating a builder: val vb = Vector.newBuilder[String]; xs.foreach(x => if (p(x)) vb += f(x)); vb.result(), and the results for that are also listed below.
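The three extra variants from the note can be written out as follows, assuming a hypothetical predicate p that keeps pairs whose first entry is below 50 and a mapping f that takes the second entry. All three produce the same Vector as plain filter/map; only their allocation behavior differs.

```scala
val v = Vector.tabulate(10000)(i => ((i % 100).toString, (i % 7).toString))

// Hypothetical predicate and mapping (not specified in the answer)
def p(t: (String, String)): Boolean = t._1.toInt < 50
def f(t: (String, String)): String = t._2

// Lazy view: filter and map are applied per element; toVector builds the result
val viaView: Vector[String] = v.view.filter(p).map(f).toVector

// flatMap with Option: produces a Some or None for every element
val viaFlatMap: Vector[String] = v.flatMap(t => if (p(t)) Some(f(t)) else None)

// Explicit builder: one pass, no partial function, no intermediate collection
val viaBuilder: Vector[String] = {
  val vb = Vector.newBuilder[String]
  v.foreach(t => if (p(t)) vb += f(t))
  vb.result()
}
```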

In the table below, three conditions have been tested: filter out nothing, filter out half, and filter out everything. Times are normalized to filter/map (100% = same time as filter/map; lower is better). Error bounds are around ±3%.

Performance of different filter/map alternatives (Vector; times relative to filter/map, lower is better)

Condition          filter/map  collect  view filter/map  flatMap  builder
filter out none          100%      44%              64%     440%      30%
filter out half          100%      60%              76%     605%      42%
filter out all           100%     112%             103%    1300%      74%
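For readers who want to reproduce measurements of this kind, here is a crude best-of-n timing sketch that normalizes to filter/map the same way the table does. It assumes a hypothetical predicate that keeps first entries below 50 (roughly the "filter out half" row). JVM micro-benchmarks are sensitive to JIT warm-up and garbage collection, so its output will be noisy and will not necessarily match the table; a dedicated harness such as JMH is the more reliable tool.

```scala
val v = Vector.tabulate(10000)(i => ((i % 100).toString, (i % 7).toString))

// Hypothetical "filter out half" predicate and mapping on the pairs
def p(t: (String, String)): Boolean = t._1.toInt < 50
def f(t: (String, String)): String = t._2

// Best-of-n wall-clock time in nanoseconds. Crude: no forking, no explicit
// warm-up phase, no protection against dead-code elimination.
def bestNanos(reps: Int)(body: => Vector[String]): Long =
  (1 to reps).map { _ =>
    val t0 = System.nanoTime()
    body.length // force the work and touch the result
    System.nanoTime() - t0
  }.min

def filterMapV: Vector[String] = v.filter(p).map(f)
def collectV: Vector[String] = v.collect { case t if p(t) => f(t) }
def builderV: Vector[String] = {
  val vb = Vector.newBuilder[String]
  v.foreach(t => if (p(t)) vb += f(t))
  vb.result()
}

val reps = 200
// Note: the baseline runs first and so also absorbs some JIT warm-up
val baseline = bestNanos(reps)(filterMapV)

// Normalized as in the table: 100% = same time as filter/map
val relative: Map[String, Double] = Map(
  "filter/map" -> 100.0,
  "collect"    -> 100.0 * bestNanos(reps)(collectV) / baseline,
  "builder"    -> 100.0 * bestNanos(reps)(builderV) / baseline
)
```

Taking the minimum over many repetitions is a simple way to reduce the influence of GC pauses and other interference, at the cost of hiding variance.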

Thus, filter/map and collect are generally pretty close (with collect winning when you keep a lot), flatMap is far slower under all situations, and creating a builder always wins. (This is true specifically for Vector. Other collections may have somewhat different characteristics, but the trends for most will be similar because the differences in operations are similar.) Views in this test tend to be a win, but they don't always work seamlessly (and they aren't really better than collect except for the empty case).

So, bottom line: prefer filter then map if it aids clarity when speed doesn't matter, or prefer it for speed when you're filtering out almost everything but still want to keep things functional (so don't want to use a builder); and otherwise use collect.
