有效的方式来折叠scala中的列表,同时避免分配和变量 [英] Efficient way to fold list in scala, while avoiding allocations and vars

查看:114
本文介绍了有效的方式来折叠scala中的列表,同时避免分配和变量的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我在列表中有很多项目,我需要分析内容以找出其中有多少项是完整的。我开始使用分区,但后来意识到我不需要返回两个列表,所以我切换到一个折叠:

I have a bunch of items in a list, and I need to analyze the content to find out how many of them are "complete". I started out with partition, but then realized that I didn't need to two lists back, so I switched to a fold:

val counts = groupRows.foldLeft( (0,0) )( (pair, row) => 
     if(row.time == 0) (pair._1+1,pair._2) 
     else (pair._1, pair._2+1)
   )

但我有许多并行用户需要经历很多行,并且导致很多GC活动(我的假设...... GC 可能来自其他事物,但我怀疑这一点因为我知道它会在每个折叠项上分配一个新的元组)。

but I have a lot of rows to go through for a lot of parallel users, and it is causing a lot of GC activity (assumption on my part...the GC could be from other things, but I suspect this since I understand it will allocate a new tuple on every item folded).

目前,我已将其重写为

for the time being, I've rewritten this as

var complete = 0
var incomplete = 0
list.foreach(row => if(row.time != 0) complete += 1 else incomplete += 1)

修正了GC,但引入了变数。

which fixes the GC, but introduces vars.

我想知道是否有一种方法可以在不使用变量的情况下执行此操作,同时也不会滥用GC?

I was wondering if there was a way of doing this without using vars while also not abusing the GC?

编辑:

一个var实现在大型列表上的速度似乎要快得多(比如40%),甚至比一个功能更强大但应该相当的尾递归优化版本更快。

Hard call on the answers I've received. A var implementation seems to be considerably faster on large lists (like by 40%) than even a tail-recursive optimized version that is more functional but should be equivalent.

dhg的第一个答案似乎与尾递归的性能相当,这意味着size pass是超级高效的......事实上,优化后的运行速度比我的尾递归的运行速度稍快硬件。

The first answer from dhg seems to be on-par with the performance of the tail-recursive one, implying that the size pass is super-efficient...in fact, when optimized it runs very slightly faster than the tail-recursive one on my hardware.

推荐答案

我看到您已经接受了一个答案,但您正确地提到该解决方案将遍历列表两次。

I see you've already accepted an answer, but you rightly mention that that solution will traverse the list twice. The way to do it efficiently is with recursion.

def counts(xs: List[...], complete: Int = 0, incomplete: Int = 0): (Int,Int) = 
  xs match {
    case Nil => (complete, incomplete)
    case row :: tail => 
      if (row.time == 0) counts(tail, complete + 1, incomplete)
      else               counts(tail, complete, incomplete + 1)
  }

这实际上只是一个定制的 fold ,除了我们使用2个累加器只是 Int s(原语)而不是元组(引用类型)。它也应该与使用变量的while循环一样高效 - 事实上,字节码应该是相同的。

This is effectively just a customized fold, except we use 2 accumulators which are just Ints (primitives) instead of tuples (reference types). It should also be just as efficient a while-loop with vars - in fact, the bytecode should be identical.

这篇关于有效的方式来折叠scala中的列表,同时避免分配和变量的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆