Why does filter(_:)’s predicate get called so many times when evaluating it lazily?

Question

I saw an answer to this question, which, in its first revision, had code similar to this:

let numbers = Array(0 ..< 50)

let result = numbers.lazy
    .filter {
        // gets called 2-3x per element in the range (0...15)!
        print("Calling filter for: \($0)")
        return $0 % 3 == 0
    }
    .prefix(5)

print(Array(result)) // [0, 3, 6, 9, 12]

This, through the use of a lazy filter collection, is able to get the first 5 elements of numbers that satisfy a given predicate (in this case, being divisible by 3), without having to evaluate every element in the numbers array.

However, the answer then remarked that filter(_:)’s predicate could get called multiple times per element (three times for elements in the range 1...15, and twice for 0, as it turns out).

What’s the reason for this inefficiency in the lazy evaluation of this filter? Is there any way to avoid evaluating the same element multiple times?

Answer

The Problems

The first culprit here is the slicing of the lazy filter collection through the use of prefix(_:) – which, in this case, returns a BidirectionalSlice of the LazyFilterBidirectionalCollection.

In general, slicing a Collection entails storing the base collection, along with the range of indices that are valid for the slice to ‘view’. Therefore, in order to create a slice of a LazyFilterBidirectionalCollection that views the first 5 elements, the range of indices stored must be startIndex ..< indexAfterTheFifthElement.
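To see why the slicing itself forces evaluation, Collection’s prefix(_:) is implemented along these lines (a simplified, renamed sketch – not the verbatim standard library source):

extension Collection {
    // Hypothetical name, so it doesn't clash with the real prefix(_:).
    func prefixSketch(_ maxLength: Int) -> SubSequence {
        // The end index of the slice is computed eagerly, right here – for a
        // lazy filter collection, that means running the predicate until the
        // element after the maxLength-th match is found.
        let end = index(startIndex, offsetBy: maxLength, limitedBy: endIndex) ?? endIndex
        return self[startIndex ..< end]
    }
}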

In order to get indexAfterTheFifthElement, the LazyFilterBidirectionalCollection must iterate through the base collection (numbers) in order to find the 6th element that satisfies the predicate (you can see the exact implementation of the indexing here).
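Roughly speaking, advancing one of its indices looks like this (a stripped-down, hypothetical sketch rather than the actual standard library implementation):

struct FilterSketch<Base : Collection> {
    let base: Base
    let predicate: (Base.Element) -> Bool

    // Each index advancement re-runs the predicate on base elements
    // until the next match (or the end of the base) is reached.
    func index(after i: Base.Index) -> Base.Index {
        var next = base.index(after: i)
        while next != base.endIndex && !predicate(base[next]) {
            next = base.index(after: next)
        }
        return next
    }
}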

Therefore, all the elements in the range 0...15 from the above example need to be checked against the predicate, simply in order to create a slice of the lazy filter collection.
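You can see this by building the slice on its own – based on the reasoning above, I’d expect the predicate to already be called for the elements 0 through 15 before the result is ever converted to an Array:

let sliced = numbers.lazy
    .filter {
        print("Checking while slicing: \($0)")
        return $0 % 3 == 0
    }
    .prefix(5) // merely constructing the slice triggers the prints above.

_ = sliced // nothing has been copied into an Array yet.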

The second culprit is Array’s init(_:), which accepts a Sequence of elements of the same type as the array’s Element type. The implementation of this initialiser calls _copyToContiguousArray() on the sequence, which, for most sequences, forwards the call to this function:

internal func _copySequenceToContiguousArray<S : Sequence>
                                    (_ source: S) -> ContiguousArray<S.Iterator.Element> {

  let initialCapacity = source.underestimatedCount // <- problem here

  var builder =
    _UnsafePartiallyInitializedContiguousArrayBuffer<S.Iterator.Element>(
      initialCapacity: initialCapacity)

  var iterator = source.makeIterator()

  // FIXME(performance): use _copyContents(initializing:).
  // Add elements up to the initial capacity without checking for regrowth.
  for _ in 0..<initialCapacity {
    builder.addWithExistingCapacity(iterator.next()!)
  }

  // Add remaining elements, if any.
  while let element = iterator.next() {
    builder.add(element)
  }

  return builder.finish()
}

The problem here is underestimatedCount. For plain sequences, this just has a default implementation that returns 0 – however, for collections, it has a default implementation that gets the count of the collection (I go into this in more detail here).
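The two defaults look roughly like this (renamed free-function sketches, so as not to clash with the real members):

// Sequence's default: it can't know its length without consuming itself.
func underestimatedCountSketch<S : Sequence>(_ sequence: S) -> Int {
    return 0
}

// Collection's default: just ask for the full count – which, for our
// BidirectionalSlice, means walking through its indices.
func underestimatedCountSketch<C : Collection>(_ collection: C) -> Int {
    return numericCast(collection.count)
}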

The default implementation of count for a Collection (which the BidirectionalSlice will use here) is simply:

public var count: IndexDistance {
    return distance(from: startIndex, to: endIndex)
} 

Which, for our slice, will walk through the indices up to indexAfterTheFifthElement, thus re-evaluating the elements in the range 0...15 yet again.
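distance(from:to:), in turn, has a default implementation that just steps the index forward one position at a time (again a renamed, paraphrased sketch) – and for our slice, each of those steps goes through the lazy filter collection’s index advancement shown earlier:

extension Collection {
    // Renamed sketch of Collection's default distance(from:to:).
    func distanceSketch(from start: Index, to end: Index) -> Int {
        var current = start
        var steps = 0
        while current != end {
            steps += 1
            // For a slice of a lazy filter collection, advancing the index
            // re-runs the predicate on the underlying elements.
            formIndex(after: &current)
        }
        return steps
    }
}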

Finally, an iterator for the slice is made and iterated through, adding each element in the sequence to the new array’s buffer. For a BidirectionalSlice, this will use an IndexingIterator, which works simply by advancing indices and outputting the element at each index.
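Conceptually, that iterator looks something like this (a renamed sketch of IndexingIterator) – every call to next() is another index advancement, which for our slice again means running the predicate:

struct IndexingIteratorSketch<Elements : Collection> : IteratorProtocol {
    let elements: Elements
    var position: Elements.Index

    mutating func next() -> Elements.Element? {
        guard position != elements.endIndex else { return nil }
        let element = elements[position]
        // Advancing the position is where the predicate gets re-evaluated.
        elements.formIndex(after: &position)
        return element
    }
}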

The reason why this walk over the indices doesn’t re-evaluate the elements up to the first element of the result (note that in the question’s example, 0 is evaluated one time less) is that it doesn’t directly access the startIndex of the LazyFilterBidirectionalCollection, which would have to evaluate all elements up to the first element of the result. Instead, the iterator can work from the start index of the slice itself.

The simple solution is to avoid slicing the lazy filter collection in order to get its prefix, and instead apply the prefixing lazily.

There are actually two implementations of prefix(_:). One is provided by Collection, and returns a SubSequence (which, for most standard library collections, is some form of slice).

The other is provided by Sequence, and returns an AnySequence – which, under the hood, uses a base sequence of _PrefixSequence. This simply takes an iterator and allows iteration through it until a given number of elements have been returned – after which it just stops returning elements.
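Conceptually, the sequence-based prefixing works something like this (a hypothetical sketch of what _PrefixSequence’s iterator does, not its actual source):

struct PrefixIteratorSketch<Base : IteratorProtocol> : IteratorProtocol {
    var base: Base
    var remaining: Int

    mutating func next() -> Base.Element? {
        // Once the requested number of elements has been returned, just stop –
        // no indexing of the underlying collection is ever needed.
        guard remaining > 0 else { return nil }
        remaining -= 1
        return base.next()
    }
}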

For lazy collections, this implementation of prefix(_:) is great, as it doesn’t require any indexing – it just applies the prefixing lazily.

So if you say:

let result : AnySequence = numbers.lazy
    .filter {
        // gets called 1x per element :)
        print("Calling filter for: \($0)")
        return $0 % 3 == 0
    }
    .prefix(5)

The elements of numbers (up to the 5th match) will only be evaluated once by filter(_:)’s predicate when passed to Array’s initialiser, as you’re forcing Swift to use Sequence’s default implementation of prefix(_:).

The foolproof way of preventing all indexing operations on a given lazy filter collection is simply to use a lazy filter sequence instead – this can be done by wrapping the collection you wish to perform lazy operations on in an AnySequence:

let numbers = Array(0 ..< 50)

let result = AnySequence(numbers).lazy
    .filter {
        // gets called 1x per element :)
        print("Calling filter for: \($0)")
        return $0 % 3 == 0
    }
    .dropFirst(5) // neither of these will do indexing,
    .prefix(5)    // but instead return a lazily evaluated AnySequence.


print(Array(result)) // [15, 18, 21, 24, 27]

However, note that for bidirectional collections, this may have an adverse effect on operations at the end of the collection – as the entire sequence will then have to be iterated through in order to reach the end.

For operations such as suffix(_:) and dropLast(_:), it may well be more efficient to work with a lazy collection rather than a sequence (at least for small inputs), as they can simply index from the end of the collection.
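For example (assuming numbers is still Array(0 ..< 50)), something like this should be able to find the last couple of matches by walking backwards from the end of the collection, rather than by filtering from the front:

let lastMatches = numbers.lazy
    .filter { $0 % 3 == 0 } // multiples of 3
    .suffix(2)              // indexes backwards from the end of the collection

print(Array(lastMatches)) // [45, 48]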

Although, as with all performance-related concerns, you should first check whether this is even a problem in the first place, and then run your own tests to see which approach works better for your implementation.

So, after all of that – the lesson to learn here is that you should be wary of the fact that slicing a lazy filter collection can re-evaluate every element of the base collection up to the end index that the slice can ‘view’.

It’s often more desirable to treat a lazy filter collection as a sequence instead, which cannot be indexed, and therefore means that lazy operations cannot evaluate any elements (doing so would risk destructively iterating over them) until an eager operation happens.

However, you should then be wary of the fact that you’re potentially sacrificing the ability to index the collection from the end, which is important for operations such as suffix(_:).

Finally, it’s worth noting that this isn’t a problem with lazy views such as LazyMapCollection, as their elements don’t depend on the ‘results’ of previous elements – therefore they can be indexed in constant time if their base collection is a RandomAccessCollection.
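For example, subscripting a lazy map of an Array shouldn’t need to transform any of the preceding elements first:

let squares = numbers.lazy.map { $0 * $0 }

// The index is computed in constant time, and only the element at that
// position is actually transformed.
print(squares[10]) // 100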
