Swift enormous dictionary of arrays, very slow
Question
I am working on a project in Swift, using a dictionary.
This dictionary is of the type [String : [Posting]]. I have around 200k different "terms" (keys) to insert into it, and for each term I have around 500 to 1000 objects to append to a list. I know it's an odd practice, but I don't have a choice and I must deal with all those elements.
The issue is that this becomes very slow as the dictionary gets bigger. I tried switching to an NSMutableDictionary, with no luck.
My addTerm function is called every time I need to insert an element:
    func addTerm(_ term: String, withId id: Int, atPosition position: Int) {
        if self.map[term] == nil {
            self.map[term] = [Posting]()
        }
        if self.map[term]!.last?.documentId == id {
            self.map[term]!.last?.addPosition(position)
        }
        else {
            self.map[term]!.append(Posting(withId: id, atPosition: position, forTerm: term))
        }
    }
EDIT: I realized that it's not the dictionary that causes all this lag, but actually the arrays it contains. The arrays re-allocate far too often when adding new elements, and the best I could do was to replace them with ContiguousArray.
Recommended answer
This is a fairly common performance trap, as also observed in:
- Dictionary in Swift with Mutable Array as value is performing very slow? How to optimize or construct properly?
- Swift semantics regarding dictionary access
The issue stems from the fact that the array you're mutating in the expression self.map[term]!.append(...) is a temporary mutable copy of the underlying array in the dictionary's storage. This means that the array is never uniquely referenced, and so always has its buffer re-allocated.
This situation will be fixed in Swift 5 with the unofficial introduction of generalised accessors, but until then, one solution (as mentioned in both of the above Q&As) is to use Dictionary's subscript(_:default:), which from Swift 4.1 can mutate the value directly in storage.
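As a minimal sketch of the single-mutation case, the default subscript can be used directly with append. The names index and occurrences, and the sample data, are hypothetical and only serve to illustrate the pattern:

```swift
// Hypothetical example data; `index` and the sample terms are illustrative only.
var index: [String: [Int]] = [:]

let occurrences: [(term: String, documentId: Int)] = [
    ("swift", 1), ("array", 1), ("swift", 2), ("swift", 3)
]

for (term, documentId) in occurrences {
    // Pattern from the question, which mutates a temporary copy:
    //     if index[term] == nil { index[term] = [] }
    //     index[term]!.append(documentId)
    //
    // The default subscript supplies [] for a missing key and lets
    // append mutate the array held in the dictionary's storage.
    index[term, default: []].append(documentId)
}

print(index["swift"] ?? [])   // [1, 2, 3]
```

This also removes the separate nil check and the force unwraps.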
However, your case isn't quite a straightforward case of applying a single mutation, so you need some kind of wrapper function to give you scoped access to your mutable array.
For example, this could look like:
    class X {

        private var map: [String: [Posting]] = [:]

        // Gives the closure scoped, in-place access to the postings for `term`.
        private func withPostings<R>(
            forTerm term: String, mutations: (inout [Posting]) throws -> R
        ) rethrows -> R {
            return try mutations(&map[term, default: []])
        }

        func addTerm(_ term: String, withId id: Int, atPosition position: Int) {
            withPostings(forTerm: term) { postings in
                if let posting = postings.last, posting.documentId == id {
                    posting.addPosition(position)
                } else {
                    postings.append(Posting(withId: id, atPosition: position, forTerm: term))
                }
            }
        }
        // ...
    }
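The same scoped access can also be sketched with a free function that takes the array as an inout parameter, which makes it easy to try in isolation. The Posting class below is a hypothetical stand-in (the question does not show its definition, so it is assumed to be a class with a documentId and an addPosition method), and record is an illustrative helper name:

```swift
// Hypothetical stand-in for the question's Posting type; assumed to be a class.
final class Posting {
    let documentId: Int
    private(set) var positions: [Int]

    init(withId id: Int, atPosition position: Int, forTerm term: String) {
        self.documentId = id
        self.positions = [position]
    }

    func addPosition(_ position: Int) {
        positions.append(position)
    }
}

// Illustrative helper: applies the question's "extend the last posting or
// append a new one" logic to an array passed by inout.
func record(term: String, id: Int, position: Int, in postings: inout [Posting]) {
    if let last = postings.last, last.documentId == id {
        last.addPosition(position)
    } else {
        postings.append(Posting(withId: id, atPosition: position, forTerm: term))
    }
}

var map: [String: [Posting]] = [:]

for (term, id, position) in [("swift", 1, 0), ("swift", 1, 7), ("swift", 2, 3)] {
    // Passing the default subscript as inout hands the helper the stored array.
    record(term: term, id: id, position: position, in: &map[term, default: []])
}

print(map["swift"]?.count ?? 0)               // 2
print(map["swift"]?.first?.positions ?? [])   // [0, 7]
```

Two postings are created for "swift" because the first two occurrences share document id 1 and are merged into a single posting.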