Better to add item to a set, or convert final list to a set?
Question
I have some data that looks something like this:
ID1 ID2 ID3
ID1 ID4 ID5
ID3 ID5 ID7 ID6
...
...
where each row is a group.
My goal is to build a dictionary that maps each ID to the set of other IDs that share at least one group with it.
For example, this data would return {ID1: [ID2, ID3, ID4, ID5], ID2:[ID1, ID3] ... }
I can think of 3 options for this, and I'm wondering which is (generally) best:
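- Before adding an ID, check whether it is already in the list
- Create sets instead of lists, and add each ID to the set
- Add all IDs to lists, then convert all the lists to sets at the end.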
Recommended answer
TL;DR: Go with option 2. Just use sets from the start.
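As a rough sketch of option 2, assuming the rows have already been parsed into lists of ID strings (the names `groups` and `build_neighbors` are made up for illustration and do not come from the question):

```python
from collections import defaultdict

# Sample rows from the question; each row is one group of IDs.
groups = [
    ["ID1", "ID2", "ID3"],
    ["ID1", "ID4", "ID5"],
    ["ID3", "ID5", "ID7", "ID6"],
]

def build_neighbors(groups):
    """Map each ID to the set of other IDs that share at least one group with it."""
    neighbors = defaultdict(set)
    for group in groups:
        for member in group:
            # update() adds every ID in the group; the set ignores duplicates.
            neighbors[member].update(group)
    # An ID should not appear in its own neighbor set.
    for member, others in neighbors.items():
        others.discard(member)
    return dict(neighbors)

neighbors = build_neighbors(groups)
```

Here the set handles de-duplication on its own, so no explicit membership check is needed.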
In Python, sets are hash sets and lists are dynamic arrays. Inserting is O(1) for both (amortized for list appends, average case for sets), but checking whether an element exists is O(n) for a list and O(1) on average for a set.
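One way to see the difference in membership checks is a quick timing run. This is only an illustrative sketch: the size `n` and the repeat count are arbitrary choices, and the actual numbers depend on the machine.

```python
import timeit

n = 100_000
as_list = list(range(n))
as_set = set(as_list)
missing = -1  # not present, so the list has to be scanned to the end

# Time 100 membership checks against each container.
list_time = timeit.timeit(lambda: missing in as_list, number=100)
set_time = timeit.timeit(lambda: missing in as_set, number=100)

print(f"list: {list_time:.6f}s  set: {set_time:.6f}s")
```

The list lookup should get slower as `n` grows, while the set lookup should stay roughly constant.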
So option 1 is immediately out. If you insert n items and have to check the list every time, the overall complexity becomes O(n^2).
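For comparison, option 1 might look roughly like the sketch below (the function name `build_neighbors_with_lists` is hypothetical). The `not in` test on a list is the step that scans every element already stored.

```python
groups = [
    ["ID1", "ID2", "ID3"],
    ["ID1", "ID4", "ID5"],
    ["ID3", "ID5", "ID7", "ID6"],
]

def build_neighbors_with_lists(groups):
    """Option 1: keep lists and check for duplicates before every append."""
    neighbors = {}
    for group in groups:
        for member in group:
            seen = neighbors.setdefault(member, [])
            for other in group:
                # "other not in seen" is a linear scan over the list so far.
                if other != member and other not in seen:
                    seen.append(other)
    return neighbors

list_neighbors = build_neighbors_with_lists(groups)
```

The result is correct, but each check costs time proportional to the length of the list, which is where the quadratic behavior comes from.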
Options 2 and 3 are both optimal at O(n) overall. Option 2 might be faster in micro-benchmarks because you don't need to move objects between collections. In practice, choose the option that is easier to read and maintain in your specific circumstances.
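A minimal sketch of option 3, again assuming the rows are already lists of ID strings (variable names are illustrative only):

```python
from collections import defaultdict

groups = [
    ["ID1", "ID2", "ID3"],
    ["ID1", "ID4", "ID5"],
    ["ID3", "ID5", "ID7", "ID6"],
]

# Option 3: append everything (duplicates included), then convert once at the end.
as_lists = defaultdict(list)
for group in groups:
    for member in group:
        as_lists[member].extend(other for other in group if other != member)

# A single pass per list removes the duplicates.
neighbors = {member: set(others) for member, others in as_lists.items()}
```

This produces the same mapping as building sets directly. The intermediate lists hold duplicates until the final conversion, so they can temporarily use more memory when IDs share many groups.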