What is the overhead of using a dictionary instead of a list?


Problem description

In one of my projects I can use either lists or dictionaries, and I am having a hard time picking which one to use.

I am analyzing a large number of items (>400k), so I will have more than 400k lists or dictionaries, which I will use very frequently (get/set/update).

In my particular situation, a dictionary feels more convenient than a list if I ignore performance entirely. However, I know I can write the same thing using lists.

Should I go for readability and use dictionaries, or would dictionaries add so much overhead that performance suffers dramatically in both memory and time?

I know this question is a bit too broad, but I wanted to ask it before I start building all my logic on top of this decision.

My situation in a nutshell:

I have values for keys 0, 1, ..., n. For now, the keys will always be integers from 0 to n, which I can keep in a list.

However, I can think of some situations that might arise in the future where I will need to keep items for keys that are not integers, or for integers that are not consecutive.

So the question is: if using dictionaries instead of lists would not add much memory/time cost, I will go with dictionaries from the start. However, I am not sure whether having >400k dictionaries versus >400k lists makes a big difference in terms of performance.

Solution

In direct answer to your question: dictionaries have significantly more overhead than lists:

  1. Each item consumes memory for both key and value, in contrast to only values for lists.
  2. Looking up, adding, or removing an item requires hashing the key and consulting a hash table, whereas a list index is a direct offset.

Python dictionaries are extremely well designed and surprisingly fast, but if your algorithm can use direct indexing, you will save both space and time.
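To put rough numbers on the two points above, here is a minimal measurement sketch. It assumes CPython and uses only the standard `sys` and `timeit` modules; the item count `N` is taken from the question, and the variable names are illustrative. Absolute figures will vary by machine and Python version, so treat the output as an indication rather than a benchmark.

```python
import sys
import timeit

N = 400_000  # roughly the item count mentioned in the question

as_list = list(range(N))
as_dict = {i: i for i in range(N)}

# sys.getsizeof reports only the container itself,
# not the objects it refers to.
list_bytes = sys.getsizeof(as_list)
dict_bytes = sys.getsizeof(as_dict)
print(f"list container: {list_bytes:,} bytes")
print(f"dict container: {dict_bytes:,} bytes")

# Time repeated reads of an index/key near the middle.
middle = N // 2
reads = 100_000
list_time = timeit.timeit(lambda: as_list[middle], number=reads)
dict_time = timeit.timeit(lambda: as_dict[middle], number=reads)
print(f"list reads: {list_time:.4f} s for {reads:,} lookups")
print(f"dict reads: {dict_time:.4f} s for {reads:,} lookups")
```

On CPython the dictionary container should come out several times larger than the list, while the per-lookup time difference is usually much smaller than the memory difference.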

However, from the sound of your question and the subsequent discussion, it sounds like your needs may change over time and you have some uncertainty ("However, I can think of some situations that might arise in future that I will need to keep some items for keys which are not integers").

If this is the case, I suggest creating a hybrid data structure of your own so that as your needs evolve you can address the efficiency of storage in an isolated place while allowing your application to use simple, readable code to store and retrieve objects.

For example, here is a Python3 class called maybelist that is derived from a list, but detects the presence of non-numeric keys, storing exceptions in a dictionary while providing mappings for some common list operations:

import itertools


class maybelist(list):
    """A list that also accepts non-integer keys, which are kept in a side dict."""

    def __init__(self, *args):
        super().__init__(*args)
        self._extras = dict()

    def __setitem__(self, index, val):
        try:
            super().__setitem__(index, val)
            return
        except TypeError:
            # Index is not an integer, store in dict
            self._extras[index] = val
            return
        except IndexError:
            pass
        # Index is past the end of the list part. Measure against the list
        # length only, because len(self) also counts the entries in _extras.
        distance = index - super().__len__()
        if distance > 0:
            # Put 'None' in empty slots if need be
            self.extend((None,) * distance)
        self.append(val)

    def __getitem__(self, index):
        try:
            return super().__getitem__(index)
        except TypeError:
            return self._extras[index]

    def __str__(self):
        return str([item for item in self])

    def __len__(self):
        # Counts the list items plus the non-integer keys
        return super().__len__() + len(self._extras)

    def __iter__(self):
        # Yields the list items first, then the keys (not the values) of _extras
        for item in itertools.chain(super().__iter__(), self._extras):
            yield item

So, you could treat it like an array, and have it auto expand:

>>> x = maybelist()
>>> x[0] = 'first'
>>> x[1] = 'second'
>>> x[10] = 'eleventh'
>>> print(x)
['first', 'second', None, None, None, None, None, None, None, None, 'eleventh']
>>> print(x[10])
eleventh

Or you could add items with non-numeric keys if they were present:

>>> x['unexpected'] = 'something else'
>>> print(x['unexpected'])
something else

And yet have the object appear to behave properly if you access it using iterators or other methods of your choosing:

>>> print(x)
['first', 'second', None, None, None, None, None, None, None, None, 'eleventh', 'unexpected']
>>> print(len(x))
12

This is just an example, and you would need to tailor such a class to the needs of your application. For example, the resulting object does not strictly behave like a list (x[len(x)-1] is not the last item). However, your application may not need such strict adherence, and if you are careful and plan properly, you can create an object that provides highly optimized storage while leaving room for evolving data structure needs in the future.
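As one possible way of tailoring the idea, here is a sketch that uses composition instead of subclassing list. The class name `HybridStore` and its attributes are hypothetical and not part of the answer above. It assumes that the keys 0..n-1 should live in a list and that every other key, including non-consecutive integers, should go into a dictionary without any None padding.

```python
class HybridStore:
    """Hypothetical container: dense keys 0..n-1 in a list, the rest in a dict."""

    def __init__(self):
        self._dense = []    # values for the consecutive keys 0..n-1
        self._extras = {}   # values for every other key

    def _in_dense(self, key):
        return isinstance(key, int) and 0 <= key < len(self._dense)

    def __setitem__(self, key, value):
        if self._in_dense(key):
            self._dense[key] = value
            return
        self._extras[key] = value
        # If the dict now holds the next consecutive index, move it
        # (and any keys that follow on from it) into the list.
        while len(self._dense) in self._extras:
            self._dense.append(self._extras.pop(len(self._dense)))

    def __getitem__(self, key):
        if self._in_dense(key):
            return self._dense[key]
        return self._extras[key]

    def __len__(self):
        return len(self._dense) + len(self._extras)


store = HybridStore()
store[0] = 'first'
store[1] = 'second'
store[10] = 'eleventh'                   # not consecutive: stays in the dict
store['unexpected'] = 'something else'   # not an integer: stays in the dict
print(len(store))   # 4
print(store[10])    # eleventh
```

Because the application only ever talks to `__getitem__` and `__setitem__`, the storage strategy can change later without touching the calling code, which is the isolation the answer argues for.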
