Python 字典键.“在"复杂 [英] Python dictionary keys. "In" complexity

查看:26
本文介绍了Python 字典键.“在"复杂的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

快速提问,主要是满足我对这个话题的好奇心.

Quick question to mainly satisfy my curiosity on the topic.

我正在编写一些带有 SQlite 数据库后端的大型 Python 程序,将来会处理大量记录,因此我需要尽可能多地优化.

I am writing some large python programs with an SQlite database backend and will be dealing with a large number of records in the future, so I need to optimize as much as I can.

对于一些函数,我正在字典中搜索键.我一直在使用in"关键字进行原型设计,并计划稍后返回并优化这些搜索,因为我知道in"关键字通常是 O(n)(因为这只是转换为 python 迭代整个列表并进行比较每个元素).但是,由于 python dict 基本上只是一个哈希映射,python 解释器是否足够聪明来解释:

For a few functions, I am searching through keys in a dictionary. I have been using the "in" keyword for prototyping and was planning on going back and optimizing those searches later as I know the "in" keyword is generally O(n) (as this just translates to python iterating over an entire list and comparing each element). But, as a python dict is basically just a hash map, is the python interpreter smart enough to interpret:

if(key in dict.keys()):
    ...code...

到:

if(dict[key] != None):
    ...code...

这基本上是相同的操作,但顶部是 O(n),底部是 O(1).

It is basically the same operation but the top would be O(n) and the bottom would be O(1).

在我的代码中使用底部版本对我来说很容易,但后来我只是好奇并想我会问.

It's easy for me to use the bottom version in my code, but then I was just curious and thought I would ask.

推荐答案

首先,key in d.keys() 保证给你与 key in d 相同的值code> 用于任何字典 d.

First, key in d.keys() is guaranteed to give you the same value as key in d for any dict d.

以及dict 上的in 操作,或者您从调用keys()dict_keys 对象> 在它上面(在 3.x 中),不是 O(N),它是 O(1).

And the in operation on a dict, or the dict_keys object you get back from calling keys() on it (in 3.x), is not O(N), it's O(1).

没有真正的优化"在进行;只是使用哈希是在哈希表上实现 __contains__ 的明显方法,就像它是实现 __getitem__ 的明显方法一样.

There's no real "optimization" going on; it's just that using the hash is the obvious way to implement __contains__ on a hash table, just as it's the obvious way to implement __getitem__.

你可能会问哪里有保证.

You may ask where this is guaranteed.

好吧,事实并非如此.映射类型定义了dict作为,基本上,collections 的哈希表实现.abc.Mapping.没有什么可以阻止某人创建映射的哈希表实现,但仍然提供 O(N) 搜索.但是做出如此糟糕的实现需要额外的工作,那么他们为什么要这样做呢?

Well, it's not. Mapping Types defines dict as, basically, a hash table implementation of collections.abc.Mapping. There's nothing stopping someone from creating a hash table implementation of a Mapping, but still providing O(N) searches. But it would be extra work to make such a bad implementation, so why would they?

如果你真的需要证明给自己看,你可以测试你关心的每一个实现(使用探查器,或者使用一些带有自定义 __hash____eq__ 的类型> 记录调用,或者……),或者阅读源代码.

If you really need to prove it to yourself, you can test every implementation you care about (with a profiler, or by using some type with a custom __hash__ and __eq__ that logs calls, or…), or read the source.

在 2.x 中,您不想调用 keys,因为这会生成键的 list,而不是 KeysView.您可以使用 iterkeys,但这可能会生成一个迭代器或其他不是 O(1) 的东西.因此,只需将 dict 本身用作序列即可.

In 2.x, you do not want to call keys, because that generates a list of the keys, instead of a KeysView. You could use iterkeys, but that may generate an iterator or something else that's not O(1). So, just use the dict itself as a sequence.

即使在 3.x 中,您也不想调用 keys,因为没有必要.迭代一个 dict,检查它的 __contains__,并且通常把它当作一个序列总是等价于对它的键做同样的事情,所以何必呢?(当然,构建微不足道的 KeyView 并通过它访问,会增加几纳秒的运行时间,并增加程序的几次按键时间.)

Even in 3.x, you don't want to call keys, because there's no need to. Iterating a dict, checking its __contains__, and in general treating it like a sequence is always equivalent to doing the same thing to its keys, so why bother? (And of course building the trivial KeyView, and accessing through it, are going to add a few nanoseconds to your running time and a few keystrokes to your program.)

(不太清楚,对于d.keys()/d.iterkeys()d 来说,使用序列操作是等效的2.x. 除了性能问题,它们在每个 CPython、Jython、IronPython 和 PyPy 实现中都是等价的,但似乎没有像 3.x 中那样在任何地方说明.没关系;只需使用 key in d.)

(It's not quite clear that using sequence operations is equivalent for d.keys()/d.iterkeys() and d in 2.x. Other than performance issues, they are equivalent in every CPython, Jython, IronPython, and PyPy implementation, but it doesn't seem to be stated anywhere the way it is in 3.x. And it doesn't matter; just use key in d.)

在我们进行时,请注意:

While we're at it, note that this:

if(dict[key] != None):

... 行不通.如果 key 不在 dict 中,这将引发 KeyError,而不是返回 None.

… is not going to work. If the key is not in the dict, this will raise KeyError, not return None.

另外,你不应该用==!=检查None;始终使用 is.

Also, you should never check None with == or !=; always use is.

你可以用 try 来做到这一点——或者,更简单的说,如果 dict.get(key, None) 不是 None,你可以做 .但同样,没有理由这样做.此外,这不会处理 None 是一个完全有效的项目的情况.如果是这种情况,您需要执行类似 sentinel = object(); 的操作.如果 dict.get(key, sentinel) 不是哨兵:.

You can do this with a try—or, more simply, do if dict.get(key, None) is not None. But again, there's no reason to do so. Also, that won't handle cases where None is a perfectly valid item. If that's the case, you need to do something like sentinel = object(); if dict.get(key, sentinel) is not sentinel:.

所以,正确的写法是:

if key in d:

<小时>

更一般地说,事实并非如此:


More generally, this is not true:

我知道in"关键字通常是 O(n)(因为这只是转换为 python 迭代整个列表并比较每个元素

I know the "in" keyword is generally O(n) (as this just translates to python iterating over an entire list and comparing each element

in 运算符,与大多数其他运算符一样,只是对 __contains__ 方法(或 C/Java/.NET/RPython 内置函数的等效方法)的调用.list 通过迭代列表和比较每个元素来实现它;dict 通过对值进行散列并查找散列来实现它;blist.blist 通过遍历一个B+Tree来实现;等等.所以,它可能是 O(n)、O(1)、O(log n) 或完全不同的东西.

The in operator, like most other operators, is just a call to a __contains__ method (or the equivalent for a C/Java/.NET/RPython builtin). list implements it by iterating the list and comparing each element; dict implements it by hashing the value and looking up the hash; blist.blist implements it by walking a B+Tree; etc. So, it could be O(n), O(1), O(log n), or something completely different.

这篇关于Python 字典键.“在"复杂的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆