Python字典键。 "在"复杂 [英] Python dictionary keys. "In" complexity

查看:178
本文介绍了Python字典键。 "在"复杂的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

快速问题主要满足我对这个话题的好奇心。



我正在使用SQlite数据库后端编写一些大型的python程序,将来会处理大量的记录,所以我需要像我一样优化能够。



对于一些功能,我正在搜索字典中的键。我一直在使用in关键字进行原型设计,并计划稍后再进行优化,因为我知道in关键字通常是O(n)(因为这只是翻译成python遍历整个列表,并比较每个元素)。但是,由于python dict基本上只是一个哈希映射,所以python解释器是否足够聪明地解释:

  if(key in dict.keys()):
...代码...

to: / p>

  if(dict [key]!= None):
...代码...

它基本上是相同的操作,但顶部将是O(n),底部将是O(1)。



我的代码很容易使用底层版本,但后来我只是好奇,我会问。

解决方案

首先,d.keys()中的给予您与d 中的键值相同的值,用于任何dict d



dict dict_keys 对象,您从调用 keys()(3.x)中返回,是 O(N)这是O(1)。



没有真正的优化只是使用散列是在哈希表上实现 __ contains __ 的明显方法,就像实现 __ getitem __




您可能会问保证这一点。



嗯,不是。 映射类型定义 dict 基本上是一个哈希表实现 collections.abc.Mapping 。没有人阻止某人创建映射的哈希表实现,但仍提供O(N)搜索。但是,如果您真的需要向自己证明这一点,那么为什么会出现这样一个糟糕的实现呢,这将是额外的工作,那么为什么呢?



关于(使用概要分析器或通过使用某些类型与自定义 __哈希__ __ eq __ 记录呼叫或...) ,或阅读来源。






在2.x中,您不想调用,因为它生成一个列表的密钥,而不是一个 KeysView 。您可以使用 iterkeys ,但这可能会生成一个迭代器或其他不是O(1)的东西。所以,只需使用dict本身作为一个序列。



即使在3.x中,也不想调用,因为不需要。迭代 dict ,检查其 __包含__ ,一般来说,像序列一样处理它是总是相当于做同样的事情,它的钥匙,那么为什么麻烦? (当然,构建琐碎的 KeyView ,并通过它访问,将为您的运行时间添加几纳秒,并为您的程序提供一些击键。)



(使用顺序操作相当于 d.keys() / d不清楚除了性能问题以外,他们在每个CPython中都是等同的,所以它们在 Jython,IronPython和PyPy实现,但它似乎没有在3.x中的任何地方陈述。并不重要;只需在d中使用。)






当我们在它时,请注意:

  if(dict [key]!= None):

...不会工作。如果不在 dict 中,则会引发 KeyError ,不返回



此外,您不应该检查 == != ;总是使用



您可以使用尝试 - 或者更简单的说,如果dict.get(key,None)不是None ,请执行。但再一次,没有理由这样做。另外,这不会处理 None 是完全有效的项目的情况。如果是这样,你需要像 sentinel = object();如果dict.get(key,sentinel)不是sentinel:






所以,正确的事情是:

 如果键入d:






更一般来说,这不是真的:


我知道in关键字通常是O(n)(因为这只是翻译成python遍历整个列表并比较每个元素


运算符中的像大多数其他运算符一样只是调用一个 __包含__ 方法(或等价于C / Java / .NET / RPython内置函数)。 list 通过迭代列表并比较每个元素来实现它; dict 通过散列值来查找哈希来实现它; blist.blist 通过走B +树来实现它等等。因此,它可以是O(n),O(1),O(log n)或完全不同的东西。


Quick question to mainly satisfy my curiosity on the topic.

I am writing some large python programs with an SQlite database backend and will be dealing with a large number of records in the future, so I need to optimize as much as I can.

For a few functions, I am searching through keys in a dictionary. I have been using the "in" keyword for prototyping and was planning on going back and optimizing those searches later as I know the "in" keyword is generally O(n) (as this just translates to python iterating over an entire list and comparing each element). But, as a python dict is basically just a hash map, is the python interpreter smart enough to interpret:

if(key in dict.keys()):
    ...code...

to:

if(dict[key] != None):
    ...code...

It is basically the same operation but the top would be O(n) and the bottom would be O(1).

It's easy for me to use the bottom version in my code, but then I was just curious and thought I would ask.

解决方案

First, key in d.keys() is guaranteed to give you the same value as key in d for any dict d.

And the in operation on a dict, or the dict_keys object you get back from calling keys() on it (in 3.x), is not O(N), it's O(1).

There's no real "optimization" going on; it's just that using the hash is the obvious way to implement __contains__ on a hash table, just as it's the obvious way to implement __getitem__.


You may ask where this is guaranteed.

Well, it's not. Mapping Types defines dict as, basically, a hash table implementation of collections.abc.Mapping. There's nothing stopping someone from creating a hash table implementation of a Mapping, but still providing O(N) searches. But it would be extra work to make such a bad implementation, so why would they?

If you really need to prove it to yourself, you can test every implementation you care about (with a profiler, or by using some type with a custom __hash__ and __eq__ that logs calls, or…), or read the source.


In 2.x, you do not want to call keys, because that generates a list of the keys, instead of a KeysView. You could use iterkeys, but that may generate an iterator or something else that's not O(1). So, just use the dict itself as a sequence.

Even in 3.x, you don't want to call keys, because there's no need to. Iterating a dict, checking its __contains__, and in general treating it like a sequence is always equivalent to doing the same thing to its keys, so why bother? (And of course building the trivial KeyView, and accessing through it, are going to add a few nanoseconds to your running time and a few keystrokes to your program.)

(It's not quite clear that using sequence operations is equivalent for d.keys()/d.iterkeys() and d in 2.x. Other than performance issues, they are equivalent in every CPython, Jython, IronPython, and PyPy implementation, but it doesn't seem to be stated anywhere the way it is in 3.x. And it doesn't matter; just use key in d.)


While we're at it, note that this:

if(dict[key] != None):

… is not going to work. If the key is not in the dict, this will raise KeyError, not return None.

Also, you should never check None with == or !=; always use is.

You can do this with a try—or, more simply, do if dict.get(key, None) is not None. But again, there's no reason to do so. Also, that won't handle cases where None is a perfectly valid item. If that's the case, you need to do something like sentinel = object(); if dict.get(key, sentinel) is not sentinel:.


So, the right thing to write is:

if key in d:


More generally, this is not true:

I know the "in" keyword is generally O(n) (as this just translates to python iterating over an entire list and comparing each element

The in operator, like most other operators, is just a call to a __contains__ method (or the equivalent for a C/Java/.NET/RPython builtin). list implements it by iterating the list and comparing each element; dict implements it by hashing the value and looking up the hash; blist.blist implements it by walking a B+Tree; etc. So, it could be O(n), O(1), O(log n), or something completely different.

这篇关于Python字典键。 "在"复杂的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
相关文章
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆