Python 中的键序字典 [英] Key-ordered dict in Python

查看:32
本文介绍了Python 中的键序字典的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我正在寻找有序关联数组的可靠实现,即有序字典.我想要按键而不是插入顺序进行排序.

I am looking for a solid implementation of an ordered associative array, that is, an ordered dictionary. I want the ordering in terms of keys, not of insertion order.

更准确地说,我正在寻找一种节省空间的 int-to-float(或其他用例的字符串到浮点数)映射结构的实现:

More precisely, I am looking for a space-efficent implementation of a int-to-float (or string-to-float for another use case) mapping structure for which:

  • 有序迭代是 O(n)
  • 随机访问是 O(1)

我想出的最好的方法是粘合一个字典和一个键列表,保持最后一个用 bisect 和 insert 排序.

The best I came up with was gluing a dict and a list of keys, keeping the last one ordered with bisect and insert.

有更好的想法吗?

推荐答案

"Random access O(1)" 是一个极其严格的要求,它基本上强加了一个底层哈希表——我希望你的意思只是随机读取,因为我认为它可以在数学上证明,而不是在一般情况下不可能有 O(1) 次写入以及 O(N) 次有序迭代.

"Random access O(1)" is an extremely exacting requirement which basically imposes an underlying hash table -- and I hope you do mean random READS only, because I think it can be mathematically proven than it's impossible in the general case to have O(1) writes as well as O(N) ordered iteration.

我认为您不会找到适合您需求的预包装容器,因为它们太极端了——O(log N) 访问当然会使世界变得不同.要获得读取和迭代所需的大 O 行为,您需要粘合两个数据结构,本质上是一个字典和一个堆(或排序列表或树),并使它们保持同步.尽管您没有指定,但我认为您只会获得摊销您想要的那种行为 - 除非您真的愿意为插入和删除支付任何性能损失,这就是字面含义您表达的规格中的一部分,但在现实生活中似乎不太可能.

I don't think you will find a pre-packaged container suited to your needs because they are so extreme -- O(log N) access would of course make all the difference in the world. To get the big-O behavior you want for reads and iterations you'll need to glue two data structures, essentially a dict and a heap (or sorted list or tree), and keep them in sync. Although you don't specify, I think you'll only get amortized behavior of the kind you want - unless you're truly willing to pay any performance hits for inserts and deletes, which is the literal implication of the specs you express but does seem a pretty unlikely real-life requirement.

对于 O(1) 读取和 摊销 O(N) 有序迭代,只需在字典一侧保留所有键的列表.例如:

For O(1) read and amortized O(N) ordered iteration, just keep a list of all keys on the side of a dict. E.g.:

class Crazy(object):
  def __init__(self):
    self.d = {}
    self.L = []
    self.sorted = True
  def __getitem__(self, k):
    return self.d[k]
  def __setitem__(self, k, v):
    if k not in self.d:
      self.L.append(k)
      self.sorted = False
    self.d[k] = v
  def __delitem__(self, k):
    del self.d[k]
    self.L.remove(k)
  def __iter__(self):
    if not self.sorted:
      self.L.sort()
      self.sorted = True
    return iter(self.L)

如果您不喜欢摊销的 O(N) 顺序",您可以删除 self.sorted 并在 __setitem__ 中重复 self.L.sort()本身.当然,这使得写入 O(N log N)(虽然我仍然在 O(1) 处写入).这两种方法都是可行的,很难认为一种方法在本质上优于另一种方法.如果你倾向于做一堆写然后一堆迭代,那么上面代码中的方法是最好的;如果它通常是一次写入、一次迭代、另一次写入、另一次迭代,那么这只是一次清洗.

If you don't like the "amortized O(N) order" you can remove self.sorted and just repeat self.L.sort() in __setitem__ itself. That makes writes O(N log N), of course (while I still had writes at O(1)). Either approach is viable and it's hard to think of one as intrinsically superior to the other. If you tend to do a bunch of writes then a bunch of iterations then the approach in the code above is best; if it's typically one write, one iteration, another write, another iteration, then it's just about a wash.

顺便说一句,这无耻地利用了 Python 排序(又名timsort")的不寻常(和美妙;-)性能特征:其中,对一个主要排序的列表进行排序,但最后添加了一些额外的项目基本上是 O(N)(如果与排序的前缀部分相比,附加的项目足够少).我听说 Java 很快就会获得这种类型,因为 Josh Block 对 Python 类型的技术演讲印象深刻,他开始在他的笔记本电脑上为 JVM 编写代码.大多数系统(包括我相信今天的 Jython 和 IronPython)基本上都将排序作为 O(N log N) 操作,而不是利用大部分有序"的输入;自然归并排序",Tim Peters 将其塑造成今天 Python 的 timsort,在这方面是一个奇迹.

BTW, this takes shameless advantage of the unusual (and wonderful;-) performance characteristics of Python's sort (aka "timsort"): among them, sorting a list that's mostly sorted but with a few extra items tacked on at the end is basically O(N) (if the tacked on items are few enough compared to the sorted prefix part). I hear Java's gaining this sort soon, as Josh Block was so impressed by a tech talk on Python's sort that he started coding it for the JVM on his laptop then and there. Most sytems (including I believe Jython as of today and IronPython too) basically have sorting as an O(N log N) operation, not taking advantage of "mostly ordered" inputs; "natural mergesort", which Tim Peters fashioned into Python's timsort of today, is a wonder in this respect.

这篇关于Python 中的键序字典的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆