Python中的键排序字典 [英] Key-ordered dict in Python

查看:149
本文介绍了Python中的键排序字典的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我正在寻找一个有序关联数组的固体实现,也就是有序字典。我想按照键而不是插入顺序排序。



更确切地说,我正在寻找一个int到浮点的空间效率实现对于另一用例的字符串到浮动)映射结构:




  • 有序迭代是O(n)

  • 随机访问是O(1)



我想出的最好的是胶合一个dict和一个键列表

解决方案



随机访问O(1)是一个非常严格的要求,基本上强加了一个基础哈希表 - 我希望你的意思是随机READS,因为我认为它可以在数学上被证明比它不可能在一般情况下有O(1)写入以及O(N)有序迭代。



我不认为你会发现一个预包装容器适合你需求,因为他们是如此极端 - O(log N)访问当然会使世界上的所有差异。要获得读取和迭代所需的大O行为,您需要粘合两个数据结构,基本上是一个dict和一个堆(或排序的列表或树),并保持它们同步。虽然您没有指定,但我认为您只能获得所需的摊销行为 - 除非您真的愿意为插入和删除支付任何效果匹配,这是字面意义



对于O(1)读取和摊销 O(N)有序迭代,只保留一个dict的所有键的列表。例如:

  class Crazy(object):
def __init __(self):
self.d = {}
self.L = []
self.sorted = True
def __getitem __(self,k):
return self.d [k]
def __setitem__ (self,k,v):
如果k不在self.d:
self.L.append(k)
self.sorted = False
self.d [k ] = v
def __delitem __(self,k):
del self.d [k]
self.L.remove(k)
def __iter __(self):
if not self.sorted:
self.L.sort()
self.sorted = True
return iter(self.L)

如果你不喜欢摊销O(N)订单,你可以删除self.sorted,只需重复 self。在 __ setitem __ 本身中的L.sort()这使得写O(N log N),当然(当我仍然在O(1)写)。任何一种方法都是可行的,很难想到一个本质上优于另一个。如果你倾向于做一堆的写,然后一堆迭代,那么上面的代码中的方法是最好的;如果它通常是一个写,一个迭代,另一个写,另一个迭代,然后它只是一个洗。



BTW,这无耻的优势的不寻常 - )Python排序(也称为timsort)的性能特性:其中,排序列表,大多数排序,但是在结尾添加了一些额外的项目基本上是O(N)(如果固定项目比较少到排序的前缀部分)。我听说Java很快就获得了这个,Josh Block对Python的技术演讲印象深刻,他开始编写它的JVM在他的笔记本电脑上然后和那里。大多数系统(包括我认为今天的Jython和IronPython)基本上有排序作为O(N log N)操作,没有利用大多数有序输入; 自然合并,Tim Peters塑造成今天Python的时间,在这方面是一个奇迹。


I am looking for a solid implementation of an ordered associative array, that is, an ordered dictionary. I want the ordering in terms of keys, not of insertion order.

More precisely, I am looking for a space-efficent implementation of a int-to-float (or string-to-float for another use case) mapping structure for which:

  • Ordered iteration is O(n)
  • Random access is O(1)

The best I came up with was gluing a dict and a list of keys, keeping the last one ordered with bisect and insert.

Any better ideas?

解决方案

"Random access O(1)" is an extremely exacting requirement which basically imposes an underlying hash table -- and I hope you do mean random READS only, because I think it can be mathematically proven than it's impossible in the general case to have O(1) writes as well as O(N) ordered iteration.

I don't think you will find a pre-packaged container suited to your needs because they are so extreme -- O(log N) access would of course make all the difference in the world. To get the big-O behavior you want for reads and iterations you'll need to glue two data structures, essentially a dict and a heap (or sorted list or tree), and keep them in sync. Although you don't specify, I think you'll only get amortized behavior of the kind you want - unless you're truly willing to pay any performance hits for inserts and deletes, which is the literal implication of the specs you express but does seem a pretty unlikely real-life requirement.

For O(1) read and amortized O(N) ordered iteration, just keep a list of all keys on the side of a dict. E.g.:

class Crazy(object):
  def __init__(self):
    self.d = {}
    self.L = []
    self.sorted = True
  def __getitem__(self, k):
    return self.d[k]
  def __setitem__(self, k, v):
    if k not in self.d:
      self.L.append(k)
      self.sorted = False
    self.d[k] = v
  def __delitem__(self, k):
    del self.d[k]
    self.L.remove(k)
  def __iter__(self):
    if not self.sorted:
      self.L.sort()
      self.sorted = True
    return iter(self.L)

If you don't like the "amortized O(N) order" you can remove self.sorted and just repeat self.L.sort() in __setitem__ itself. That makes writes O(N log N), of course (while I still had writes at O(1)). Either approach is viable and it's hard to think of one as intrinsically superior to the other. If you tend to do a bunch of writes then a bunch of iterations then the approach in the code above is best; if it's typically one write, one iteration, another write, another iteration, then it's just about a wash.

BTW, this takes shameless advantage of the unusual (and wonderful;-) performance characteristics of Python's sort (aka "timsort"): among them, sorting a list that's mostly sorted but with a few extra items tacked on at the end is basically O(N) (if the tacked on items are few enough compared to the sorted prefix part). I hear Java's gaining this sort soon, as Josh Block was so impressed by a tech talk on Python's sort that he started coding it for the JVM on his laptop then and there. Most sytems (including I believe Jython as of today and IronPython too) basically have sorting as an O(N log N) operation, not taking advantage of "mostly ordered" inputs; "natural mergesort", which Tim Peters fashioned into Python's timsort of today, is a wonder in this respect.

这篇关于Python中的键排序字典的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆