Python中的键排序字典 [英] Key-ordered dict in Python
问题描述
我正在寻找一个有序关联数组的固体实现,也就是有序字典。我想按照键而不是插入顺序排序。
更确切地说,我正在寻找一个int到浮点的空间效率实现对于另一用例的字符串到浮动)映射结构:
- 有序迭代是O(n)
- 随机访问是O(1)
我想出的最好的是胶合一个dict和一个键列表
随机访问O(1)是一个非常严格的要求,基本上强加了一个基础哈希表 - 我希望你的意思是随机READS,因为我认为它可以在数学上被证明比它不可能在一般情况下有O(1)写入以及O(N)有序迭代。
我不认为你会发现一个预包装容器适合你需求,因为他们是如此极端 - O(log N)访问当然会使世界上的所有差异。要获得读取和迭代所需的大O行为,您需要粘合两个数据结构,基本上是一个dict和一个堆(或排序的列表或树),并保持它们同步。虽然您没有指定,但我认为您只能获得所需的摊销行为 - 除非您真的愿意为插入和删除支付任何效果匹配,这是字面意义
对于O(1)读取和摊销 O(N)有序迭代,只保留一个dict的所有键的列表。例如:
class Crazy(object):
def __init __(self):
self.d = {}
self.L = []
self.sorted = True
def __getitem __(self,k):
return self.d [k]
def __setitem__ (self,k,v):
如果k不在self.d:
self.L.append(k)
self.sorted = False
self.d [k ] = v
def __delitem __(self,k):
del self.d [k]
self.L.remove(k)
def __iter __(self):
if not self.sorted:
self.L.sort()
self.sorted = True
return iter(self.L)
如果你不喜欢摊销O(N)订单,你可以删除self.sorted,只需重复 self。在
这使得写O(N log N),当然(当我仍然在O(1)写)。任何一种方法都是可行的,很难想到一个本质上优于另一个。如果你倾向于做一堆的写,然后一堆迭代,那么上面的代码中的方法是最好的;如果它通常是一个写,一个迭代,另一个写,另一个迭代,然后它只是一个洗。 __ setitem __
本身中的L.sort()
BTW,这无耻的优势的不寻常 - )Python排序(也称为timsort)的性能特性:其中,排序列表,大多数排序,但是在结尾添加了一些额外的项目基本上是O(N)(如果固定项目比较少到排序的前缀部分)。我听说Java很快就获得了这个,Josh Block对Python的技术演讲印象深刻,他开始编写它的JVM在他的笔记本电脑上然后和那里。大多数系统(包括我认为今天的Jython和IronPython)基本上有排序作为O(N log N)操作,没有利用大多数有序输入; 自然合并,Tim Peters塑造成今天Python的时间,在这方面是一个奇迹。
I am looking for a solid implementation of an ordered associative array, that is, an ordered dictionary. I want the ordering in terms of keys, not of insertion order.
More precisely, I am looking for a space-efficent implementation of a int-to-float (or string-to-float for another use case) mapping structure for which:
- Ordered iteration is O(n)
- Random access is O(1)
The best I came up with was gluing a dict and a list of keys, keeping the last one ordered with bisect and insert.
Any better ideas?
"Random access O(1)" is an extremely exacting requirement which basically imposes an underlying hash table -- and I hope you do mean random READS only, because I think it can be mathematically proven than it's impossible in the general case to have O(1) writes as well as O(N) ordered iteration.
I don't think you will find a pre-packaged container suited to your needs because they are so extreme -- O(log N) access would of course make all the difference in the world. To get the big-O behavior you want for reads and iterations you'll need to glue two data structures, essentially a dict and a heap (or sorted list or tree), and keep them in sync. Although you don't specify, I think you'll only get amortized behavior of the kind you want - unless you're truly willing to pay any performance hits for inserts and deletes, which is the literal implication of the specs you express but does seem a pretty unlikely real-life requirement.
For O(1) read and amortized O(N) ordered iteration, just keep a list of all keys on the side of a dict. E.g.:
class Crazy(object):
def __init__(self):
self.d = {}
self.L = []
self.sorted = True
def __getitem__(self, k):
return self.d[k]
def __setitem__(self, k, v):
if k not in self.d:
self.L.append(k)
self.sorted = False
self.d[k] = v
def __delitem__(self, k):
del self.d[k]
self.L.remove(k)
def __iter__(self):
if not self.sorted:
self.L.sort()
self.sorted = True
return iter(self.L)
If you don't like the "amortized O(N) order" you can remove self.sorted and just repeat self.L.sort()
in __setitem__
itself. That makes writes O(N log N), of course (while I still had writes at O(1)). Either approach is viable and it's hard to think of one as intrinsically superior to the other. If you tend to do a bunch of writes then a bunch of iterations then the approach in the code above is best; if it's typically one write, one iteration, another write, another iteration, then it's just about a wash.
BTW, this takes shameless advantage of the unusual (and wonderful;-) performance characteristics of Python's sort (aka "timsort"): among them, sorting a list that's mostly sorted but with a few extra items tacked on at the end is basically O(N) (if the tacked on items are few enough compared to the sorted prefix part). I hear Java's gaining this sort soon, as Josh Block was so impressed by a tech talk on Python's sort that he started coding it for the JVM on his laptop then and there. Most sytems (including I believe Jython as of today and IronPython too) basically have sorting as an O(N log N) operation, not taking advantage of "mostly ordered" inputs; "natural mergesort", which Tim Peters fashioned into Python's timsort of today, is a wonder in this respect.
这篇关于Python中的键排序字典的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!