查找键值以相同前缀开头的字典值的更有效方法 [英] More efficient way to look up dictionary values whose keys start with same prefix
问题描述
我有一本字典,其键来自共享相同前缀的集合,例如:
I have a dictionary whose keys come in sets that share the same prefix, like this:
d = { "key1":"valA", "key123":"valB", "key1XY":"valC",
"key2":"valD", "key2-22":"valE" }
给出查询字符串,我需要查找与以该前缀开头的键相关联的所有值,例如对于query="key1"
,我需要获取["valA", "valB", "valC"]
Given a query string, I need to look up all the values associated with keys that start with that prefix, e.g. for query="key1"
I need to get ["valA", "valB", "valC"]
我的以下实现有效,但是对于大量查询而言太慢了,因为字典d
具有大约30,000个键,并且大多数键的长度超过20个字符:
My implementation below works but is too slow for a large number of queries since the dictionary d
has about 30,000 keys and most of the keys are more than 20 characters long:
result = [d[s] for s in d.keys() if s.startswith(query)]
是否有更快/更有效的方法来实现这一目标?
Is there a faster/more efficient way to implement this?
推荐答案
您可以避免生成由dict.keys()
生成的中间列表(在python 2.x中):
You can avoid producing the intermediate list generated by dict.keys()
(in python 2.x):
result = [d[key] for key in d if key.startswith(query)]
但是您很可能想使用 trie 字典,因此您可以找到与具有共同前缀的键关联的所有值(特里类似于基于前缀的树).
But you most likely want to use a trie instead of a dictionary, so you can find all the values associated with a key with a common prefix (a trie is similar to a tree based on prefixes).
在这里,您可以找到一些不同的try实现.
Here you can find some different implementation of tries.
键盘"A","to","tea","ted","ten","i","in"和"inn"的特里. (来源维基百科)
A trie for keys "A", "to", "tea", "ted", "ten", "i", "in", and "inn". (source wikipedia)
让我们比较一下不同解决方案的时间:
Let's compare the timings for the different solutions:
# create a dictionary with 30k entries
d = {str(x):str(x) for x in xrange(1, 30001)}
query = '108'
# dict with keys()
%timeit [d[s] for s in d.keys() if s.startswith(query)]
100 loops, best of 3: 8.87 ms per loop
# dict without keys()
%timeit [d[s] for s in d if s.startswith(query)]
100 loops, best of 3: 7.83 ms per loop
# 11.72% improvement
# PyTrie (https://pypi.python.org/pypi/PyTrie/0.2)
import pytrie
pt = pytrie.Trie(d)
%timeit [pt[s] for s in pt.iterkeys(query)]
1000 loops, best of 3: 320 µs per loop
# 96.36% improvement
# datrie (https://pypi.python.org/pypi/datrie/0.7)
import datrie
dt = datrie.Trie('0123456789')
for key, val in d.iteritems():
dt[unicode(key)] = val
%timeit [dt[s] for s in dt.keys(unicode(query))]
10000 loops, best of 3: 162 µs per loop
# 98.17% improvement
这篇关于查找键值以相同前缀开头的字典值的更有效方法的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!