特定的动态嵌套词典,自动实现 [英] Specific Dynamic nested Dictionaries, Autovivification implementation
问题描述
我正在尝试以特定方式实现嵌套字典结构. 我正在读很多单词.最终将需要经常且高效地搜索这些单词,因此这就是我希望设置字典的方式:
I'm trying to implement a nested dictionary structure in a specific manner. I'm reading in a long list of words. These words are eventually going to need to be searched through often and efficiently so this is how I want my dictionary to be set up:
我正在尝试创建一个嵌套的字典结构,其中第一个键值是单词的长度,该值是一个dict,键是单词的第一个字母,而值是一个dict,键是该单词的第二个字母,值是字典,键为单词的第三个字母,等等.
I'm trying to make a nested dictionary structure where the first key value is the length of the word, the value is a dict with the key being the first letter of the word and the value is a dict with the key being the second letter of the word and the value being a dict with the key as third letter of the word etc..
所以如果我读汽车",罐头"和乔"
so if I read in "car" "can" and "joe"
我知道
{3: {c: {a: {r: car, n: can}}},j: {o: {e: joe}}}
我需要大约100,000个单词,而且它们的长度从2到27个字母不等.
I need to do this for about 100,000 words though and they vary in length from 2 to 27 letters.
我已经浏览过什么是实现嵌套字典的最佳方法? 和 动态嵌套词典.
但没有发现这一点的运气.
but haven't had any luck figuring this out.
我当然可以使用
for word in text_file.read().split()
我可以使用
for char in word
或
for i in range(len(word)):
word[i]
我只是不知道如何降低这种结构.任何帮助将不胜感激.
I just can't figure out how to get this structure down. Any help would be greatly appreciated.
推荐答案
Here's a short example on how to implement trie with autovivification built on defaultdict
. For every node that terminates a word it stores extra key term
to indicate it.
from collections import defaultdict
trie = lambda: defaultdict(trie)
def add_word(root, s):
node = root
for c in s:
node = node[c]
node['term'] = True
def list_words(root, length, prefix=''):
if not length:
if 'term' in root:
yield prefix
return
for k, v in root.items():
if k != 'term':
yield from list_words(v, length - 1, prefix + k)
WORDS = ['cars', 'car', 'can', 'joe']
root = trie()
for word in WORDS:
add_word(root, word)
print('Length {}'.format(3))
print('\n'.join(list_words(root, 3)))
print('Length {}'.format(4))
print('\n'.join(list_words(root, 4)))
输出:
Length 3
joe
can
car
Length 4
cars
这篇关于特定的动态嵌套词典,自动实现的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!