减少所有英语单词的 Trie 的大小 [英] Reducing the size of a Trie of all english words

查看:50
本文介绍了减少所有英语单词的 Trie 的大小的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我正在使用 Trie 实现自动完成功能.使用 this 链接中的单词列表,我将每个单词插入到我的 Trie 中.我想减少 Trie 使用的内存,而不使用像有向无环字图这样太花哨的东西.

I am implementing an autocomplete feature using a Trie. Using the words list in this link, I am inserting every word into my Trie. I want to reduce the memory used by the Trie without using something too fancy like a directed acyclic word graph.

Trie 是基于字典的,允许将其存储在 JSON 文件中.这是我当前的脚本:

The Trie is dictionary based to allow for it to be stored in a JSON file. This is my current script:

import json 

#Make a trie of english words
# The words file can be found at https://github.com/dwyl/english-words
with open('words_dictionary.json', 'r') as file:
    words = json.load(file)

_end = '_end_'
trie = {}

def make_trie(words):
    root = trie 
    for word in words:
        current = root
        for char in word:
            if char not in current:
                current[char] = {}
            current = current[char]
        current[_end] = _end 

make_trie(words)

with open('word_trie.json', 'w') as outfile:
    json.dump(trie, outfile)

如果可以的话,请帮我提供代码片段.

If this can be done, please help me out with code snippets.

推荐答案

如果您的特里是静态的,这意味着您不需要时不时地在其中插入单词,但您可以一次性"构建它,那么你需要的结构是一个 DAFSA,它代表 Directed Acyclic Finite State Automaton.如果您的尝试是动态的,这意味着您需要在其中插入新词,DAFSA 仍然是答案,但算法要困难得多.

If your trie is static, meaning that you do not need to insert words in it every now and then, but that you can build it "all at once", then the structure you need is a DAFSA, which stands for Directed Acyclic Finite State Automaton. In the case your trie is dynamic, meaning you will need to insert new words in it, the DAFSA is still the answer, but the algorithms are much harder.

这基本上是一个trie的压缩版本,它具有相同的访问速度,但在一般情况下的空间要求要低得多.

This is basically a compressed version of a trie, it has the same access speed, but a much lower space requirement in the general case.

将 trie 转换为 DAFSA(有时称为有向无环字图的 DAWG)的算法并不像简单构建 trie 的算法那么简单,但它们是可以理解的.您应该在此处找到所需的一切.

Algorithms to convert a trie into a DAFSA (sometimes called DAWG for Directed Acyclic Word Graph) are not as simple as the ones that simply build a trie, but they're understandable. You should find everything you need here.

这篇关于减少所有英语单词的 Trie 的大小的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆