使用 Trie 在单词列表中查找复合词 [英] Find Compound Words in List of Words using Trie

查看:42
本文介绍了使用 Trie 在单词列表中查找复合词的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

给定一个单词列表,我想弄清楚如何在该列表中查找由列表中的其他单词组成的单词.例如,如果列表是 ["race", "racecar", "car"],我想返回 ["racecar"].

Given a list of words, I am trying to figure out how to find words in that list that are made up of other words in the list. For example, if the list were ["race", "racecar", "car"], I would want to return ["racecar"].

这是我的一般思考过程.我知道使用特里对这类问题有好处.对于每个单词,我可以使用 trie 找到它的所有前缀(也是列表中的单词).然后对于每个前缀,我可以检查单词的后缀是否由树中的一个或多个单词组成.但是,我很难实现这一点.我已经能够实现 trie 和函数来获取一个单词的所有前缀.我只是坚持实现复合词检测.

Here is my general thought process. I understand that using a trie would be good for this sort of problem. For each word, I can find all of its prefixes (that are also words in the list) using the trie. Then for each prefix, I can check to see if the word's suffix is made up of one or more words in the trie. However, I am having a hard time implementing this. I have been able to implement the trie and and the function to get all prefixes of a word. I am just stuck on implementing the compound word detection.

推荐答案

您可以将 Trie 节点显示为 defaultdict 对象,如果前缀是单词,这些对象已扩展为包含布尔标志标记.然后,您可以进行两次处理,在第一轮中将所有单词添加到 Trie,然后在第二轮中检查每个单词是否为组合:

You could present Trie nodes as defaultdict objects which have been extended to contain a boolean flag marking if the prefix is a word. Then you could have two pass processing where on the first round you add all the words to Trie and on second round check for each word if it's a combination or not:

from collections import defaultdict

class Node(defaultdict):
    def __init__(self):
        super().__init__(Node)
        self.terminal = False

class Trie():
    def __init__(self, it):
        self.root = Node()
        for word in it:
            self.add_word(word)

    def __contains__(self, word):
        node = self.root
        for c in word:
            node = node.get(c)
            if node is None:
                return False

        return node.terminal

    def add_word(self, word):
        node = self.root
        for c in word:
            node = node[c]

        node.terminal = True

    def is_combination(self, word):
        node = self.root
        for i, c in enumerate(word):
            node = node.get(c)
            if not node:
                break
            # If prefix is a word check if suffix can be found
            if node.terminal and word[i+1:] in self:
                return True

        return False

lst = ["race", "racecar", "car"]
t = Trie(lst)

print([w for w in lst if t.is_combination(w)])

输出:

['racecar']

这篇关于使用 Trie 在单词列表中查找复合词的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆