查找由其他单词组成的最长单词 [英] find the longest word made of other words

查看:205
本文介绍了查找由其他单词组成的最长单词的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我正在研究一个问题,那就是编写一个程序来查找单词列表中由其他单词组成的最长单词。

I am working on a problem, which is to write a program to find the longest word made of other words in a list of words.

EXAMPLE
Input: test, tester, testertest, testing, testingtester
Output: testingtester

我搜索并找到以下解决方案,我的问题是在步骤2中感到困惑,为什么我们应该以所有可能的方式打破每个单词?为什么不直接整体使用每个词呢?如果有人可以提供一些见解,那就太好了。

I searched and find the following solution, my question is I am confused in step 2, why we should break each word in all possible ways? Why not use each word directly as a whole? If anyone could give some insights, it will be great.

以下解决方案将执行以下操作:

The solution below does the following:


  1. 按大小对数组排序,将最长的单词放在前面

  2. 对于每个单词,请以所有可能的方式对其进行拆分。也就是说,对于测试,请将其分为{ t, est},{ te, st}和{ tes, t}。

  3. 然后,对于每个配对,检查数组的前半部分和后半部分是否都存在于数组中。

  4. 通过返回符合条件#3的第一个字符串,短路 。

  1. Sort the array by size, putting the longest word at the front
  2. For each word, split it in all possible ways. That is, for "test", split it into {"t", "est"}, {"te", "st"} and {"tes", "t"}.
  3. Then, for each pairing, check if the first half and the second both exist elsewhere in the array.
  4. "Short circuit" by returning the first string we find that fits condition #3.


推荐答案

间接回答您的问题,我相信以下是解决此问题的有效方法使用重试时出现问题。

Answering your question indirectly, I believe the following is an efficient way to solve this problem using tries.

构建

对单词进行排序,以使最长的单词排在最前。

Sort the words so that the longest word comes first.

现在,对于每个单词W,从树的顶部开始,然后使用您正在测试的单词中的字母,开始沿树下的单词跟随一个字母。

Now, for each word W, start at the top of the trie and begin following the word down the tree one letter at a time using letters from the word you are testing.

每次单词结束时,从顶部递归重新输入特里,记下您已经分支。如果单词末尾的字母用完了并且分支了,则说明您已经找到了一个复合词,并且由于这些词已排序,因此这是最长的复合词。

Each time a word ends, recursively re-enter the trie from the top making a note that you have "branched". If you run out of letters at the end of the word and have branched, you've found a compound word and, because the words were sorted, this is the longest compound word.

如果字母在任何时候都停止匹配,或者您用完了但不在单词的结尾,则只需回溯到分支的位置并继续插入即可。

If the letters stop matching at any point, or you run out and are not at the end of the word, just back track to wherever it was that you branched and keep plugging along.

恐怕我不太了解Java,因此无法为您提供该语言的示例代码。但是,我已经写出了一个Python解决方案(使用此答案中的trie实现)。希望对您很清楚:

I'm afraid I don't know Java that well, so I'm unable to provide you sample code in that language. I have, however, written out a solution in Python (using a trie implementation from this answer). Hopefully it is clear to you:

#!/usr/bin/env python3

#End of word symbol
_end = '_end_'

#Make a trie out of nested HashMap, UnorderedMap, dict structures
def MakeTrie(words):
  root = dict()
  for word in words:
    current_dict = root
    for letter in word:
      current_dict = current_dict.setdefault(letter, {})
    current_dict[_end] = _end
  return root

def LongestCompoundWord(original_trie, trie, word, level=0):
  first_letter = word[0]
  if not first_letter in trie:
    return False
  if len(word)==1 and _end in trie[first_letter]:
    return level>0
  if _end in trie[first_letter] and LongestCompoundWord(original_trie, original_trie, word[1:], level+1):
    return True
  return LongestCompoundWord(original_trie, trie[first_letter], word[1:], level)

#Words that were in your question
words = ['test','testing','tester','teste', 'testingtester', 'testingtestm', 'testtest','testingtest']

trie = MakeTrie(words)

#Sort words in order of decreasing length
words = sorted(words, key=lambda x: len(x), reverse=True)

for word in words:
  if LongestCompoundWord(trie,trie,word):
    print("Longest compound word was '{0:}'".format(word))
    break

有了以上几点,您对原始问题的答案就变得更加清晰:我们无法提前知道哪种前缀词组合会成功地引导我们通过树。因此,我们需要准备检查前缀词的所有可能组合。

With the above in mind, the answer to your original question becomes clearer: we do not know ahead of time which combination of prefix words will take us successfully through the tree. Therefore, we need to be prepared to check all possible combinations of prefix words.

由于您发现的算法没有一种有效的方法来知道单词的哪些子集是前缀,它将在单词中所有可能的点处拆分单词,以确保生成所有前缀。

Since the algorithm you found does not have an efficient way of knowing what subsets of a word are prefixes, it splits the word at all possible points in word to ensure that all prefixes are generated.

这篇关于查找由其他单词组成的最长单词的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆