如何使用树找到最长的公共子串? [英] How to find longest common substring using trees?

查看:27
本文介绍了如何使用树找到最长的公共子串?的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

根据wiki的最长公共子串问题可以使用后缀树来解决.
来自 :

String = ABCDE$XABCZ$字尾字符 1 = $└── (0)├── (20) $├── (22) ABC│ ├── (15) DE$│ └── (23) Z$├── (24) 公元前│ ├── (16) DE$│ └── (25) Z$├── (26) C│ ├── (17) DE$│ └── (27) Z$├── (18) DE$├── (19) E$├── (21) XABCZ$└── (28) Z$

在(紧凑的)后缀树中,您需要从所有字符串中找到具有叶节点的最深内部节点.如果在同一深度有多个节点,则必须比较该节点表示的字符串的长度.即 ABC、BC 和 C 都具有相同的深度,因此您必须比较 ABC、BC 和 C 字符串的长度以查看哪个更长;显然是ABC.

后缀树代码:

└── null├── A│ └── B│ └── C│ ├── D│ │ └── (E) ABCDE│ └── (Z) ABCZ├── B│ └── C│ ├── D│ │ └── (E) BCDE│ └── (Z) BCZ├── C│ ├── D│ │ └── (E) CDE│ └── (Z) CZ├── D│ └── (E) DE├── (E) E├── X│ └── A│ └── B│ └── C│ └── (Z) XABCZ└── (Z) Z

在(非紧凑的)后缀树中,从所有字符串中找到具有叶节点的最深内部节点.

希望有帮助.

The longest common substring problem according to wiki can be solved using a suffix tree.
From wiki:

The longest common substrings of a set of strings can be found by building a generalised suffix tree for the strings, and then finding the deepest internal nodes which have leaf nodes from all the strings in the subtree below it

I don't get this.
Example: if I have:
ABCDE and XABCZ
then the suffix tree is (some branches from XABCZ omitted due to space):

The longest common substring is ABC but it is not I can not see how the description of wiki helps here.
ABC is not the deepest internal nodes with leaf nodes.
Any help to understand how this works?

解决方案

Like what's been said before, your tree is incorrect.

This is what I get when running "ABCDE$XABCZ" through my code.

Suffix Tree code:

String = ABCDE$XABCZ$
End of word character 1 = $
└── (0)
    ├── (20) $
    ├── (22) ABC
    │   ├── (15) DE$
    │   └── (23) Z$
    ├── (24) BC
    │   ├── (16) DE$
    │   └── (25) Z$
    ├── (26) C
    │   ├── (17) DE$
    │   └── (27) Z$
    ├── (18) DE$
    ├── (19) E$
    ├── (21) XABCZ$
    └── (28) Z$

In a (compact) suffix tree, you need to find the deepest internal node(s) which have leaf nodes from all the strings. If you have multiple nodes at the same depth, you have to compare the length of the string represented by that node. i.e. ABC, BC, and C all have the same depth, so you have to compare the length of ABC, BC, and C strings to see which is longer; which is ABC obviously.

Suffix Trie code:

└── null
    ├── A
    │   └── B
    │       └── C
    │           ├── D
    │           │   └── (E) ABCDE
    │           └── (Z) ABCZ
    ├── B
    │   └── C
    │       ├── D
    │       │   └── (E) BCDE
    │       └── (Z) BCZ
    ├── C
    │   ├── D
    │   │   └── (E) CDE
    │   └── (Z) CZ
    ├── D
    │   └── (E) DE
    ├── (E) E
    ├── X
    │   └── A
    │       └── B
    │           └── C
    │               └── (Z) XABCZ
    └── (Z) Z

In a (not compact) suffix trie, find the deepest internal node(s) which have leaf nodes from all the strings.

Hope it helps.

这篇关于如何使用树找到最长的公共子串?的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆