如何使用树查找最长的公共子串? [英] How to find longest common substring using trees?

查看:114
本文介绍了如何使用树查找最长的公共子串?的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

根据wiki最长的常见子字符串问题可以使用后缀树来解决。

从:

  String = ABCDE $ XABCZ $ 
字符结束1 = $
└──(0)
├──(20)$
├──(22)ABC
│├──(15)DE $
│ └──(23)Z $
├──(24)BC
│├──(16)DE $
│└──(25)Z $
├ ──(26)C
│├──(17)DE $
│└──(27)Z $
├──(18)DE $
├─ ─(19)E $
├──(21)XABCZ $
└── (28)Z $

在(紧凑)后缀树中,您需要找到最深的内部节点(s),其具有来自所有字符串的叶节点。如果您有相同深度的多个节点,则必须比较该节点所表示的字符串的长度。即ABC,BC和C都具有相同的深度,因此您必须比较ABC,BC和C字符串的长度,以查看哪个更长;这显然是ABC。



后缀Trie代码

 └── null 
├──A
│└──B
│└──C
│├──D
││└──(E)ABCDE
│└──(Z)ABCZ
├──B
│└──C
│├──D
││└──(E)BCDE
│└──(Z)BCZ
├──C
│├──D
││└──(E)CDE
│└── (Z)CZ
├──D
│└──(E)DE
├──(E)E
├──X
│└─ ─A
│└──B
│└──C
│└──(Z)XABCZ
└──(Z)Z

在(不紧凑)后缀特里,找到最深的内部节点(/)>

希望它有帮助。


The longest common substring problem according to wiki can be solved using a suffix tree.
From wiki:

The longest common substrings of a set of strings can be found by building a generalised suffix tree for the strings, and then finding the deepest internal nodes which have leaf nodes from all the strings in the subtree below it

I don't get this.
Example: if I have:
ABCDE and XABCZ
then the suffix tree is (some branches from XABCZ omitted due to space):

The longest common substring is ABC but it is not I can not see how the description of wiki helps here.
ABC is not the deepest internal nodes with leaf nodes.
Any help to understand how this works?

解决方案

Like what's been said before, your tree is incorrect.

This is what I get when running "ABCDE$XABCZ" through my code.

Suffix Tree code:

String = ABCDE$XABCZ$
End of word character 1 = $
└── (0)
    ├── (20) $
    ├── (22) ABC
    │   ├── (15) DE$
    │   └── (23) Z$
    ├── (24) BC
    │   ├── (16) DE$
    │   └── (25) Z$
    ├── (26) C
    │   ├── (17) DE$
    │   └── (27) Z$
    ├── (18) DE$
    ├── (19) E$
    ├── (21) XABCZ$
    └── (28) Z$

In a (compact) suffix tree, you need to find the deepest internal node(s) which have leaf nodes from all the strings. If you have multiple nodes at the same depth, you have to compare the length of the string represented by that node. i.e. ABC, BC, and C all have the same depth, so you have to compare the length of ABC, BC, and C strings to see which is longer; which is ABC obviously.

Suffix Trie code:

└── null
    ├── A
    │   └── B
    │       └── C
    │           ├── D
    │           │   └── (E) ABCDE
    │           └── (Z) ABCZ
    ├── B
    │   └── C
    │       ├── D
    │       │   └── (E) BCDE
    │       └── (Z) BCZ
    ├── C
    │   ├── D
    │   │   └── (E) CDE
    │   └── (Z) CZ
    ├── D
    │   └── (E) DE
    ├── (E) E
    ├── X
    │   └── A
    │       └── B
    │           └── C
    │               └── (Z) XABCZ
    └── (Z) Z

In a (not compact) suffix trie, find the deepest internal node(s) which have leaf nodes from all the strings.

Hope it helps.

这篇关于如何使用树查找最长的公共子串?的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆