字符串操作:计算"相似的字符串,其后缀&QUOT的; [英] String manipulation: calculate the "similarity of a string with its suffixes"

查看:148
本文介绍了字符串操作:计算"相似的字符串,其后缀&QUOT的;的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

有关两个串A和B,我们定义字符串的相似成为通用的字符串最长preFIX的长度。例如,字符串ABC和ABD的相似性为2,而字符串AAA和AAAB的相似性是3

For two strings A and B, we define the similarity of the strings to be the length of the longest prefix common to both strings. For example, the similarity of strings "abc" and "abd" is 2, while the similarity of strings "aaa" and "aaab" is 3.

的问题是,得到的算法来计算串S的相似度的总和与每个它的后缀。比如,让字符串为: ababaa 。然后,该字符串的后缀是 ababaa 巴巴 ABAA AA A 。这些字符串以字符串 ababaa 相似之处 6,0,3,0,1,1 ,分别。因此,答案是 6 + 0 + 3 + 0 + 1 + 1 = 11

The problem is to give an algorithm to calculate the sum of similarities of a string S with each of it's suffixes. For example, let the string be : ababaa. Then, the suffixes of the string are ababaa, babaa, abaa, baa, aa and a. The similarities of each of these strings with the string ababaa are 6,0,3,0,1,1, respectively. Thus the answer is 6 + 0 + 3 + 0 + 1 + 1 = 11.

推荐答案

您需要考虑后缀阵列。一个字的后缀阵列是后缀排序在词典编纂顺序的索引的阵列。在链接的维基百科的文章,该算法计算LCP(最长的共同preFIX),因为他们计算后缀列。您可以在 O(N)使用与后缀树相似,如图<一href="http://www.google.com/url?sa=t&rct=j&q=&esrc=s&source=web&cd=5&ved=0CDwQFjAE&url=http%3A%2F%2Fwww.cs.iastate.edu%2F~cs548%2Freferences%2Flinear_lcp.pdf&ei=UlDqTsj2LoqKiAKyh-nUBA&usg=AFQjCNE8jWaFo1BKSpP70vMnz8elisGRLg">this纸

You want to consider suffix arrays. The suffix array of a word is the array of the indices of suffixes sorted in lexicographical order. In the linked wikipedia article, the algorithms compute the LCP (longest common prefix) as they compute the suffix array. You can compute this in O(n) using similarities with suffix trees, as shown in this paper.

比如:你的字符串 ababaa ,所以后缀数组是这样的:

EXAMPLE: Your string is ababaa, so the suffix array looks like this:

5 | a
4 | aa
2 | abaa
0 | ababaa
3 | baa
1 | babaa

,其中左边的数字是后缀开始处的索引。现在是pretty的每个计算prefixes因为一切都是字典顺序存储。

where the number on the left is the index at which the suffix begins. Now it's pretty each to compute prefixes since everything is stored lexicographically.

作为一个方面说明,这是密切相关的最长公共子的问题。要实践你的下一个采访,思考如何有效地解决。

As a side note, this is closely related to the longest common substring problem. To practice for your next interview, think about ways to solve that efficiently.

这篇关于字符串操作:计算&QUOT;相似的字符串,其后缀&QUOT的;的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆