Utility to find the longest possible repeated strings
Question
Is there any tool, utility, or perl/python script that can find the longest repeated substrings in a large text file and print those patterns along with the number of times each pattern occurs?
Recommended answer
http://en.wikipedia.org/wiki/Longest_repeated_substring_problem:
The longest repeated substring problem is the problem of finding the longest substring of a string that occurs at least twice. This problem can be solved in linear time and space by building a suffix tree for the string and finding the deepest internal node in the tree.
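To make the idea concrete, here is a minimal Python sketch. Note that it does not build a suffix tree: it swaps in a naively sorted suffix array, which is shorter to write but far from linear time and keeps a copy of every suffix in memory while sorting, so it is only suitable for small inputs. The function name `longest_repeated_substrings` is made up for this illustration, and overlapping occurrences are counted.

```python
def longest_repeated_substrings(text):
    """Return {substring: count} for the longest substring(s) that occur
    at least twice in text. Overlapping occurrences are counted."""
    n = len(text)

    # Suffix array: start positions ordered by the suffix they begin.
    # Naive construction (not linear time); fine for a small example.
    sa = sorted(range(n), key=lambda i: text[i:])

    def common_prefix_len(a, b):
        # Length of the common prefix of the suffixes starting at a and b.
        k = 0
        while a + k < n and b + k < n and text[a + k] == text[b + k]:
            k += 1
        return k

    # lcp[k] = common prefix length of the suffixes at sa[k-1] and sa[k].
    lcp = [0] + [common_prefix_len(sa[k - 1], sa[k]) for k in range(1, n)]

    best = max(lcp)
    if best == 0:
        return {}  # no character occurs twice

    result = {}
    for k in range(1, n):
        if lcp[k] == best:
            sub = text[sa[k]:sa[k] + best]
            # Suffixes sharing this prefix are adjacent in the suffix array,
            # so m occurrences produce m - 1 adjacent pairs: start at 1
            # and add one per pair.
            result[sub] = result.get(sub, 1) + 1
    return result


if __name__ == "__main__":
    for pattern, count in longest_repeated_substrings("banana").items():
        print(repr(pattern), count)
```

For a file, the text could be read with `open(path).read()` and passed to the function. For genuinely large files, a real suffix tree or a linear-time suffix array construction would be needed instead of the sort above.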
Suffix trees in Python (a bit dated, but ...): http://hkn.eecs.berkeley.edu/~dyoo/python/suffix_trees/
JavaScript implementation with further explanation: http://www.allisons.org/ll/AlgDS/Tree/Suffix/