c ++的后缀树库和简单的例子如何使用它 [英] Suffix tree library for c++ with simple examples how to use it

查看:140
本文介绍了c ++的后缀树库和简单的例子如何使用它的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我正在搜索后缀树库(具有线性时间构造),所有我发现是PATL,但PATL没有文档,我不能找出任何的例子。
所以有一个后缀树库的c ++有一个体面的文档?

I'm searching for suffix tree library (that has linear time construction), and all I found is PATL, but PATL has no documentation and I can't figure out any of the examples. So is there a suffix tree library for c++ that has a decent documentation?

PATL首页:
http://code.google.com/p/patl/

编辑:

动机:我需要处理大量的字符串,并找到频繁的公共子字符串,并报告如果超过n次出现任何子字符串在t秒内发生。我实现了一个树(在节点中的计数器,实际上它不是一个计数器,但访问次数的std ::矢量,因为我说我需要时间),但它是非常缓慢。
所以我想到了bulking up(连接字符串之间的一些随机的东西,使得子字符串不跨越多个字符串)一定量的消息(让我们说30秒的数据),然后构建一个后缀树上的字符串。


Motivation: I need to process large amount of strings and find the frequent common substrings, and report if more than n occurrences of any substring occurred within t seconds. I implemented a tree (with counter in the nodes, actually it isn't a counter but an std::vector of visit times since like I said I need time), but it is very slow. So I thought of bulking up (concatenating with some random stuff between strings so that substrings don't span over more than one string) a certain amount of messages (let's say 30 seconds worth of data) and then build a suffix tree on that string.

推荐答案

查看 SeqAn 库,它提供具有文档的各种搜索算法和数据结构的高性能实现。

Take a look at the SeqAn library which offers high-performance implementations of various search algorithms and data structures with documentation.

例如, suffix array 类可以用作后缀树的插入替换。

For instance, the suffix array class can be used as a drop-in replacement for suffix trees.

除此之外,你的问题听起来固有的复杂,我不知道你能加快多少。在一般的短语中,它是一个多重对齐问题,这是NP难。你可以把这变成一个更容易处理的东西,因为你只对确切的子匹配感兴趣,但它仍然很复杂。

Apart from that, your problem sounds inherently complex, I’m not sure how much you can speed it up. In a general phrasing it’s a multiple alignment problem which is NP hard. You can probably transform this into something more tractable since you’re only interested in exact submatches but it’s still complex.

这篇关于c ++的后缀树库和简单的例子如何使用它的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆