Pattern finding algorithm


Problem description





Hi,

I'm checking to see if you guys may be able to help me with an algorithm for finding patterns. I have around 2000 short sequences (of length 9) that are aligned. I want to be able to extract all common patterns on the same positions and report the number of occurrences.

For example in the following:

ACGCATTCA
ACTGGATAC
TCAGCCATC

I would like the following output (where a full stop represents any character)

(AC....T..) 2 occurrences (pattern between sequence 1 and 2)
(.C.G....C) 2 occurrences (pattern between sequence 2 and 3)
(.C.......) 2 occurrences (pattern between sequence 1 and 3)

As you can see, the way that I am planning on doing this now requires sum(n-1...1) = n(n-1)/2 comparisons. Is there a more efficient way of doing this with fewer comparisons?

Thanks
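For reference, the pairwise approach described in the question can be sketched roughly as below. This is only an illustration of the baseline, not code from the thread; the function names `shared_pattern` and `pairwise_patterns` are made up here, and pairs that agree on no position are assumed to be uninteresting and are skipped.

```python
from itertools import combinations


def shared_pattern(a, b):
    """Keep the characters where a and b agree; put '.' everywhere else."""
    return "".join(x if x == y else "." for x, y in zip(a, b))


def pairwise_patterns(seqs):
    """Collect the distinct patterns produced by comparing every pair.

    This performs n(n-1)/2 comparisons for n sequences.
    """
    patterns = set()
    for a, b in combinations(seqs, 2):
        pattern = shared_pattern(a, b)
        if any(ch != "." for ch in pattern):  # assumption: skip empty patterns
            patterns.add(pattern)
    return patterns


seqs = ["ACGCATTCA", "ACTGGATAC", "TCAGCCATC"]
for pattern in sorted(pairwise_patterns(seqs)):
    print(pattern)
```

Note that this only yields the candidate patterns. Counting how many sequences each pattern occurs in is a separate pass over the data.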

Recommended answer








Coincidentally enough, regular expressions use the "full stop" as you have specified (but in a regex, it's called a "dot"). I can give more detail, but am in a rush at the moment:
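A minimal sketch of the regex idea, assuming Python's standard `re` module and that patterns contain only letters and dots (so they can be used as regular expressions without escaping). The helper name `count_occurrences` is hypothetical and is not taken from the thread.

```python
import re


def count_occurrences(pattern, seqs):
    """Count the sequences that match the pattern over their full length.

    In a regex, '.' matches any single character, which is the same
    meaning the question gives to the full stop.
    """
    regex = re.compile(pattern)
    return sum(1 for s in seqs if regex.fullmatch(s))


seqs = ["ACGCATTCA", "ACTGGATAC", "TCAGCCATC"]
for pattern in ["AC....T..", ".C.G....C", ".C......."]:
    print(pattern, count_occurrences(pattern, seqs))
```

Counted this way, `.C.......` matches all three example sequences rather than two, since every one of them has a C in the second position. Whether that is the count wanted depends on how "occurrences" is meant in the question.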




I think that I've got it:

