Pattern finding algorithm
Question
Hi,
I'm checking to see if you guys may be able to help me with an algorithm for finding patterns. I have around 2000 short sequences (of length 9) that are aligned. I want to be able to extract all common patterns on the same positions and report the number of occurrences.
For example in the following:
ACGCATTCA
ACTGGATAC
TCAGCCATC
I would like the following output (where a full stop represents any character)
(AC....T..) 2 occurrences (pattern between sequence 1 and 2)
(.C.G....C) 2 occurrences (pattern between sequence 2 and 3)
(.C.......) 2 occurrences (pattern between sequence 1 and 3)
As you can see, the way that I am planning on doing this now requires sum(n-1...1) comparisons, i.e. n(n-1)/2 pairwise comparisons. Is there a more efficient way of doing this with fewer comparisons?
Thanks
Recommended answer
Coincidentally enough, regular expressions use the "full stop" as you have specified (but in a regex, it's called a "dot"). I can give more detail, but am in a rush at the moment:
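To illustrate the point about the dot, here is a minimal Python sketch (not part of the original reply) that treats a pattern such as AC....T.. directly as a regular expression and counts how many sequences it matches. The function name `count_occurrences` and the variable `sequences` are made up for this example; `re.fullmatch` is used so that the pattern must cover the whole 9-character sequence.

```python
import re

# The three example sequences from the question.
sequences = ["ACGCATTCA", "ACTGGATAC", "TCAGCCATC"]

def count_occurrences(pattern, seqs):
    """Count the sequences that match a dot-wildcard pattern over their full length."""
    regex = re.compile(pattern)  # in a regex, '.' matches any single character
    return sum(1 for s in seqs if regex.fullmatch(s))

print(count_occurrences("AC....T..", sequences))  # 2
print(count_occurrences(".C.......", sequences))  # 3
```

Note that the count runs over every sequence, not just one pair, so .C....... is reported 3 times here: all three example sequences have a C in the second position.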
I think that I've got it:
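The code that accompanied this reply is not preserved here. The following is not that code; it is only a rough Python sketch, under stated assumptions, of the pairwise approach the question describes: build the common pattern of every pair of sequences, then count how many sequences match each pattern. The helper names (`common_pattern`, `matches`, `pattern_counts`) are hypothetical, and the sketch still performs n(n-1)/2 pair comparisons, so it does not answer the efficiency part of the question.

```python
from itertools import combinations

def common_pattern(a, b):
    """Keep the characters where a and b agree; put '.' everywhere else."""
    return "".join(x if x == y else "." for x, y in zip(a, b))

def matches(pattern, seq):
    """True if seq agrees with pattern at every non-wildcard position."""
    return all(p == "." or p == c for p, c in zip(pattern, seq))

def pattern_counts(seqs):
    """Map each pattern shared by at least one pair to the number of matching sequences."""
    patterns = {common_pattern(a, b) for a, b in combinations(seqs, 2)}
    # Drop the all-wildcard pattern, which carries no information.
    patterns = {p for p in patterns if p.strip(".")}
    return {p: sum(matches(p, s) for s in seqs) for p in patterns}

counts = pattern_counts(["ACGCATTCA", "ACTGGATAC", "TCAGCCATC"])
for pattern, n in sorted(counts.items()):
    print(f"({pattern}) {n} occurrences")
```

The counting step checks every pattern against every sequence, which is fine for the three-sequence example but may become slow for 2000 sequences if many distinct patterns arise.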