Palindrome detection efficiency


Problem Description


I got curious about Jon Limjap's interview mishap and started looking for efficient ways to do palindrome detection. I checked the palindrome golf answers, and it seems to me that there are only two algorithms in the answers: reversing the string, and checking from tail and head.

def palindrome_short(s):
    # Compare characters pairwise from both ends, stopping at the middle.
    length = len(s)
    for i in range(length // 2):  # xrange/integer division updated for Python 3
        if s[i] != s[(length - 1) - i]:
            return False
    return True

def palindrome_reverse(s):
    return s == s[::-1]


I don't think either of these methods is used in the detection of exact palindromes in huge DNA sequences. I looked around a bit and didn't find any free article about what an ultra-efficient way to do this might be.


A good way might be parallelizing the first version in a divide-and-conquer approach, assigning a pair of char arrays 1..n and length-1-n..length-1 to each thread or processor.
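The divide-and-conquer idea above can be sketched as follows. This is a minimal illustration, not the poster's actual code: it splits the first half of the string into ranges and checks each range against its mirror. `ThreadPoolExecutor` is used so the sketch stays self-contained; on CPython, real CPU-bound speedups would need `ProcessPoolExecutor`, but the splitting logic is identical.

```python
from concurrent.futures import ThreadPoolExecutor

def check_range(s, lo, hi):
    """True if s[lo:hi] mirrors the corresponding range at the other end."""
    n = len(s)
    return all(s[i] == s[n - 1 - i] for i in range(lo, hi))

def palindrome_parallel(s, workers=4):
    half = len(s) // 2
    step = max(1, -(-half // workers))  # ceiling division so ranges cover [0, half)
    ranges = [(lo, min(lo + step, half)) for lo in range(0, half, step)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # Each worker checks one slice pair; the string is a palindrome
        # only if every slice pair mirrors.
        return all(pool.map(lambda r: check_range(s, *r), ranges))

print(palindrome_parallel("ATTAGCCGATTA"))  # True
print(palindrome_parallel("ATTGCACAGTAA"))  # False
```

Note that for short strings the pool overhead dominates; this only pays off for very long inputs, and even then the whole scan is still O(N) total work.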


What would be a better way?

Do you know of any?

Recommended Answer


Given only one palindrome, you will have to do it in O(N), yes. You can get more efficiency with multi-processors by splitting the string as you said.


Now say you want to do exact DNA matching. These strings are thousands of characters long, and they are very repetitive. This gives us the opportunity to optimize.


Say you split a 1000-char long string into 5 pairs of 100,100. The code will look like this:

isPal(w[0:100], w[-100:]) and isPal(w[100:200], w[-200:-100]) ...


etc... The first time you do these matches, you will have to process them. However, you can add all the results you've computed into a hash table mapping pairs to booleans:

isPal = {("ATTAGC", "CGATTA"): True, ("ATTGCA", "CAGTAA"): False}


etc... This will take way too much memory, though. For pairs of 100,100, the hash map would have 2*4^100 entries. Even if you store only two 32-bit hashes of the strings as the key, you would need something like 10^55 megabytes, which is ridiculous.


Maybe if you use smaller strings, the problem can be made tractable. Then you'll have a huge hashmap, but at least a palindrome check for, say, 10x10 pairs will take O(1), so checking whether a 1000-character string is a palindrome will take 100 lookups instead of 500 comparisons. It's still O(N), though...
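Here is a minimal sketch of that chunk-memoization idea. The chunk size of 10 and the `pair_cache` dict are illustrative choices, not the answer's actual code: each (left, right) chunk pair is compared once, and repeated pairs, which are common in repetitive DNA, become cache hits.

```python
# Cache mapping (left_chunk, right_chunk) -> whether right is left reversed.
pair_cache = {}

def mirrors(left, right):
    """True if `right` is `left` reversed; memoized in pair_cache."""
    key = (left, right)
    if key not in pair_cache:
        pair_cache[key] = (right == left[::-1])
    return pair_cache[key]

def palindrome_chunked(s, chunk=10):
    half = len(s) // 2
    # Compare each chunk from the front with the mirrored chunk from the back;
    # a middle character in an odd-length string needs no check.
    for lo in range(0, half, chunk):
        hi = min(lo + chunk, half)
        left = s[lo:hi]
        right = s[len(s) - hi:len(s) - lo]
        if not mirrors(left, right):
            return False
    return True

print(palindrome_chunked("ATTAGCCGATTA"))  # True
print(palindrome_chunked("ATTGCACAGTAA"))  # False
```

Each chunk pair is still O(chunk) the first time it is seen, so the worst case remains O(N); the win only appears when the same chunk pairs recur across many queries.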

