Palindrome detection efficiency


Problem Description


I got curious about Jon Limjap's interview mishap and started looking for efficient ways to do palindrome detection. I checked the palindrome golf answers, and it seems to me that there are only two algorithms in the answers: reversing the string, and checking from tail and head.

def palindrome_short(s):
    # Compare characters pairwise from both ends, stopping at the middle.
    length = len(s)
    for i in range(length // 2):  # xrange/integer division updated for Python 3
        if s[i] != s[(length - 1) - i]:
            return False
    return True

def palindrome_reverse(s):
    return s == s[::-1]


I don't think either of these methods is used in the detection of exact palindromes in huge DNA sequences. I looked around a bit and didn't find any free article about what an ultra-efficient way to do this might be.


A good way might be parallelizing the first version in a divide-and-conquer approach, assigning a pair of char arrays 1..n and length-1-n..length-1 to each thread or processor.
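The divide-and-conquer idea above can be sketched as follows. This is a minimal illustration, not the poster's actual code: it splits the first half of the string into ranges and checks each range against its mirror. `ThreadPoolExecutor` is used so the sketch stays self-contained; on CPython, real CPU-bound speedups would need `ProcessPoolExecutor`, but the splitting logic is identical.

```python
from concurrent.futures import ThreadPoolExecutor

def check_range(s, lo, hi):
    """True if s[lo:hi] mirrors the corresponding range at the other end."""
    n = len(s)
    return all(s[i] == s[n - 1 - i] for i in range(lo, hi))

def palindrome_parallel(s, workers=4):
    half = len(s) // 2
    step = max(1, -(-half // workers))  # ceiling division so ranges cover [0, half)
    ranges = [(lo, min(lo + step, half)) for lo in range(0, half, step)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # Each worker checks one slice pair; the string is a palindrome
        # only if every slice pair mirrors.
        return all(pool.map(lambda r: check_range(s, *r), ranges))

print(palindrome_parallel("ATTAGCCGATTA"))  # True
print(palindrome_parallel("ATTGCACAGTAA"))  # False
```

Note that for short strings the pool overhead dominates; this only pays off for very long inputs, and even then the whole scan is still O(N) total work.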


What would be a better way?

Do you know of any?

Recommended Answer


Given only one palindrome, you will have to do it in O(N), yes. You can get more efficiency with multi-processors by splitting the string as you said.


Now say you want to do exact DNA matching. These strings are thousands of characters long, and they are very repetitive. This gives us the opportunity to optimize.


Say you split a 1000-char long string into 5 pairs of 100,100. The code will look like this:

isPal(w[0:100], w[-100:]) and isPal(w[100:200], w[-200:-100]) ...


etc... The first time you do these matches, you will have to process them. However, you can add all the results you've computed into a hash table mapping pairs to booleans:

isPal = {("ATTAGC", "CGATTA"): True, ("ATTGCA", "CAGTAA"): False}


etc... This will take way too much memory, though. For pairs of 100,100, the hash map would have 2*4^100 entries. Even if you store only two 32-bit hashes of the strings as the key, you would need something like 10^55 megabytes, which is ridiculous.


Maybe if you use smaller strings, the problem can be made tractable. Then you'll have a huge hashmap, but at least a palindrome check for, say, 10x10 pairs will take O(1), so checking whether a 1000-character string is a palindrome will take 100 lookups instead of 500 comparisons. It's still O(N), though...
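Here is a minimal sketch of that chunk-memoization idea. The chunk size of 10 and the `pair_cache` dict are illustrative choices, not the answer's actual code: each (left, right) chunk pair is compared once, and repeated pairs, which are common in repetitive DNA, become cache hits.

```python
# Cache mapping (left_chunk, right_chunk) -> whether right is left reversed.
pair_cache = {}

def mirrors(left, right):
    """True if `right` is `left` reversed; memoized in pair_cache."""
    key = (left, right)
    if key not in pair_cache:
        pair_cache[key] = (right == left[::-1])
    return pair_cache[key]

def palindrome_chunked(s, chunk=10):
    half = len(s) // 2
    # Compare each chunk from the front with the mirrored chunk from the back;
    # a middle character in an odd-length string needs no check.
    for lo in range(0, half, chunk):
        hi = min(lo + chunk, half)
        left = s[lo:hi]
        right = s[len(s) - hi:len(s) - lo]
        if not mirrors(left, right):
            return False
    return True

print(palindrome_chunked("ATTAGCCGATTA"))  # True
print(palindrome_chunked("ATTGCACAGTAA"))  # False
```

Each chunk pair is still O(chunk) the first time it is seen, so the worst case remains O(N); the win only appears when the same chunk pairs recur across many queries.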

