Finding the minimum Hamming distance of a set of strings in Python

Question

I have a set of n (~1,000,000) strings (DNA sequences) stored in a list trans. I have to find the minimum Hamming distance over all pairs of sequences in the list. I implemented a naive brute-force algorithm, which has been running for more than a day and has not yet produced a result. My code is:

dmin = len(trans[0])
for i in range(len(trans)):
    for j in range(i + 1, len(trans)):
        # [:-1] drops the last character of each sequence (presumably a trailing newline)
        dist = hamdist(trans[i][:-1], trans[j][:-1])
        if dist < dmin:
            dmin = dist

Is there a more efficient method to do this? Here hamdist is a function I wrote to compute the Hamming distance:

def hamdist(str1, str2):
    diffs = 0
    # Strings of different lengths: use the longer length as the distance
    if len(str1) != len(str2):
        return max(len(str1), len(str2))
    for ch1, ch2 in zip(str1, str2):
        if ch1 != ch2:
            diffs += 1
    return diffs
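For readers on Python 3, here is a minimal sketch of the same brute-force search, written with itertools.combinations so each unordered pair is visited exactly once. The function name min_hamming_brute is hypothetical, and the sketch assumes the sequences are already stripped of the trailing character that the question removes with [:-1].

```python
from itertools import combinations


def hamdist(str1, str2):
    """Count positions where two strings differ.

    Same convention as the question: strings of different lengths
    get the longer length as their distance.
    """
    if len(str1) != len(str2):
        return max(len(str1), len(str2))
    # zip pairs characters position by position; each mismatch counts as 1
    return sum(ch1 != ch2 for ch1, ch2 in zip(str1, str2))


def min_hamming_brute(seqs):
    """Brute force over all n*(n-1)/2 unordered pairs (hypothetical helper)."""
    dmin = len(seqs[0])
    for a, b in combinations(seqs, 2):
        dmin = min(dmin, hamdist(a, b))
    return dmin
```

For example, min_hamming_brute(["ACGT", "ACGA", "TTGA"]) returns 1, from the pair "ACGT"/"ACGA". The sketch only tidies the code; it still performs the same quadratic number of comparisons.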

Solution

You could optimize your hamdist function by adding an optional parameter that holds the minimum distance found so far. As soon as diffs reaches that value, you can stop computing the distance, because this pair cannot produce a distance smaller than the current minimum:

def hamdist(str1, str2, prevMin=None):
    diffs = 0
    if len(str1) != len(str2):
        return max(len(str1), len(str2))
    for ch1, ch2 in zip(str1, str2):
        if ch1 != ch2:
            diffs += 1
            # Stop once diffs reaches the best distance so far:
            # this pair can no longer lower the minimum
            if prevMin is not None and diffs >= prevMin:
                return None
    return diffs

You will need to adapt your main loop to pass the current minimum to hamdist and to handle a None return value:

dmin = len(trans[0])
for i in range(len(trans)):
    for j in range(i + 1, len(trans)):
        # Pass the current minimum so hamdist can give up early
        dist = hamdist(trans[i][:-1], trans[j][:-1], dmin)
        if dist is not None and dist < dmin:
            dmin = dist
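A self-contained Python 3 sketch of the answer's approach, assuming the sequences are already stripped: the running minimum is passed into hamdist, which returns None once the mismatch count reaches it. The name min_hamming_pruned is hypothetical, and the break when the minimum hits 0 is an extra shortcut not in the original answer (identical sequences cannot be beaten). Note that this still visits every one of the n(n-1)/2 pairs; it only shortens the work done per pair, so for around a million sequences it may remain slow.

```python
from itertools import combinations


def hamdist(str1, str2, prev_min=None):
    # Length mismatch: use the longer length, as in the question
    if len(str1) != len(str2):
        return max(len(str1), len(str2))
    diffs = 0
    for ch1, ch2 in zip(str1, str2):
        if ch1 != ch2:
            diffs += 1
            # Early exit: this pair can no longer beat the best distance so far
            if prev_min is not None and diffs >= prev_min:
                return None
    return diffs


def min_hamming_pruned(seqs):
    """Minimum pairwise Hamming distance, abandoning hopeless pairs early."""
    dmin = len(seqs[0])
    for a, b in combinations(seqs, 2):
        dist = hamdist(a, b, dmin)
        if dist is not None and dist < dmin:
            dmin = dist
            if dmin == 0:
                break  # identical sequences found; nothing can be smaller
    return dmin
```

With sequences such as "ACGTAC" and "ACGAAC" already at distance 1, every later pair is dropped at its first mismatch, which is where the savings come from.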
