How to find the overlap between 2 sequences, and return it


Problem description

I am new to Python and have already spent too many hours on this problem; I hope somebody can help me. I need to find the overlap between 2 sequences. The overlap is at the right end of the first sequence and the left end of the second one. I want the function to find the overlap and return it.

My sequences are:

s1 = "CGATTCCAGGCTCCCCACGGGGTACCCATAACTTGACAGTAGATCTC"
s2 = "GGCTCCCCACGGGGTACCCATAACTTGACAGTAGATCTCGTCCAGACCCCTAGC"

My function should be named

def getOverlap(left, right)

with s1 being the left sequence and s2 being the right one.

The result should be

'GGCTCCCCACGGGGTACCCATAACTTGACAGTAGATCTC'

Any help is appreciated.

Recommended answer

Have a look at the difflib library and more precisely at find_longest_match():

import difflib

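def get_overlap(s1, s2):
    # autojunk=False turns off the heuristic that treats very frequent
    # characters as junk in long sequences (200+ items), which matters for DNA.
    s = difflib.SequenceMatcher(None, s1, s2, autojunk=False)
    # Longest block with s1[pos_a:pos_a+size] == s2[pos_b:pos_b+size]
    pos_a, pos_b, size = s.find_longest_match(0, len(s1), 0, len(s2))
    return s1[pos_a:pos_a+size]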

s1 = "CGATTCCAGGCTCCCCACGGGGTACCCATAACTTGACAGTAGATCTC"
s2 = "GGCTCCCCACGGGGTACCCATAACTTGACAGTAGATCTCGTCCAGACCCCTAGC"

print(get_overlap(s1, s2)) # GGCTCCCCACGGGGTACCCATAACTTGACAGTAGATCTC
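
Note that find_longest_match() returns the longest common substring found anywhere in the two strings. For the sequences in the question this happens to be the end-to-end overlap, but in general it need not be. If the overlap must be a suffix of the left sequence and a prefix of the right one, a direct check may be closer to what was asked. The following is only a minimal sketch under that assumption; it uses the function name from the question and a simple longest-first search, not the difflib approach above.

```python
def getOverlap(left, right):
    """Return the longest suffix of `left` that is also a prefix of `right`."""
    # Try the longest candidate first and shrink until one matches.
    for size in range(min(len(left), len(right)), 0, -1):
        if left.endswith(right[:size]):
            return left[-size:]
    return ""  # the sequences do not overlap end to end


s1 = "CGATTCCAGGCTCCCCACGGGGTACCCATAACTTGACAGTAGATCTC"
s2 = "GGCTCCCCACGGGGTACCCATAACTTGACAGTAGATCTCGTCCAGACCCCTAGC"

print(getOverlap(s1, s2))  # GGCTCCCCACGGGGTACCCATAACTTGACAGTAGATCTC
```

This runs in quadratic time in the worst case, which should be fine for short reads like these; it returns an empty string when no suffix of the left sequence starts the right one.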
