How to catch the longest sequence of a group

Problem description

The task is to find the longest sequence of a group, that is, the longest run of consecutive repeats of a given pattern.

For instance, take the DNA sequence "AGATCAGATCTTTTTTCTAATGTCTAGGATATATCAGATCAGATCAGATCAGATCAGATC". It contains 7 occurrences of AGATC, and the pattern (AGATC) matches every one of them. Is it possible to write a regular expression that catches only the longest sequence, i.e. AGATCAGATCAGATCAGATCAGATC in the given text? If this is not possible with regex alone, how can I iterate through each sequence (i.e. the 1st sequence is AGATCAGATC, the 2nd is AGATCAGATCAGATCAGATCAGATC, et cetera) in Python?

Recommended answer

Use:

import re

sequence = "AGATCAGATCTTTTTTCTAATGTCTAGGATATATCAGATCAGATCAGATCAGATCAGATC"
matches = re.findall(r'(?:AGATC)+', sequence)

# Pick the longest match, i.e. the longest run of consecutive AGATC repeats
longest = max(matches, key=len)

Explanation:

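Non-capturing group (?:AGATC)+

  • + Quantifier: matches between one and unlimited times, as many times as possible (greedy), so each match covers a whole run of consecutive repeats.
  • AGATC matches the characters AGATC literally (case sensitive).

The group is non-capturing because re.findall returns the group's text instead of the whole match when the pattern contains a capturing group.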
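To illustrate why the group is written as non-capturing, here is a small sketch comparing the two forms with re.findall on the sequence from the question. The variable names capturing and non_capturing are only illustrative and are not part of the original answer.

```python
import re

sequence = "AGATCAGATCTTTTTTCTAATGTCTAGGATATATCAGATCAGATCAGATCAGATCAGATC"

# With a capturing group, re.findall returns the group's text
# (the last repetition captured), not the whole run.
capturing = re.findall(r'(AGATC)+', sequence)

# With a non-capturing group, re.findall returns each whole match.
non_capturing = re.findall(r'(?:AGATC)+', sequence)

print(capturing)      # ['AGATC', 'AGATC']
print(non_capturing)  # ['AGATCAGATC', 'AGATCAGATCAGATCAGATCAGATC']
```

With the capturing form, max(matches, key=len) would only ever see single AGATC units, so the length comparison would be meaningless.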

Result:

# print(matches)
['AGATCAGATC', 'AGATCAGATCAGATCAGATCAGATC']

# print(longest)
'AGATCAGATCAGATCAGATCAGATC'
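The question also asks how to iterate through each sequence. One possible sketch, assuming you want the start position and repeat count of every run, uses re.finditer instead of re.findall. The names unit, runs and longest_repeats are hypothetical helpers introduced here for illustration.

```python
import re

sequence = "AGATCAGATCTTTTTTCTAATGTCTAGGATATATCAGATCAGATCAGATCAGATCAGATC"
unit = "AGATC"  # the repeated unit from the question

# re.escape guards against units that contain regex metacharacters.
pattern = re.compile(r'(?:%s)+' % re.escape(unit))

runs = []
for m in pattern.finditer(sequence):
    # Each match is a whole run, so its length is a multiple of len(unit).
    repeats = len(m.group()) // len(unit)
    runs.append((m.start(), repeats))
    print(m.start(), repeats, m.group())

# default=0 avoids a ValueError when the unit never occurs.
longest_repeats = max((r for _, r in runs), default=0)
print(longest_repeats)  # 5
```

For the sample sequence this visits a run of 2 repeats starting at index 0 and a run of 5 repeats starting at index 35. Note that max(matches, key=len) in the answer above raises ValueError on an empty list, so the default argument is worth considering when a match is not guaranteed.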

You can test the regex here.
