How to catch the longest sequence of a group
Question
The task is to find the longest sequence of a group.
For instance, take the DNA sequence "AGATCAGATCTTTTTTCTAATGTCTAGGATATATCAGATCAGATCAGATCAGATCAGATC". It has 7 occurrences of AGATC, and the pattern (AGATC) matches all of them.
Is it possible to write a regular expression that catches only the longest sequence, i.e. AGATCAGATCAGATCAGATCAGATC in the given text?
If this is not possible with regex alone, how can I iterate through each sequence (i.e. the 1st sequence is AGATCAGATC, the 2nd is AGATCAGATCAGATCAGATCAGATC, et cetera) in Python?
Recommended answer
Use:
import re
sequence = "AGATCAGATCTTTTTTCTAATGTCTAGGATATATCAGATCAGATCAGATCAGATCAGATC"
matches = re.findall(r'(?:AGATC)+', sequence)
# To find the longest subsequence
longest = max(matches, key=len)
Explanation:
Non-capturing group (?:AGATC)+
+ Quantifier: matches between one and unlimited times, as many times as possible (greedy).
AGATC matches the characters AGATC literally (case sensitive).
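The reason the group is non-capturing is worth a small sketch. When a pattern contains a capturing group, re.findall returns what the group captured rather than the whole match, and for a repeated group that is only the last repetition. The variable names capturing and non_capturing below are illustrative and not part of the original answer.

```python
import re

sequence = "AGATCAGATCTTTTTTCTAATGTCTAGGATATATCAGATCAGATCAGATCAGATCAGATC"

# Capturing group: findall returns the group's content for each match,
# which for a repeated group is only its last repetition.
capturing = re.findall(r'(AGATC)+', sequence)

# Non-capturing group: findall returns the full text of each match,
# so every run of consecutive repeats comes back whole.
non_capturing = re.findall(r'(?:AGATC)+', sequence)

print(capturing)
print(non_capturing)
```

With the capturing form every run collapses to a single AGATC, so max(..., key=len) could no longer tell the runs apart.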
Result:
# print(matches)
['AGATCAGATC', 'AGATCAGATCAGATCAGATCAGATC']
# print(longest)
'AGATCAGATCAGATCAGATCAGATC'
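For the second part of the question (iterating through each sequence), one possible sketch uses re.finditer, which yields match objects and so also exposes where each run starts. The names motif, pattern, runs and longest_run are hypothetical, chosen for illustration; the default=None argument to max is an added safeguard for input that contains no match at all.

```python
import re

sequence = "AGATCAGATCTTTTTTCTAATGTCTAGGATATATCAGATCAGATCAGATCAGATCAGATC"
motif = "AGATC"  # hypothetical name for the repeated unit

# re.escape keeps the motif literal even if it held regex metacharacters.
pattern = re.compile(r'(?:%s)+' % re.escape(motif))

# One tuple per run: (start index, number of repeats, matched text).
runs = [
    (m.start(), len(m.group()) // len(motif), m.group())
    for m in pattern.finditer(sequence)
]

for start, repeats, text in runs:
    print(start, repeats, text)

# Longest run by repeat count; None if the motif never occurs.
longest_run = max(runs, key=lambda run: run[1], default=None)
```

Dividing the match length by the motif length gives the repeat count directly, which is often the number of interest when counting short tandem repeats.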
You can test the regex here.