Using regex to search until desired pattern
Question
I am using the following regex:
orfre = '^(?:...)*?((ATG)(...){%d,}?(?=(TAG|TAA|TGA)))' % (aa)
I basically want to find all sequences that start with ATG, followed by triplets (e.g. TTA, TTC, GTC, etc.), up to the first stop codon in frame. However, as my regex is written, it won't actually stop at a stop codon if aa is large. Instead, it keeps searching until it finds a stop codon for which the aa condition is met. I would rather have it search the string only until the first stop codon is found. If that match isn't long enough (for the given aa argument), it should return None.
String data: AAAATGATGCATTAACCCTAATAA
Desired output from regex: ATGATGCATTAA
Unless aa > 5, in which case nothing should be returned.
Actual output I'm getting: ATGATGCATTAACCCTAA
Recommended answer
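The regex in the question overshoots because `(...){%d,}?` accepts any triplet, stop codons included, so when aa is large the lazy quantifier simply reads through the first stop codon to reach the minimum count. One possible fix, sketched here rather than offered as the definitive solution, is to put a negative lookahead in front of each triplet so that a stop codon can never be consumed as an ordinary codon. The function name `find_orf` and the parameter `min_codons` are hypothetical, and the sketch assumes that aa counts the codons between ATG and the stop codon, and that the stop codon belongs in the result, as in the desired output above.

```python
import re

STOP = "TAG|TAA|TGA"

def find_orf(seq, min_codons):
    """Return the first in-frame ORF (ATG ... stop codon) or None.

    min_codons is an assumed reading of `aa`: the minimum number of
    codons between the start codon and the stop codon.
    """
    pattern = re.compile(
        r"^(?:...)*?"            # skip whole codons, staying in frame 0
        r"(ATG"                  # start codon
        r"(?:(?!%s)...){%d,}"    # codons that are NOT stop codons
        r"(?:%s))"               # the first stop codon ends the match
        % (STOP, min_codons, STOP)
    )
    m = pattern.match(seq)
    return m.group(1) if m else None

seq = "AAAATGATGCATTAACCCTAATAA"
print(find_orf(seq, 2))   # ATGATGCATTAA
print(find_orf(seq, 6))   # None
```

Note that when the ORF starting at the first ATG is too short, this pattern goes on to try later in-frame ATGs instead of giving up immediately; whether that is wanted depends on how "all sequences" is meant in the question.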
Supplementary note: if you want to check all six reading frames of a sequence, don't forget to check the complementary strand as well. First reverse the chain:
comp_chain = chain[::-1]
(-> extended slices)
Then transliterate the bases: A's for T's and G's for C's, and vice versa.
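The two steps above (reverse, then swap complementary bases) can be combined as in the minimal sketch below. The helper names `reverse_complement` and `six_frames` are hypothetical, and the sketch assumes an uppercase sequence containing only A, C, G and T.

```python
# Translation table mapping each base to its complement.
COMPLEMENT = str.maketrans("ACGT", "TGCA")

def reverse_complement(chain):
    """Reverse the chain with an extended slice, then swap A<->T and G<->C."""
    return chain[::-1].translate(COMPLEMENT)

def six_frames(chain):
    """Return the three frames of the chain and the three of its reverse complement."""
    rc = reverse_complement(chain)
    return [chain[i:] for i in range(3)] + [rc[i:] for i in range(3)]

print(reverse_complement("ATGC"))  # GCAT
```

Each of the six strings returned by `six_frames` can then be searched from its first base with a frame-anchored regex like the one in the question.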