Nucleotide separators in pairwise sequence alignment with Biopython

Question

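I have RNA sequences that contain various modified nucleotides and residues, for example N79, 8XU, SDG, and I.

I want to align them pairwise using Biopython's pairwise2.align.localms. To account for these modified bases accurately, is it possible to pass the input as a list rather than as a string?

What is the correct technique?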

Answer

Biopython's pairwise2 module works on strings of letters, and the letters can be anything. For example:

>>> from Bio import pairwise2
>>> from Bio.pairwise2 import format_alignment
>>> for a in pairwise2.align.localms("ACCGTN97CT", "ACCG8DXCT", 2, -1, -.5, -.1):
...     print(format_alignment(*a))
... 
ACCG--TN97CT
||||||||||||
ACCG8DX---CT
  Score=9.7

ACCGTN97--CT
||||||||||||
ACCG---8DXCT
  Score=9.7

You can set the match/mismatch scores according to your needs. However, this assumes each letter is a separate element.
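One way to stay within that one-letter-per-element assumption is to map every multi-character residue code to a single placeholder character before aligning, then map back afterwards. The sketch below is only an illustration of that idea: the helper names (build_codec, to_string, to_tokens) are hypothetical and not part of Biopython, and the use of Unicode private-use characters as placeholders is an arbitrary choice.

```python
def build_codec(*token_lists):
    """Assign one placeholder character to every distinct residue token.

    Hypothetical helper, not a Biopython function.
    """
    tokens = sorted({tok for seq in token_lists for tok in seq})
    encode = {}
    next_code = 0xE000  # start of the Unicode private-use area (assumption)
    for tok in tokens:
        if len(tok) == 1:
            # Ordinary one-letter residues keep their own letter.
            encode[tok] = tok
        else:
            # Multi-character codes such as "N97" get a private-use character.
            encode[tok] = chr(next_code)
            next_code += 1
    decode = {ch: tok for tok, ch in encode.items()}
    return encode, decode


def to_string(tokens, encode):
    """Turn a list of residue tokens into a one-character-per-residue string."""
    return "".join(encode[tok] for tok in tokens)


def to_tokens(text, decode):
    """Turn an encoded (possibly gapped) string back into residue tokens.

    Characters without a mapping, such as the gap character, pass through.
    """
    return [decode.get(ch, ch) for ch in text]


seq_a = ["A", "C", "C", "G", "T", "N97", "C", "T"]
seq_b = ["A", "C", "C", "G", "8DX", "C", "T"]
encode, decode = build_codec(seq_a, seq_b)
str_a = to_string(seq_a, encode)
str_b = to_string(seq_b, encode)
```

The encoded strings str_a and str_b could then be passed to pairwise2.align.localms like any other strings, and each aligned string in the result converted back with to_tokens.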

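It wasn't clear from your question whether your example N79 is one modified nucleotide or three. If you want to treat N79 as a single base, that does seem to be possible. I don't think it is intentional (so I wouldn't want to depend on this behaviour), but I could trick pairwise2 into working on lists of strings: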

>>> for a in pairwise2.align.localms(["A", "C", "C", "G", "T", "N97", "C", "T"], ["A", "C", "C", "G", "8DX", "C", "T"], 2, -1, -.5, -.1, gap_char=["-"]):
...     print(format_alignment(*a))
... 
['A', 'C', 'C', 'G', 'T', 'N97', 'C', 'T']
||||||||
['A', 'C', 'C', 'G', '8DX', '-', 'C', 'T']
  Score=10.5

['A', 'C', 'C', 'G', 'T', 'N97', 'C', 'T']
||||||||
['A', 'C', 'C', 'G', '-', '8DX', 'C', 'T']
  Score=10.5

Note that the default format_alignment function does not display this very well.
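If the list-based approach is used anyway, a small custom formatter can make the output easier to read. The following is a minimal sketch, not a Biopython function: format_token_alignment is a hypothetical name, and the match symbols ("|" for a match, "." for a mismatch, blank for a gap) are my own choice rather than what format_alignment prints.

```python
def format_token_alignment(aligned_a, aligned_b, gap="-"):
    """Render two equal-length aligned token lists as padded columns.

    Hypothetical helper: pads every token to the widest one so that
    multi-character residues such as "N97" line up column by column.
    """
    aligned_a = list(aligned_a)
    aligned_b = list(aligned_b)
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    width = max((len(tok) for tok in aligned_a + aligned_b), default=1)

    def row(cells):
        # Left-justify each cell, then drop trailing padding on the line.
        return " ".join(cell.ljust(width) for cell in cells).rstrip()

    marks = []
    for x, y in zip(aligned_a, aligned_b):
        if gap in (x, y):
            marks.append(" ")  # gap in either sequence
        elif x == y:
            marks.append("|")  # match
        else:
            marks.append(".")  # mismatch
    return "\n".join([row(aligned_a), row(marks), row(aligned_b)])


top = ["A", "C", "C", "G", "T", "N97", "C", "T"]
bottom = ["A", "C", "C", "G", "8DX", "-", "C", "T"]
rendered = format_token_alignment(top, bottom)
print(rendered)
```

With the list-based call shown earlier, the first two items of each alignment returned by pairwise2 are the aligned lists, so they could be passed in as format_token_alignment(a[0], a[1]).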
