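Population genetics plays an important role in evolutionary theory. It analyzes genetic differences between species as well as between two or more individuals of the same species.
Biopython provides the Bio.PopGen module for population genetics. It mainly supports GenePop, a popular population genetics package developed by Michel Raymond and Francois Rousset.
Let us write a simple application that parses the GenePop format to understand the concept.
Download the GenePop file provided by the Biopython team from the link given below: https://raw.githubusercontent.com/biopython/biopython/master/Tests/PopGen/c3line.gen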
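For orientation, a GenePop file is plain text: a title line, the locus names, and then one block per population introduced by a line reading Pop. Each individual is written as a name, a comma, and one genotype code per locus. The sketch below decodes a single individual line by hand, assuming two digits per allele and 00 for missing data. The sample line and the helper name parse_individual are made up for illustration, and this is not Biopython's own parser.

```python
def parse_individual(line, digits=2):
    """Split 'name , g1 g2 ...' into (name, [(allele_a, allele_b), ...])."""
    name, _, genotype_text = line.partition(",")
    genotypes = []
    for code in genotype_text.split():
        # Each diploid genotype is two allele codes of `digits` characters each.
        first, second = int(code[:digits]), int(code[digits:])
        # Allele code 0 marks missing data; report it as None.
        genotypes.append((first or None, second or None))
    return name.strip(), genotypes


sample_line = "4 ,  0303 0403 0000"  # hypothetical line in GenePop layout
print(parse_individual(sample_line))
# prints ('4', [(3, 3), (4, 3), (None, None)])
```

The tuple layout mirrors the (name, genotypes) pairs that the parsed record exposes, which makes the parsed output easier to read.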
Load the GenePop module using the following code snippet:
from Bio.PopGen import GenePop
Parse the file using the GenePop.read method as shown below:
record = GenePop.read(open("c3line.gen"))
Display the loci and population information as given below:
>>> record.loci_list
['136255903', '136257048', '136257636']
>>> record.pop_list
['4', 'b3', '5']
>>> record.populations
[[('1', [(3, 3), (4, 4), (2, 2)]), ('2', [(3, 3), (3, 4), (2, 2)]), ('3', [(3, 3), (4, 4), (2, 2)]), ('4', [(3, 3), (4, 3), (None, None)])], [('b1', [(None, None), (4, 4), (2, 2)]), ('b2', [(None, None), (4, 4), (2, 2)]), ('b3', [(None, None), (4, 4), (2, 2)])], [('1', [(3, 3), (4, 4), (2, 2)]), ('2', [(3, 3), (1, 4), (2, 2)]), ('3', [(3, 2), (1, 1), (2, 2)]), ('4', [(None, None), (4, 4), (2, 2)]), ('5', [(3, 3), (4, 4), (2, 2)])]]
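Here, the file contains three loci and three populations: the first population has 4 records, the second has 3, and the third has 5. Each population is named after its last individual, which is why record.pop_list reads '4', 'b3', '5'. record.populations lists every population as (individual name, genotypes) pairs, with one allele pair per locus and None marking missing data.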
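Because record.populations is an ordinary nested list, simple summaries can be computed without any extra tooling. The following is a minimal sketch, assuming the third population exactly as printed above; the helper name allele_frequencies is hypothetical and not part of Biopython.

```python
from collections import Counter

# Third population, copied from the record.populations output above.
population = [
    ("1", [(3, 3), (4, 4), (2, 2)]),
    ("2", [(3, 3), (1, 4), (2, 2)]),
    ("3", [(3, 2), (1, 1), (2, 2)]),
    ("4", [(None, None), (4, 4), (2, 2)]),
    ("5", [(3, 3), (4, 4), (2, 2)]),
]


def allele_frequencies(population, locus_index):
    """Return {allele: frequency} for one locus, skipping missing alleles."""
    counts = Counter(
        allele
        for _name, genotypes in population
        for allele in genotypes[locus_index]
        if allele is not None
    )
    total = sum(counts.values())
    if total == 0:
        return {}  # every genotype at this locus is missing
    return {allele: count / total for allele, count in counts.items()}


print(allele_frequencies(population, 1))
# prints {4: 0.7, 1: 0.3}
```

Index 1 corresponds to the second entry of record.loci_list, '136257048'.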
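Biopython provides options to remove loci and population data. These methods modify the record in place.
Remove a population by position,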
>>> record.remove_population(0)
>>> record.populations
[[('b1', [(None, None), (4, 4), (2, 2)]), ('b2', [(None, None), (4, 4), (2, 2)]), ('b3', [(None, None), (4, 4), (2, 2)])], [('1', [(3, 3), (4, 4), (2, 2)]), ('2', [(3, 3), (1, 4), (2, 2)]), ('3', [(3, 2), (1, 1), (2, 2)]), ('4', [(None, None), (4, 4), (2, 2)]), ('5', [(3, 3), (4, 4), (2, 2)])]]
Remove a locus by position,
>>> record.remove_locus_by_position(0)
>>> record.loci_list
['136257048', '136257636']
>>> record.populations
[[('b1', [(4, 4), (2, 2)]), ('b2', [(4, 4), (2, 2)]), ('b3', [(4, 4), (2, 2)])], [('1', [(4, 4), (2, 2)]), ('2', [(1, 4), (2, 2)]), ('3', [(1, 1), (2, 2)]), ('4', [(4, 4), (2, 2)]), ('5', [(4, 4), (2, 2)])]]
Remove a locus by name,
>>> record.remove_locus_by_name('136257636')
>>> record.loci_list
['136257048']
>>> record.populations
[[('b1', [(4, 4)]), ('b2', [(4, 4)]), ('b3', [(4, 4)])], [('1', [(4, 4)]), ('2', [(1, 4)]), ('3', [(1, 1)]), ('4', [(4, 4)]), ('5', [(4, 4)])]]
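The removal methods change the record itself, so the original data is gone after each call. When the original needs to stay intact, the same effect can be had on copies of the plain lists. This is a minimal sketch with hypothetical helper names (drop_locus, drop_locus_by_name), not Biopython functions, run on a shortened copy of the data above.

```python
def drop_locus(loci_list, populations, position):
    """Return new loci and population lists without the locus at `position`."""
    new_loci = loci_list[:position] + loci_list[position + 1:]
    new_populations = [
        [(name, genotypes[:position] + genotypes[position + 1:])
         for name, genotypes in population]
        for population in populations
    ]
    return new_loci, new_populations


def drop_locus_by_name(loci_list, populations, name):
    """Look up the locus position by name, then drop it."""
    return drop_locus(loci_list, populations, loci_list.index(name))


loci = ["136257048", "136257636"]
pops = [[("b1", [(4, 4), (2, 2)]), ("b2", [(4, 4), (2, 2)])]]
print(drop_locus_by_name(loci, pops, "136257636"))
# prints (['136257048'], [[('b1', [(4, 4)]), ('b2', [(4, 4)])]])
```

Removing by name reduces to removing by position, since a locus name only identifies an index into every individual's genotype list.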
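Biopython also provides an interface to the GenePop software itself and exposes much of its functionality through the Bio.PopGen.GenePop module. One easy-to-use interface is EasyController. Let us see how to parse a GenePop file and run some analysis with EasyController.
First, install the GenePop software and add its installation folder to the system path. To get basic information about a GenePop file, create an EasyController object and call its get_basic_info method as shown below: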
>>> from Bio.PopGen.GenePop.EasyController import EasyController
>>> ec = EasyController('c3line.gen')
>>> print(ec.get_basic_info())
(['4', 'b3', '5'], ['136255903', '136257048', '136257636'])
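Here, the first item is the list of populations and the second item is the list of loci.
To get the list of all alleles for a particular locus, call the get_alleles_all_pops method, passing the locus name as shown below: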
>>> allele_list = ec.get_alleles_all_pops("136255903")
>>> print(allele_list)
[2, 3]
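To get the list of alleles for a specific population and locus, call get_alleles, passing the population position followed by the locus name as shown below: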
>>> allele_list = ec.get_alleles(0, "136255903")
>>> print(allele_list)
[]
>>> allele_list = ec.get_alleles(1, "136255903")
>>> print(allele_list)
[]
>>> allele_list = ec.get_alleles(2, "136255903")
>>> print(allele_list)
[2, 3]
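Similarly, EasyController exposes many other functions: allele frequencies, genotype frequencies, multilocus F statistics, Hardy-Weinberg equilibrium, linkage disequilibrium, and so on.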
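To give a feel for what a Hardy-Weinberg check compares, the sketch below computes observed and expected heterozygosity by hand for one locus. It assumes the genotypes of locus '136257048' in the third population as printed by record.populations earlier. The helper name heterozygosity is hypothetical, and this is only the underlying arithmetic, not the statistical test that GenePop performs.

```python
from collections import Counter

# Genotypes at locus '136257048' in the third population,
# copied from the record.populations output shown earlier.
genotypes = [(4, 4), (1, 4), (1, 1), (4, 4), (4, 4)]


def heterozygosity(genotypes):
    """Return (observed, expected) heterozygosity, ignoring missing genotypes."""
    complete = [g for g in genotypes if None not in g]
    if not complete:
        raise ValueError("no complete genotypes at this locus")
    allele_counts = Counter(allele for g in complete for allele in g)
    total = sum(allele_counts.values())
    # Observed: fraction of individuals carrying two different alleles.
    observed = sum(a != b for a, b in complete) / len(complete)
    # Expected under Hardy-Weinberg: 1 minus the sum of squared allele frequencies.
    expected = 1.0 - sum((n / total) ** 2 for n in allele_counts.values())
    return observed, expected


observed, expected = heterozygosity(genotypes)
print(f"observed={observed:.2f} expected={expected:.2f}")
# prints observed=0.20 expected=0.42
```

With allele frequencies of 0.7 and 0.3, the expected heterozygosity is 2 x 0.7 x 0.3 = 0.42, while only one of the five individuals is heterozygous. A sample of five is far too small to draw conclusions from, which is why a proper test is left to GenePop.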