Rosalind "Mendel's First Law" (IPRB)
As preparation for an upcoming bioinformatics course, I am doing some assignments from rosalind.info. I am currently stuck on the assignment "Mendel's First Law".
I think I could brute-force my way through this, but I suspect my thinking is too convoluted. My approach would be this:
Build a probability tree with three levels. Two creatures mate, creature A and creature B. The first level is the probability that creature A is homozygous dominant (k), heterozygous (m) or homozygous recessive (n). For example, since there are (k+m+n) creatures in total and k of them are homozygous dominant, the probability of picking a homozygous dominant creature is k/(k+m+n).
Then, under each of these branches comes the probability that creature B is of type k / m / n, given what creature A was picked as. For example, if creature A was picked to be heterozygous (m), then the probability that creature B is also heterozygous is (m-1)/(k+m+n-1), because there is now one less heterozygous creature left.
This would give the two levels of probabilities, and would involve a lot of code just to get this far, as I would literally be building a tree structure and for each branch have manually written code for that part.
Now after choosing creatures A and B, each of them has two chromosomes. One of these chromosomes can randomly be picked. So for A chromosome 1 or 2 can be picked and same for B. So there are 4 different options: pick 1 of A, 1 of B. Pick 2 of A, 1 of B. Pick 1 of A, 2 of B. Pick 2 of A, 2 of B. The probability of each of these would be 1/4. So finally this tree would have these leaf probabilities.
Then from there somehow by magic I would add up all of these probabilities to see what is the probability that two organisms would produce a creature with a dominant allele.
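The tree described above does not need hand-written code for every branch; a pair of nested loops can walk it. The following is only a sketch under that reading of the approach. The function name `dominant_probability` and the genotype strings are illustrative choices, not part of the assignment.

```python
from fractions import Fraction
from itertools import product


def dominant_probability(k, m, n):
    """Walk the three-level tree: pick A, pick B, pick one allele from each."""
    counts = {"AA": k, "Aa": m, "aa": n}
    total = k + m + n
    p_dominant = Fraction(0)
    for a, b in product(counts, repeat=2):
        # Level 1: probability of drawing genotype a first.
        p_a = Fraction(counts[a], total)
        # Level 2: probability of drawing genotype b second, without replacement.
        remaining = max(counts[b] - (1 if a == b else 0), 0)
        p_b = Fraction(remaining, total - 1)
        # Level 3: each parent passes on one of its two alleles (4 leaves, 1/4 each).
        for allele_a, allele_b in product(a, b):
            if "A" in (allele_a, allele_b):
                p_dominant += p_a * p_b * Fraction(1, 4)
    return p_dominant


print("%.5f" % dominant_probability(2, 2, 2))  # 0.78333
```

Using `Fraction` keeps the sum exact, so the "magic" at the end is just adding up the leaf probabilities of every branch whose offspring carries an `A`.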
I doubt that this assignment was designed to take hours to solve. What am I overthinking?
Update:
Solved this in the most ridiculous brute-force way possible. Just ran thousands of simulated matings and figured out the portion that ended up having a dominant allele, until there was enough precision to pass the assignment.
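    import random

    # Population counts: homozygous dominant, heterozygous, homozygous recessive
    k = 26
    m = 18
    n = 25

    trials = 0
    dominants = 0

    # Runs until interrupted; the printed estimate settles as trials accumulate
    while True:
        s = ['AA'] * k + ['Aa'] * m + ['aa'] * n
        # Draw two distinct organisms from the population
        first = random.choice(s)
        s.remove(first)
        second = random.choice(s)
        # Each parent passes on one randomly chosen allele
        has_dominant_allele = 'A' in [random.choice(first), random.choice(second)]
        trials += 1
        if has_dominant_allele:
            dominants += 1
        print("%.5f" % (dominants / float(trials)))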
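For comparison, here is a bounded variant of the same simulation that stops after a fixed number of trials and returns the estimate. It is a sketch, not the code that was submitted: the `simulate` name, the default trial count and the fixed seed are assumptions made so the run is repeatable.

```python
import random


def simulate(k, m, n, trials=100_000, seed=0):
    """Estimate the probability of a dominant offspring by repeated random mating."""
    rng = random.Random(seed)  # fixed seed (an assumption) for repeatable runs
    population = ["AA"] * k + ["Aa"] * m + ["aa"] * n
    dominants = 0
    for _ in range(trials):
        # Draw two distinct organisms without replacement.
        first, second = rng.sample(population, 2)
        # Each parent contributes one randomly chosen allele.
        if "A" in (rng.choice(first), rng.choice(second)):
            dominants += 1
    return dominants / trials


print("%.5f" % simulate(26, 18, 25))
```

With 100,000 trials the estimate is usually good to about two or three decimal places, so more trials or an exact calculation would be needed for tighter precision.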
Organisms that show the dominant allele are either AA or Aa.
Your total population (k + m + n) consists of k (hom) homozygous dominant organisms with AA, m (het) heterozygous organisms with Aa and n (rec) homozygous recessive organisms with aa. Each of these can mate with any other.
The probability for organisms with the dominant allele is:
P_dom = n_dominant/n_total or 1 - n_recessive/n_total
Doing the Punnett squares for each of these combinations is not a bad idea:
het + het

      | A  | a
    --+----+----
    A | AA | Aa
    a | Aa | aa

het + rec

      | a  | a
    --+----+----
    A | Aa | Aa
    a | aa | aa

As the squares show, the mating of two organisms has four equally likely outcomes. het + het yields 1 of 4 offspring that are homozygous recessive (aa), het + rec yields 2 of 4 offspring that are homozygous recessive.
You might want to do that for the other combinations as well.
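The remaining squares can also be generated rather than drawn by hand. This is a small sketch; the names `GENOTYPES`, `punnett` and `dominant_count` are made up for illustration.

```python
from itertools import combinations_with_replacement, product

# Labels used in this answer mapped to genotype strings.
GENOTYPES = {"hom": "AA", "het": "Aa", "rec": "aa"}


def punnett(parent1, parent2):
    """Return the four equally likely offspring genotypes of a cross."""
    # Sort each pair of alleles so that 'aA' and 'Aa' are written the same way.
    return ["".join(sorted(x + y)) for x, y in product(parent1, parent2)]


def dominant_count(parent1, parent2):
    """Number of the four offspring that carry at least one dominant allele."""
    return sum("A" in child for child in punnett(parent1, parent2))


# All six unordered pairings, including a type mated with its own type.
for (name1, g1), (name2, g2) in combinations_with_replacement(GENOTYPES.items(), 2):
    print(f"{name1} + {name2}: {punnett(g1, g2)} -> {dominant_count(g1, g2)} of 4 dominant")
```

The per-pairing counts it prints (4, 4, 4, 3, 2, 0) are the weights needed when the pairings are later summed over the whole population.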
Since we're not just mating the organisms one on one, but throwing together a whole bunch of k + m + n, it would be nice to know the total number of offspring and the number of 'children' with a particular allele.
If you don't mind a bit of Python, comb from scipy.misc (scipy.special in current SciPy releases) might be helpful here. In the calculation, don't forget (a) that you get 4 children from each combination and (b) that you need a factor (from the Punnett squares) to determine the recessive (or dominant) offspring from the combinations.
Update
    from scipy.special import comb  # was scipy.misc.comb in older SciPy releases

    # counts from the question: k, m, n
    hom, het, rec = 26, 18, 25

    # total population
    pop_total = 4 * comb(hom + het + rec, 2)
    # use PUNNETT squares!
    # dominant organisms
    dom_total = 4*comb(hom,2) + 4*hom*het + 4*hom*rec + 3*comb(het,2) + 2*het*rec
    # probability for dominant organisms
    phom = dom_total/pop_total
    print(phom)
    # probability for dominant organisms +
    # probability for recessive organisms should be 1
    # let's check that:
    rec_total = 4 * comb(rec, 2) + 2*rec*het + comb(het, 2)
    prec = rec_total/pop_total
    print(1 - prec)
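If SciPy is not at hand, the same count can be done exactly with the standard library. The sketch below assumes Python 3.8+ for `math.comb`; the function name `p_dominant` is illustrative. It divides the factor of 4 out of both the numerator and the denominator, so each kind of pair is weighted by its fraction of recessive offspring.

```python
from fractions import Fraction
from math import comb  # standard library, Python 3.8+


def p_dominant(hom, het, rec):
    """Exact probability that a random mating yields a dominant offspring."""
    total_pairs = comb(hom + het + rec, 2)
    # rec x rec pairs are always recessive, het x rec half the time,
    # het x het a quarter of the time; every other pairing never is.
    recessive = (
        Fraction(comb(rec, 2))
        + Fraction(het * rec, 2)
        + Fraction(comb(het, 2), 4)
    )
    return 1 - recessive / total_pairs


print("%.5f" % p_dominant(2, 2, 2))  # 0.78333
```

Working with `Fraction` avoids rounding until the final print, which makes it easy to check the result against the simulated estimate.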