Rosalind "Mendel's First Law" IPRB

As preparation for an upcoming bioinformatics course, I am doing some assignments from rosalind.info. I am currently stuck on the assignment "Mendel's First Law".

I think I could brute-force my way through this, but somehow my thinking must be too convoluted. My approach would be this:

Build a tree of probabilities with three levels. There are two creatures that mate, creature A and creature B. The first level is the probability that creature A is homozygous dominant (k), heterozygous (m) or homozygous recessive (n). For example, since there are (k+m+n) creatures in total and k of them are homozygous dominant, the probability that creature A is homozygous dominant is k/(k+m+n).

Under each of these branches comes the probability of creature B being homozygous dominant, heterozygous or homozygous recessive, given which type creature A turned out to be. For example, if creature A was picked to be heterozygous (m), then the probability that creature B is also heterozygous is (m-1)/(k+m+n-1), because there is now one fewer heterozygous creature left.
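For what it's worth, the first two levels don't need a hand-built tree: the probability of each branch is just P(A) times P(B | A). A minimal sketch, assuming genotypes are labelled 'AA', 'Aa', 'aa' and using exact fractions; `pair_probability` is a hypothetical helper name:

```python
from fractions import Fraction

def pair_probability(first, second, counts):
    """Hypothetical helper: P(creature A is `first` and creature B is `second`),
    drawing two different creatures without replacement."""
    total = sum(counts.values())
    p_first = Fraction(counts[first], total)
    # one creature of type `first` is gone before creature B is drawn
    remaining = counts[second] - (1 if first == second else 0)
    p_second_given_first = Fraction(remaining, total - 1)
    return p_first * p_second_given_first

counts = {"AA": 2, "Aa": 2, "aa": 2}  # k, m, n (example values)
branches = {(a, b): pair_probability(a, b, counts) for a in counts for b in counts}
print(branches[("Aa", "Aa")])  # 1/15
print(sum(branches.values()))  # 1
```

All nine branches together sum to exactly 1, which is a handy sanity check on the conditional probabilities.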

This would give the first two levels of probabilities, and it would take a lot of code just to get this far, since I would literally be building a tree structure and writing code by hand for each branch.

Now, after choosing creatures A and B, each of them has two chromosomes, and one of these chromosomes is picked at random. So for A, chromosome 1 or 2 can be picked, and the same for B. That gives 4 different options: pick 1 of A, 1 of B; pick 2 of A, 1 of B; pick 1 of A, 2 of B; pick 2 of A, 2 of B. Each of these has probability 1/4, so these would be the leaf probabilities of the tree.
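The leaf level can also be enumerated directly rather than written out per branch. A minimal sketch, assuming a genotype is written as a two-letter string such as 'Aa'; `p_dominant_child` is a hypothetical name:

```python
from fractions import Fraction
from itertools import product

def p_dominant_child(parent_a, parent_b):
    """Hypothetical helper: probability that a child of two genotypes
    (e.g. 'Aa' and 'aa') carries at least one dominant allele 'A'."""
    # the 4 equally likely picks: (allele from A, allele from B)
    picks = list(product(parent_a, parent_b))
    dominant = sum(1 for pair in picks if "A" in pair)
    return Fraction(dominant, len(picks))

print(p_dominant_child("Aa", "Aa"))  # 3/4
print(p_dominant_child("Aa", "aa"))  # 1/2
```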

Then, somehow by magic, I would add up all of these probabilities to get the probability that the two organisms produce a creature with a dominant allele.
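The "adding up" can be done without any tree at all, because every leaf (an ordered pair of two different organisms plus one of the 4 chromosome picks) is equally likely. A brute-force but exact sketch under that observation; `p_dominant_offspring` is a hypothetical name, and it only scales to small populations:

```python
from fractions import Fraction
from itertools import permutations, product

def p_dominant_offspring(k, m, n):
    """Hypothetical helper: exact probability by counting every equally likely leaf."""
    population = ["AA"] * k + ["Aa"] * m + ["aa"] * n
    dominant = total = 0
    # permutations treats list positions as distinct, so these are
    # ordered pairs of two *different* organisms
    for first, second in permutations(population, 2):
        for alleles in product(first, second):  # 4 chromosome picks
            total += 1
            if "A" in alleles:
                dominant += 1
    return Fraction(dominant, total)

p = p_dominant_offspring(2, 2, 2)
print(p, round(float(p), 5))  # 47/60 0.78333
```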

I doubt that this assignment was designed to take hours to solve. What am I overthinking?

Update:

Solved this in the most ridiculous brute-force way possible: I just ran thousands of simulated matings and measured the fraction that ended up with a dominant allele, until there was enough precision to pass the assignment.

import random

k = 26
m = 18
n = 25

trials = 100_000  # number of simulated matings; raise it for more precision
dominants = 0
population = ['AA'] * k + ['Aa'] * m + ['aa'] * n

for _ in range(trials):
    # pick two different organisms, without replacement
    first, second = random.sample(population, 2)
    # each parent passes on one of its two alleles at random
    if 'A' in (random.choice(first), random.choice(second)):
        dominants += 1

print("%.5f" % (dominants / trials))
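How many simulated matings count as "enough precision" can be estimated up front: the standard error of a proportion p estimated from N independent trials is sqrt(p(1-p)/N). A rough sketch using the normal approximation; the tolerance and the 1.96 (about 95%) factor are assumptions, not Rosalind's actual grading rule:

```python
import math

def standard_error(p_hat, trials):
    """Standard error of a proportion estimated from `trials` independent matings."""
    return math.sqrt(p_hat * (1 - p_hat) / trials)

def trials_needed(p_hat, tolerance, z=1.96):
    """Rough number of trials so that a ~95% interval (normal approximation)
    has half-width `tolerance`."""
    return math.ceil(z ** 2 * p_hat * (1 - p_hat) / tolerance ** 2)

print(standard_error(0.78, 100_000))
print(trials_needed(0.78, 0.001))
```

Because the error shrinks only like 1/sqrt(N), each extra correct decimal costs about 100 times more trials, which is why the exact calculation below is much more practical.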

Solution

Organisms with a dominant allele are either AA or Aa.

Your total population (k + m + n) consists of k (hom) homozygous dominant organisms with AA, m (het) heterozygous organisms with Aa, and n (rec) homozygous recessive organisms with aa. Each of these can mate with any other.

The probability of an offspring with a dominant allele is (counting offspring, not parents):

P_dom = n_dominant / n_total = 1 - n_recessive / n_total

Doing the Punnett squares for each of these combinations is not a bad idea:

  het + het

  |  A | a
-----------
A | AA | Aa
a | Aa | aa


  het + rec

  |  a | a
-----------
A | Aa | Aa
a | aa | aa

As you can see, mating two organisms gives four equally likely children. het + het yields 1 of 4 children that are homozygous recessive (aa); het + rec yields 2 of 4.

You might want to do that for the other combinations as well.
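Doing the remaining squares by hand is quick, but they can also be generated. A small sketch that builds all six Punnett squares and counts the aa children in each; `recessive_counts` is a hypothetical name, and the hom/het/rec labels follow the notation above:

```python
from itertools import combinations_with_replacement, product

GENOTYPES = {"hom": "AA", "het": "Aa", "rec": "aa"}

def recessive_counts():
    """Number of aa children (out of 4) for every unordered pair of genotypes."""
    counts = {}
    for (name_a, g_a), (name_b, g_b) in combinations_with_replacement(GENOTYPES.items(), 2):
        # each cell of the Punnett square, written with the dominant allele first
        children = ["".join(sorted(cell)) for cell in product(g_a, g_b)]
        counts[(name_a, name_b)] = children.count("aa")
    return counts

for (a, b), r in recessive_counts().items():
    print(f"{a} + {b}: {r} of 4 children are aa")
```

The resulting factors (0, 0, 0, 1, 2, 4 recessive children) are exactly the Punnett factors used in the calculation below.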

Since we're not just mating the organisms one on one, but throwing a whole bunch of k + m + n together, it helps to know the total number of offspring and the number of children with a particular phenotype.

If you don't mind a bit of Python, comb might be helpful here (it used to live in scipy.misc; recent SciPy versions provide it as scipy.special.comb, and Python 3.8+ has math.comb). In the calculation, don't forget (a) that you get 4 children from each combination and (b) that you need a factor (from the Punnett squares) to determine the recessive (or dominant) offspring from each combination.
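Written out with ordered draws instead of comb, the same counting gives a closed form. Only three kinds of pairs can produce aa children, with Punnett factors 1 (rec + rec), 1/2 (het + rec, which can be drawn in either order) and 1/4 (het + het); here T = k + m + n:

```latex
P_{\text{rec}} = \frac{n(n-1) + mn + \tfrac{1}{4}\,m(m-1)}{T(T-1)},
\qquad
P_{\text{dom}} = 1 - P_{\text{rec}},
\qquad
T = k + m + n
```

For k = m = n = 2: T(T-1) = 30, the numerator is 2 + 4 + 0.5 = 6.5, so P_rec = 13/60 and P_dom = 47/60 ≈ 0.78333.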

Update

    from math import comb  # Python 3.8+; scipy.special.comb works too

    hom, het, rec = 2, 2, 2  # k, m, n (the sample dataset from the problem)

    # total population: every pair of organisms gives 4 equally likely children
    pop_total = 4 * comb(hom + het + rec, 2)

    # use PUNNETT squares!

    # dominant organisms
    dom_total = 4*comb(hom, 2) + 4*hom*het + 4*hom*rec + 3*comb(het, 2) + 2*het*rec

    # probability for dominant organisms
    p_dom = dom_total / pop_total
    print(p_dom)

    # probability for dominant organisms +
    # probability for recessive organisms should be 1
    # let's check that:
    rec_total = 4*comb(rec, 2) + 2*rec*het + comb(het, 2)
    p_rec = rec_total / pop_total
    print(1 - p_rec)
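To submit this directly, the same count can be wrapped around the input line. A minimal sketch, assuming the dataset is a single line "k m n" and that the answer is reported to 5 decimals; `solve_iprb` is a hypothetical helper name:

```python
from math import comb

def solve_iprb(line):
    """Hypothetical helper: parse 'k m n' and return P(dominant offspring) as text."""
    hom, het, rec = map(int, line.split())
    pop_total = 4 * comb(hom + het + rec, 2)
    rec_total = 4 * comb(rec, 2) + 2 * rec * het + comb(het, 2)
    # all counts are exact integers; only the final division is floating point
    return f"{1 - rec_total / pop_total:.5f}"

print(solve_iprb("2 2 2"))  # 0.78333
```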