Python - Generating random DNA sequences with NumPy, ValueError

Question

I have two questions for anybody familiar with NumPy. I have seen very similar questions (and answers), but none of them used NumPy, which I would like to use since it offers a lot of other options I might want in this code in the future. I first tried to generate a list of random nucleotide sequences using Python's "random" module. Since I wanted non-uniform probabilities, I decided to use NumPy instead. However, I get the error message: "ValueError: a must be 1-dimensional or an integer".

import numpy as np

def random_dna_sequence(length):
    return ''.join(np.random.choice('ACTG') for _ in range(length))

with open('dna.txt', 'w+') as txtout:
    for _ in range(10):
        dna = random_dna_sequence(100)
        txtout.write(dna)
        txtout.write("\n")

        print (dna)

I'm a complete scrub and I can't figure out where or how multidimensionality comes into play. I suspect ".join()", but I'm not sure, and I'm also unsure how I could replace it. My other question is how to get non-uniform probabilities. I tried "np.random.choice('ACTG', p=0.2, 0.2, 0.3, 0.3)", but it doesn't work.

I hope somebody out there can help. Thanks in advance.

Regards, Bert

Recommended answer

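For the first part of your question: np.random.choice expects its first argument, a, to be a 1-dimensional array-like or an integer, and NumPy treats the string 'ACTG' as a single 0-dimensional value rather than a sequence of characters. The ''.join() call is not the problem. Pass a as a list: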

def random_dna_sequence(length):
    return ''.join(np.random.choice(list('ACTG')) for _ in range(length))

Or define your bases as a list or tuple:

BASES = ('A', 'C', 'T', 'G')

def random_dna_sequence(length):
    return ''.join(np.random.choice(BASES) for _ in range(length))
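
To see why the string fails while the list or tuple works, it can help to check how many dimensions NumPy assigns to each. This is only a small sketch; the helper name describe_dims is made up for illustration and is not part of NumPy:

```python
import numpy as np

def describe_dims(value):
    """Return the number of array dimensions NumPy sees in value (hypothetical helper)."""
    return np.ndim(value)

# A str is converted to a single scalar, so it has no dimensions.
print(describe_dims('ACTG'))
# A list or tuple of single characters is converted to a 1-dimensional array.
print(describe_dims(list('ACTG')))
print(describe_dims(('A', 'C', 'T', 'G')))
```

Anything reporting one dimension here should be acceptable as the a argument of np.random.choice.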

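The second part has a similar solution: pass the probabilities as a single list or tuple, with one entry per base and a total of 1. Written as p=0.2, 0.2, 0.3, 0.3, only the first value is bound to p and the rest are read as positional arguments after a keyword argument, which Python rejects as a syntax error: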

BASES = ('A', 'C', 'T', 'G')
P = (0.2, 0.2, 0.3, 0.3)

def random_dna_sequence(length):
    return ''.join(np.random.choice(BASES, p=P) for _ in range(length))
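
As a possible variation, not part of the answer above: choice also accepts a size argument, so a whole sequence can be drawn in one call rather than one call per base. The sketch below assumes NumPy 1.17 or later for np.random.default_rng; the rng parameter and the seed value 42 are illustrative choices, not something the question asked for:

```python
import numpy as np

BASES = ('A', 'C', 'T', 'G')
P = (0.2, 0.2, 0.3, 0.3)  # one probability per base, summing to 1

def random_dna_sequence(length, rng=None):
    """Draw `length` bases in a single call and join them into a string."""
    if rng is None:
        rng = np.random.default_rng()  # unseeded generator (assumed default)
    # size=length returns a 1-D array of single-character strings
    return ''.join(rng.choice(BASES, size=length, p=P))

rng = np.random.default_rng(42)  # arbitrary seed, only to make runs repeatable
dna = random_dna_sequence(100, rng)
print(dna)
```

Passing a seeded generator keeps the output reproducible between runs, which may be useful when testing code that consumes dna.txt.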
