AttributeError: 'numpy.ndarray' object has no attribute 'split'

Problem description

I am trying to answer the following question: "A colleague has produced a file with one DNA sequence on each line. Download the file and load it into Python using numpy.loadtxt(). You will need to use the optional argument dtype=str to tell loadtxt() that the data is composed of strings.

Calculate the GC content of each sequence. The GC content is the percentage of bases that are either G or C (as a percentage of total base pairs). Print the result for each sequence as 'The GC content of the sequence is XX.XX%' where XX.XX is the actual GC content. Do this using 'formatted strings'."

Having imported the file of DNA sequences and joined them together, I now want to split the string into 5 sequences (one for each of the 5 rows) so that I can start the calculations.

Here is my code:

import numpy
dna_data=numpy.loadtxt("dna_sequences",dtype=str)
",".join(dna_data)
seq1,seq2,seq3,seq4,seq5=dna_data.split(",",4)

I am getting this error message: AttributeError: 'numpy.ndarray' object has no attribute 'split'

Please help!

Recommended answer

As was said in the comments: ",".join(dna_data) does not modify dna_data; it just returns a string that you have to store in another variable. Like this:

s = ",".join(dna_data)
seq1,seq2,seq3,seq4,seq5=s.split(",",4)
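
To make the point concrete, here is a minimal sketch. The five short sequences are made up for illustration and stand in for whatever numpy.loadtxt("dna_sequences", dtype=str) returns for the real file:

```python
import numpy

# Hypothetical stand-in for numpy.loadtxt("dna_sequences", dtype=str);
# the real file contents are not shown in the question.
dna_data = numpy.array(["ATGC", "GGCC", "ATAT", "CGTA", "TTGA"])

# str.join returns a NEW string and leaves dna_data untouched,
# so the result has to be stored in its own variable.
joined = ",".join(dna_data)

print(type(dna_data).__name__)  # still a numpy array, which has no .split()
print(type(joined).__name__)    # a plain string, which does have .split()

seq1, seq2, seq3, seq4, seq5 = joined.split(",", 4)
```

Calling dna_data.split(...) fails because the name dna_data still refers to the array; only the string returned by join has a split method.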

Going further:

(Note, as you seem to be new to numpy: in the following I'll assume dna_data has shape (5,). If that is not the case, you can get back to that shape using very basic slicing.)

That being said, with that code you are just turning your array into a list in order to put its items into 5 different variables. Going array -> string -> list -> variables is excessive when you could go array -> variables in one trivial line: seq1,seq2,seq3,seq4,seq5 = dna_data.
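
A short sketch of that one-line unpacking, again using made-up sequences in place of the real file and checking the assumed shape first:

```python
import numpy

# Hypothetical stand-in for the array loaded from the file.
dna_data = numpy.array(["ATGC", "GGCC", "ATAT", "CGTA", "TTGA"])

# Unpacking only works if the array really holds 5 items along its first axis.
print(dna_data.shape)

# Unpack the array directly: no join, no split.
seq1, seq2, seq3, seq4, seq5 = dna_data
print(seq3)
```

If the number of names on the left does not match the number of items in the array, Python raises a ValueError, which is one more reason to check the shape.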

And I would go even further: don't do it at all! What is the point of having several variables when you can just use dna_data[n] instead of any of your seq* variables? Indexing the array is more convenient and makes it painless to do things such as looping over all the sequences with a for loop. For example:

for seq in dna_data: 
    print(seq)
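
Building on that loop, the GC content asked for in the exercise could be computed along these lines. This is only a sketch: the helper name gc_content and the sample sequences are assumptions for illustration, and it assumes every sequence is non-empty.

```python
import numpy

# Hypothetical stand-in for numpy.loadtxt("dna_sequences", dtype=str).
dna_data = numpy.array(["ATGC", "GGCC", "ATAT", "CGTA", "TTGA"])


def gc_content(seq):
    """Return the percentage of bases in seq that are G or C.

    Illustrative helper, not part of numpy. Assumes seq is non-empty.
    """
    seq = seq.upper()  # tolerate lower-case input
    return 100.0 * (seq.count("G") + seq.count("C")) / len(seq)


for seq in dna_data:
    # :.2f formats the value with two decimal places, as the exercise asks.
    print(f"The GC content of the sequence is {gc_content(seq):.2f}%")
```

The f-string is one form of "formatted string"; "...{:.2f}%".format(value) would print the same thing.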
