AttributeError: 'numpy.ndarray' object has no attribute 'split'
Question
I am trying to answer the following question: "A colleague has produced a file with one DNA sequence on each line. Download the file and load it into Python using numpy.loadtxt(). You will need to use the optional argument dtype=str to tell loadtxt() that the data is composed of strings.
Calculate the GC content of each sequence. The GC content is the percentage of bases that are either G or C (as a percentage of total base pairs). Print the result for each sequence as 'The GC content of the sequence is XX.XX%', where XX.XX is the actual GC content. Do this using formatted strings."
Having imported the file of DNA sequences and joined them together, I now want to split the string into 5 sequences (one for each of the 5 rows) and then start the calculations...
Here is my code:
import numpy
dna_data=numpy.loadtxt("dna_sequences",dtype=str)
",".join(dna_data)
seq1,seq2,seq3,seq4,seq5=dna_data.split(",",4)
I am getting this error message: AttributeError: 'numpy.ndarray' object has no attribute 'split'
Please help!
Recommended answer
As was said in the comments, ",".join(dna_data) does not modify dna_data; it just returns a string that you have to store in another variable, like this:
s = ",".join(dna_data)
seq1,seq2,seq3,seq4,seq5=s.split(",",4)
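To make the point about join concrete, here is a minimal sketch. The five short sequences are made-up stand-ins for whatever numpy.loadtxt() would return from the real file, and the names joined and parts are only illustrative.

```python
import numpy as np

# Made-up stand-in for numpy.loadtxt("dna_sequences", dtype=str)
dna_data = np.array(["ATGC", "GGCC", "ATAT", "CGCG", "TTAA"])

# join() builds and returns a new str; dna_data itself is left untouched
joined = ",".join(dna_data)

print(type(dna_data).__name__)     # ndarray
print(type(joined).__name__)       # str
print(hasattr(dna_data, "split"))  # False, hence the AttributeError

# split() exists on the string, not on the array
parts = joined.split(",")
print(len(parts))                  # 5
```

The array keeps its type after the call, so any string method has to be called on the value that join returned.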
Going further:
(Note, as you seem to be new to numpy: in the following I'll assume dna_data has shape (5,). If that is not the case, you can get back to that shape using very basic slicing.)
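As an illustration of the slicing mentioned here, the sketch below assumes the array came back with shape (5, 1), that is, one column per row. That shape and the sample sequences are assumptions for the example, not something stated in the question.

```python
import numpy as np

# Hypothetical case: the data ended up as a 5x1 column instead of a flat array
dna_data = np.array([["ATGC"], ["GGCC"], ["ATAT"], ["CGCG"], ["TTAA"]])
print(dna_data.shape)   # (5, 1)

# Take every row of the first (and only) column to get a 1-D array
flat = dna_data[:, 0]
print(flat.shape)       # (5,)
```

Checking dna_data.shape right after loading is a cheap way to see which case applies.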
That being said, with that code you are just turning your array into a list and then putting it into 5 different variables. Going array->string->list->variables is excessive when you could go array->variables in one trivial line: seq1,seq2,seq3,seq4,seq5 = dna_data
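A short sketch of the direct unpacking, again with made-up sequences standing in for the loaded file. It only works when the array has exactly five elements along its first axis; otherwise Python raises a ValueError.

```python
import numpy as np

# Made-up stand-in for the array loaded from the file, shape (5,)
dna_data = np.array(["ATGC", "GGCC", "ATAT", "CGCG", "TTAA"])

# Iterating a 1-D array yields its elements, so tuple unpacking works directly
seq1, seq2, seq3, seq4, seq5 = dna_data

print(seq1)  # ATGC
print(seq5)  # TTAA
```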
And I would go even further: don't do it at all! What is the point of having several variables when you can just use dna_data[n] instead of any of your seq* variables? The former is more convenient and lets you painlessly do things such as looping over all the sequences with a for loop, e.g.:
for seq in dna_data:
    print(seq)
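Building on that loop, one possible way to finish the exercise from the question is sketched below. The helper name gc_content and the sample sequences are assumptions made for illustration, and an f-string is used as one form of "formatted string"; str.format() would work as well.

```python
import numpy as np

# Made-up stand-in for numpy.loadtxt("dna_sequences", dtype=str)
dna_data = np.array(["ATGC", "GGCC", "ATAT", "CGCG", "TTAA"])


def gc_content(seq):
    """Return the percentage of bases in seq that are G or C.

    Hypothetical helper; assumes seq is a non-empty string.
    """
    seq = seq.upper()  # tolerate lowercase input
    return 100.0 * (seq.count("G") + seq.count("C")) / len(seq)


for seq in dna_data:
    # :.2f rounds to two decimals, matching the XX.XX% format in the question
    print(f"The GC content of the sequence is {gc_content(seq):.2f}%")
```

An empty line in the file would cause a division by zero here, so real data may need a guard for that case.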