How to deal with data from an ARFF file with Python?
Problem description
I am pretty new to Python. I am currently using Python to read an ARFF file:
import arff

for row in arff.load('cpu.arff'):
    x = row
    print(x)
Part of the sample output looks like this:
<Row(125.0,256.0,6000.0,256.0,16.0,128.0,198.0)>
<Row(29.0,8000.0,32000.0,32.0,8.0,32.0,269.0)>
<Row(29.0,8000.0,32000.0,32.0,8.0,32.0,220.0)>
<Row(29.0,8000.0,32000.0,32.0,8.0,32.0,172.0)>
<Row(29.0,8000.0,16000.0,32.0,8.0,16.0,132.0)>
<Row(26.0,8000.0,32000.0,64.0,8.0,32.0,318.0)>
<Row(23.0,16000.0,32000.0,64.0,16.0,32.0,367.0)>
Only the last column of the data is the label; the other columns are the attributes. How can I store them in arrays? I want to assign the last column to y and the first six columns to X, and then run cross-validation on the data from the ARFF file.
Alternatively, is there any way to automatically separate the attributes from the label when reading an ARFF file?
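On the second question: assuming the file contains only numeric attributes and the label is the last attribute (as in the sample rows above), the separation can be sketched with the standard library alone. load_numeric_arff is a hypothetical helper, not part of the arff module. It skips the header up to the @data line and ignores % comments, but it does not handle nominal or string attributes, quoted values, missing values (?), or the sparse ARFF format. The header in the usage example is made up for illustration.

```python
import io


def load_numeric_arff(lines):
    """Hypothetical minimal ARFF reader for purely numeric data.

    Returns (X, y): X holds all columns except the last, y holds the last column.
    """
    X, y = [], []
    in_data = False
    for raw in lines:
        line = raw.strip()
        # Skip blank lines and ARFF comments, which start with '%'.
        if not line or line.startswith('%'):
            continue
        if not in_data:
            # Header lines (@relation, @attribute) end at the @data declaration.
            if line.lower() == '@data':
                in_data = True
            continue
        values = [float(v) for v in line.split(',')]
        X.append(values[:-1])  # attributes: every column but the last
        y.append(values[-1])   # label: the last column
    return X, y


# Hypothetical header; the data lines are the first three sample rows.
sample = """% attribute names below are made up
@relation cpu
@attribute a1 numeric
@attribute a2 numeric
@attribute a3 numeric
@attribute a4 numeric
@attribute a5 numeric
@attribute a6 numeric
@attribute class numeric

@data
125,256,6000,256,16,128,198
29,8000,32000,32,8,32,269
29,8000,32000,32,8,32,220
"""

X, y = load_numeric_arff(io.StringIO(sample))
print(y)  # [198.0, 269.0, 220.0]
```

With a real file, pass the open file object instead: with open('cpu.arff') as f: X, y = load_numeric_arff(f).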
Recommended answer
Row objects from the arff module support typical Python sequence slicing, so you can easily separate the attributes from the labels:
import arff

X = []  # attribute rows (all columns except the last)
y = []  # labels (the last column)
for row in arff.load('cpu.arff'):
    X.append(row[:-1])
    y.append(row[-1])
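X and y above are plain Python lists; if you need actual arrays, numpy.array(X) and numpy.array(y) convert them. For the cross-validation step mentioned in the question, here is a minimal k-fold sketch in plain Python, using the seven sample rows as stand-in data. k_fold_splits is a hypothetical helper; its folds are contiguous and unshuffled, so shuffle the data first if the rows are ordered.

```python
# The seven sample rows from the question; the last value of each row is the label.
rows = [
    (125.0, 256.0, 6000.0, 256.0, 16.0, 128.0, 198.0),
    (29.0, 8000.0, 32000.0, 32.0, 8.0, 32.0, 269.0),
    (29.0, 8000.0, 32000.0, 32.0, 8.0, 32.0, 220.0),
    (29.0, 8000.0, 32000.0, 32.0, 8.0, 32.0, 172.0),
    (29.0, 8000.0, 16000.0, 32.0, 8.0, 16.0, 132.0),
    (26.0, 8000.0, 32000.0, 64.0, 8.0, 32.0, 318.0),
    (23.0, 16000.0, 32000.0, 64.0, 16.0, 32.0, 367.0),
]
X = [list(r[:-1]) for r in rows]  # first six columns: attributes
y = [r[-1] for r in rows]         # last column: label


def k_fold_splits(n_samples, k):
    """Yield (train_idx, test_idx) for k contiguous folds (hypothetical helper).

    The first n_samples % k folds get one extra sample each.
    """
    if not 2 <= k <= n_samples:
        raise ValueError("k must be between 2 and n_samples")
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        test_idx = list(range(start, start + size))
        train_idx = list(range(0, start)) + list(range(start + size, n_samples))
        yield train_idx, test_idx
        start += size


folds = list(k_fold_splits(len(X), 3))
for train_idx, test_idx in folds:
    X_train = [X[i] for i in train_idx]
    y_train = [y[i] for i in train_idx]
    X_test = [X[i] for i in test_idx]
    y_test = [y[i] for i in test_idx]
    # Fit a model on (X_train, y_train) and evaluate it on (X_test, y_test) here.
    print(len(X_train), len(X_test))  # prints 4 3, then 5 2, then 5 2
```

In practice, scikit-learn's sklearn.model_selection.KFold (or cross_val_score) produces the same kind of splits, with optional shuffling, and accepts the lists or NumPy arrays built above.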