Best way to separate data into 3 classes
Question
I have a numpy array:
[['6.5' '3.2' '5.1' '2.0' 'Iris-virginica']
['6.1' '2.8' '4.0' '1.3' 'Iris-versicolor']
['4.6' '3.2' '1.4' '0.2' 'Iris-setosa']
['6.0' '2.2' '4.0' '1.0' 'Iris-versicolor']
['4.7' '3.2' '1.3' '0.2' 'Iris-setosa']
['6.7' '3.1' '5.6' '2.4' 'Iris-virginica']]
What would be the fastest way to separate this data into 3 separate numpy arrays based on the labels 'Iris-virginica', 'Iris-setosa' and 'Iris-versicolor', so that
the Iris-virginica array contains only
[['6.5' '3.2' '5.1' '2.0'] ['6.7' '3.1' '5.6' '2.4']]
the Iris-setosa array contains only
[['4.6' '3.2' '1.4' '0.2'] ['4.7' '3.2' '1.3' '0.2']]
and the Iris-versicolor array contains only
[['6.1' '2.8' '4.0' '1.3'] ['6.0' '2.2' '4.0' '1.0']]
Recommended answer
Use numpy and a list comprehension:
import numpy as np

data = [['6.5', '3.2', '5.1', '2.0', 'Iris-virginica'],
        ['6.1', '2.8', '4.0', '1.3', 'Iris-versicolor'],
        ['4.6', '3.2', '1.4', '0.2', 'Iris-setosa'],
        ['6.0', '2.2', '4.0', '1.0', 'Iris-versicolor'],
        ['4.7', '3.2', '1.3', '0.2', 'Iris-setosa'],
        ['6.7', '3.1', '5.6', '2.4', 'Iris-virginica']]

# Keep only the Iris-virginica rows and convert the four features to float.
# list(...) is needed on Python 3, where map returns a lazy iterator.
filtered = [list(map(float, item[:4])) for item in data if item[4] == 'Iris-virginica']
print('mean', np.mean(filtered, axis=0))
print('var ', np.var(filtered, axis=0))
Here item[4] == 'Iris-virginica' selects the rows you want, map(float, item[:4]) converts the four feature strings to floats, and np.mean(..., axis=0) computes the column-wise mean of the filtered data.
The output is
mean [6.6  3.15 5.35 2.2 ]
var  [0.01   0.0025 0.0625 0.04  ]
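The comprehension above handles one label at a time, while the question asks for three arrays. A minimal sketch of grouping all rows in a single pass, assuming the label is always the fifth field; the names `groups` and `arrays` are illustrative and not part of the original answer:

```python
from collections import defaultdict

import numpy as np

data = [['6.5', '3.2', '5.1', '2.0', 'Iris-virginica'],
        ['6.1', '2.8', '4.0', '1.3', 'Iris-versicolor'],
        ['4.6', '3.2', '1.4', '0.2', 'Iris-setosa'],
        ['6.0', '2.2', '4.0', '1.0', 'Iris-versicolor'],
        ['4.7', '3.2', '1.3', '0.2', 'Iris-setosa'],
        ['6.7', '3.1', '5.6', '2.4', 'Iris-virginica']]

# Collect the four numeric features of each row under its label.
groups = defaultdict(list)
for row in data:
    groups[row[4]].append([float(x) for x in row[:4]])

# One float array per label (hypothetical name: arrays).
arrays = {label: np.array(rows) for label, rows in groups.items()}

for label, arr in arrays.items():
    print(label, arr.shape, np.mean(arr, axis=0))
```

Each value in `arrays` is a 2-D float array, so np.mean and np.var with axis=0 work on it exactly as in the single-label version.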
Update
Here is a numpy-only version, but it appears to be slower than the one above.
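data = np.array(data)
# The boolean mask selects the matching rows; [:, :4] keeps the four feature columns.
# np.float was removed in NumPy 1.24, so the builtin float is used instead.
filtered = data[data[:, 4] == 'Iris-virginica'][:, :4].astype(float)
print('mean', np.mean(filtered, axis=0))
print('var ', np.var(filtered, axis=0))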
The timeit results (measured under Python 2, where map returns a list) are:
In [5]: %timeit filtered = [map(float, item[:4]) for item in data if item[4] == 'Iris-virginica']
100000 loops, best of 3: 1.93 µs per loop
In [6]: data = np.array(data)
In [7]: timeit data[data[:, 4] == 'Iris-virginica'][:, :4].astype(np.float)
100000 loops, best of 3: 15.5 µs per loop
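To get all three arrays with the numpy-only approach, the same boolean mask can be applied once per label. This is a sketch rather than the original author's code: it assumes the label sits in the last column, and `by_label` is a hypothetical name.

```python
import numpy as np

data = np.array([['6.5', '3.2', '5.1', '2.0', 'Iris-virginica'],
                 ['6.1', '2.8', '4.0', '1.3', 'Iris-versicolor'],
                 ['4.6', '3.2', '1.4', '0.2', 'Iris-setosa'],
                 ['6.0', '2.2', '4.0', '1.0', 'Iris-versicolor'],
                 ['4.7', '3.2', '1.3', '0.2', 'Iris-setosa'],
                 ['6.7', '3.1', '5.6', '2.4', 'Iris-virginica']])

labels = data[:, 4]
# Convert the feature columns to float once, instead of once per label.
features = data[:, :4].astype(float)

# Map each distinct label to the rows that carry it.
by_label = {str(label): features[labels == label] for label in np.unique(labels)}

virginica = by_label['Iris-virginica']
setosa = by_label['Iris-setosa']
versicolor = by_label['Iris-versicolor']
print(virginica)
```

On a six-row input the fixed overhead of numpy indexing dominates, which is consistent with the timings above; on larger arrays the vectorized mask would typically be expected to catch up, though that is not measured here.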