如何比较使用scikit-learn库load_svmlight_file存储的2个稀疏矩阵? [英] How to compare 2 sparse matrix stored using scikit-learn library load_svmlight_file?
问题描述
为什么会出现此错误? 我该如何解决?
提前谢谢!
from sklearn.datasets import load_svmlight_file
pathToTrainData="../train.txt"
pathToTestData="../test.txt"
X_train,Y_train= load_svmlight_file(pathToTrainData);
X_test,Y_test= load_svmlight_file(pathToTestData);
for ele1 in X_train:
for ele2 in X_test:
if(ele1==ele2):
print "same vector"
---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
<ipython-input-3-c1f145f984a6> in <module>()
7 for ele1 in X_train:
8 for ele2 in X_test:
----> 9 if(ele1==ele2):
10 print "same vector"
/Users/rkasat/anaconda/lib/python2.7/site-packages/scipy/sparse/base.pyc in __bool__(self)
181 return True if self.nnz == 1 else False
182 else:
--> 183 raise ValueError("The truth value of an array with more than one "
184 "element is ambiguous. Use a.any() or a.all().")
185 __nonzero__ = __bool__
ValueError: The truth value of an array with more than one element is ambiguous. Use a.any() or a.all().
您可以使用此条件来检查两个稀疏数组是否完全相等,而无需对其进行致密化:
if (ele1 - ele2).nnz == 0:
# Matched, do something ...
nnz
属性给出稀疏数组中非零元素的数量.
一些简单的测试可以显示出差异:
import numpy as np
from scipy import sparse
A = sparse.rand(10, 1000000).tocsr()
def benchmark1(A):
for s1 in A:
for s2 in A:
if (s1 - s2).nnz == 0:
pass
def benchmark2(A):
for s1 in A:
for s2 in A:
if (s1.toarray() == s2).all() == 0:
pass
%timeit benchmark1(A)
%timeit benchmark2(A)
一些结果:
# Computer 1
10 loops, best of 3: 36.9 ms per loop # with nnz
1 loops, best of 3: 734 ms per loop # with toarray
# Computer 2
10 loops, best of 3: 28 ms per loop
1 loops, best of 3: 312 ms per loop
i am trying to compare feature vectors present in test and train data set.These feature vectors are stored in sparse format using scikitlearn library load_svmlight_file.The dimension of feature vectors of both the dataset is same.However,I am getting this error :"The truth value of an array with more than one element is ambiguous. Use a.any() or a.all()."
Why am I getting this error? How can I resolve it?
Thanks in advance!
from sklearn.datasets import load_svmlight_file
pathToTrainData="../train.txt"
pathToTestData="../test.txt"
X_train,Y_train= load_svmlight_file(pathToTrainData);
X_test,Y_test= load_svmlight_file(pathToTestData);
for ele1 in X_train:
for ele2 in X_test:
if(ele1==ele2):
print "same vector"
---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
<ipython-input-3-c1f145f984a6> in <module>()
7 for ele1 in X_train:
8 for ele2 in X_test:
----> 9 if(ele1==ele2):
10 print "same vector"
/Users/rkasat/anaconda/lib/python2.7/site-packages/scipy/sparse/base.pyc in __bool__(self)
181 return True if self.nnz == 1 else False
182 else:
--> 183 raise ValueError("The truth value of an array with more than one "
184 "element is ambiguous. Use a.any() or a.all().")
185 __nonzero__ = __bool__
ValueError: The truth value of an array with more than one element is ambiguous. Use a.any() or a.all().
You can use this condition to check whether the two sparse arrays are exactly equal without needing to densify them:
if (ele1 - ele2).nnz == 0:
# Matched, do something ...
The nnz
attribute gives the number of nonzero elements in the sparse array.
Some simple test runs to show the difference:
import numpy as np
from scipy import sparse
A = sparse.rand(10, 1000000).tocsr()
def benchmark1(A):
for s1 in A:
for s2 in A:
if (s1 - s2).nnz == 0:
pass
def benchmark2(A):
for s1 in A:
for s2 in A:
if (s1.toarray() == s2).all() == 0:
pass
%timeit benchmark1(A)
%timeit benchmark2(A)
Some results:
# Computer 1
10 loops, best of 3: 36.9 ms per loop # with nnz
1 loops, best of 3: 734 ms per loop # with toarray
# Computer 2
10 loops, best of 3: 28 ms per loop
1 loops, best of 3: 312 ms per loop
这篇关于如何比较使用scikit-learn库load_svmlight_file存储的2个稀疏矩阵?的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!