Sklearn: "The truth value of an array with more than one element is ambiguous. Use a.any() or a.all()" error


Problem description

I'm trying out some code I found online on a training dataset, but I can't resolve the error named in the title.

When I first ran the code, I got this error:

ValueError  Traceback (most recent call last)
----> 2 knn_cv.fit(X_train, y_train)
<ipython-input-21-fb975450c609> in fit(self, X, y)
214         X = normalize(X, norm='l1', copy=False)
215 
--> 216         cv = check_cv(self.cv, X, y)
/anaconda3/lib/python3.6/site-packages/sklearn/model_selection/_split.py in check_cv(cv, y, classifier)
1980 
1981     if isinstance(cv, numbers.Integral):
-> 1982         if (classifier and (y is not None) and
1983                 (type_of_target(y) in ('binary', 'multiclass'))):
1984             return StratifiedKFold(cv)

ValueError: The truth value of an array with more than one element is ambiguous. Use a.any() or a.all()

The error seems to come from the function check_cv, and it looks like y_train is the array being evaluated as a boolean, but I'm not sure how to fix it. I know the cause is the 'and' expression, which I could normally rewrite, but here it lives inside check_cv itself, so I don't know how to change it. I tried the suggested a.any() and a.all(), but each one produces a new error.
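For reference, the message itself comes from NumPy, not from sklearn. The short sketch below reproduces it in isolation, assuming a small made-up array in place of y_train: using a multi-element array in an 'and' expression forces NumPy to convert the whole array to one boolean, which it refuses to do.

```python
import numpy as np

# Small stand-in for y_train (assumed values, not the real data).
arr = np.array([3, 17, 14, 0])

message = ""
try:
    # `and` needs a single True/False, so Python calls bool(arr).
    result = arr and True
except ValueError as exc:
    message = str(exc)

print(message)

# A one-element array has an unambiguous truth value, so this works.
single_ok = bool(np.array([3]))
```

a.any() and a.all() avoid the exception only by reducing the array to a single boolean, so they are appropriate when one yes/no answer is what the code really needs.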

If I use y_train.any(), I get this error:

    269     if y.ndim > 2 or (y.dtype == object and len(y) and
    270                       not isinstance(y.flat[0], str)):
--> 271         return 'unknown'  # [[[1, 2]]] or [obj_1] and not ["label_1"]
    272 
    273     if y.ndim == 2 and y.shape[1] == 0:

TypeError: len() of unsized object

If I use y_train.all(), I get TypeError: 'KFold' object is not iterable

Another question suggested converting the array to a list, but np.array(y_train).tolist() gives me:
TypeError: len() of unsized object
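A likely reason for the "len() of unsized object" errors is that y_train.any() and y_train.all() replace the whole label vector with a single boolean, so any later code that asks for its length fails. The sketch below shows this on a made-up array; it is an illustration of the NumPy behaviour, not a trace of what sklearn does internally.

```python
import numpy as np

# Small stand-in for y_train (assumed values, not the real data).
y = np.array([3, 17, 14, 0])

# all() collapses the labels into one NumPy boolean with zero dimensions.
collapsed = y.all()
print(collapsed, np.ndim(collapsed))

message = ""
try:
    # A zero-dimensional array has no length.
    len(np.asarray(collapsed))
except TypeError as exc:
    message = str(exc)

print(message)
```

In other words, these calls hide the original ValueError but also discard the labels the classifier needs.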

I also updated sklearn, but that didn't fix the error. I'm hoping someone can explain what's wrong or how I can modify the code (with an explanation if possible, since I'm still a little unfamiliar with this part of the code).

The training sample was created using GoogleNews-vectors-negative300.bin.gz:

y_train = array([ 3, 17, 14, 14, 5, 13,... 0, 1, 17, 16, 2])

y_train.shape = (100,)

X_train = <100x5100 sparse matrix of type '' with 10049 stored elements in Compressed Sparse Row format>

X = check_array(X_train, accept_sparse='csr', copy=True)
print(X)
(0, 679)    1.0
(0, 701)    1.0
(0, 1851)   2.0
(0, 1889)   1.0
(0, 2498)   1.0
(0, 2539)   1.0
(0, 2589)   1.0
(0, 2679)   1.0...

X.shape = (100, 5100)

I've attached the main part of the code. If you need the whole thing for reference, it is at http://vene.ro/blog/word-movers-distance-in-python.html

def fit(self, X, y):
    if self.n_neighbors_try is None:
        n_neighbors_try = range(1, 6)
    else:
        n_neighbors_try = self.n_neighbors_try

    X = check_array(X, accept_sparse='csr', copy=True)
    X = normalize(X, norm='l1', copy=False)

    cv = check_cv(self.cv, X, y)
    knn = KNeighborsClassifier(metric='precomputed', algorithm='brute')
    scorer = check_scoring(knn, scoring=self.scoring)

    scores = []
    for train_ix, test_ix in cv:
        dist = self._pairwise_wmd(X[test_ix], X[train_ix])
        knn.fit(X[train_ix], y[train_ix])
        scores.append([
            scorer(knn.set_params(n_neighbors=k), dist, y[test_ix])
            for k in n_neighbors_try
        ])
    scores = np.array(scores)
    self.cv_scores_ = scores

    best_k_ix = np.argmax(np.mean(scores, axis=0))
    best_k = n_neighbors_try[best_k_ix]
    self.n_neighbors = self.n_neighbors_ = best_k

    return super(WordMoversKNNCV, self).fit(X, y)

knn_cv = WordMoversKNNCV(cv=3, n_neighbors_try=range(1, 20),
                         W_embed=W_common, verbose=5, n_jobs=3)
knn_cv.fit(X_train, y_train.all())

According to the author, I should get this output:

[Parallel(n_jobs=3)]: Done  12 tasks      | elapsed:   30.8s

[Parallel(n_jobs=3)]: Done  34 out of  34 | elapsed:  2.0min finished

[Parallel(n_jobs=3)]: Done  12 tasks      | elapsed:   25.7s

[Parallel(n_jobs=3)]: Done  33 out of  33 | elapsed:  2.9min finished

[Parallel(n_jobs=3)]: Done  12 tasks      | elapsed:   53.3s

[Parallel(n_jobs=3)]: Done  33 out of  33 | elapsed:  2.0min finished

WordMoversKNNCV(W_embed=memmap([[ 0.04283, -0.01124, ..., -0.05679, -0.00763],
       [ 0.02884, -0.05923, ..., -0.04744,  0.06698],
   ...,
       [ 0.08428, -0.15534, ..., -0.01413,  0.04561],
       [-0.02052,  0.08666, ...,  0.03659,  0.10445]]),
    cv=3, n_jobs=3, n_neighbors_try=range(1, 20), scoring=None,
    verbose=5)

Recommended answer

You are calling check_cv incorrectly. According to the documentation:

check_cv(cv='warn', y=None, classifier=False):

cv : int, cross-validation generator or an iterable, optional

y : array-like, optional
    The target variable for supervised learning problems.

classifier : boolean, optional, default False
    Whether the task is a classification task,
    in which case stratified KFold will be used

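So it expects y as its second argument and the classifier flag as its third, but you are passing X and y. As a result, y_train lands in the classifier slot, and the expression 'classifier and (y is not None) and ...' shown in your traceback tries to evaluate a 100-element array as a boolean, which raises the ValueError. Change the following lines: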

cv = check_cv(self.cv, X, y)
knn = KNeighborsClassifier(metric='precomputed', algorithm='brute')

to:

knn = KNeighborsClassifier(metric='precomputed', algorithm='brute')
cv = check_cv(self.cv, y, knn)

Note the order of the lines: knn has to be created before it is passed to check_cv.
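A minimal sketch of the corrected call, under a few assumptions not taken from the original code: the toy X and y below stand in for X_train and y_train, the flag is passed by keyword as classifier=is_classifier(knn) (passing knn itself only works because an estimator object is truthy, and newer sklearn releases may require the keyword form), and the folds are obtained with cv.split(X, y). The last point matters because the splitter returned by sklearn.model_selection.check_cv is not iterable on its own, which matches the "'KFold' object is not iterable" error from the question, so the loop 'for train_ix, test_ix in cv:' would probably need the same change.

```python
import numpy as np
from sklearn.base import is_classifier
from sklearn.model_selection import check_cv
from sklearn.neighbors import KNeighborsClassifier

# Toy stand-ins for X_train / y_train (assumed data, not from the question).
rng = np.random.RandomState(0)
X = rng.rand(18, 4)
y = np.array([0, 1, 2] * 6)

knn = KNeighborsClassifier(metric="precomputed", algorithm="brute")

# Second argument is y; the third is the boolean classifier flag.
cv = check_cv(3, y, classifier=is_classifier(knn))
print(type(cv).__name__)

# The splitter itself is not iterable; split() yields index arrays per fold.
folds = list(cv.split(X, y))
for train_ix, test_ix in folds:
    print(len(train_ix), len(test_ix))
```

With an integer cv, multiclass labels, and the classifier flag set, check_cv returns a stratified splitter, so every fold keeps roughly the same class proportions.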
