找到具有 0 个样本的数组(形状=(0, 40)),而最少需要 1 个 [英] Found array with 0 sample(s) (shape=(0, 40)) while a minimum of 1 is required

查看:55
本文介绍了找到具有 0 个样本的数组(形状=(0, 40)),而最少需要 1 个的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我正在使用 Python 2.7、sklearn 0.17.1、numpy 1.11.0 测试一个简单的预测程序.我从 LDA 模型中得到了具有概率的矩阵,现在我想创建 RandomForestClassifier 来通过概率预测结果.我的代码是:

I'm testing a simple prediction program with Python 2.7, sklearn 0.17.1, numpy 1.11.0. I got matrix with propabilities from LDA model, and now I want create RandomForestClassifier to predict results by propabilities. My code is:

maxlen = 40
props = []
for doc in corpus:
    topics = model.get_document_topics(doc) 
    tprops = [0] * maxlen
    for topic in topics:
        tprops[topics[0]] = topics[1]
    props.append(tprops)

ntheta = np.array(props)
ny = np.array(y)

clf = RandomForestClassifier(n_estimators=100)
accuracy = cross_val_score(clf, ntheta, ny, scoring = 'accuracy')
print accuracy

<小时>

ValueError                                Traceback (most recent call last)
<ipython-input-65-a7d276df43e9> in <module>()
      1 # clf.fit(nteta, ny)
      2 print nteta.shape, ny.shape
----> 3 accuracy = cross_val_score(clf, nteta, ny, scoring = 'accuracy')
      4 print accuracy

/home/egor/anaconda2/lib/python2.7/site-packages/sklearn/cross_validation.pyc in cross_val_score(estimator, X, y, scoring, cv, n_jobs, verbose, fit_params, pre_dispatch)
   1431                                               train, test, verbose, None,
   1432                                               fit_params)
-> 1433                       for train, test in cv)
   1434     return np.array(scores)[:, 0]
   1435 

/home/egor/anaconda2/lib/python2.7/site-packages/sklearn/externals/joblib/parallel.pyc in __call__(self, iterable)
    798             # was dispatched. In particular this covers the edge
    799             # case of Parallel used with an exhausted iterator.
--> 800             while self.dispatch_one_batch(iterator):
    801                 self._iterating = True
    802             else:

/home/egor/anaconda2/lib/python2.7/site-packages/sklearn/externals/joblib/parallel.pyc in dispatch_one_batch(self, iterator)
    656                 return False
    657             else:
--> 658                 self._dispatch(tasks)
    659                 return True
    660 

/home/egor/anaconda2/lib/python2.7/site-packages/sklearn/externals/joblib/parallel.pyc in _dispatch(self, batch)
    564 
    565         if self._pool is None:
--> 566             job = ImmediateComputeBatch(batch)
    567             self._jobs.append(job)
    568             self.n_dispatched_batches += 1

/home/egor/anaconda2/lib/python2.7/site-packages/sklearn/externals/joblib/parallel.pyc in __init__(self, batch)
    178         # Don't delay the application, to avoid keeping the input
    179         # arguments in memory
--> 180         self.results = batch()
    181 
    182     def get(self):

/home/egor/anaconda2/lib/python2.7/site-packages/sklearn/externals/joblib/parallel.pyc in __call__(self)
     70 
     71     def __call__(self):
---> 72         return [func(*args, **kwargs) for func, args, kwargs in self.items]
     73 
     74     def __len__(self):

/home/egor/anaconda2/lib/python2.7/site-packages/sklearn/cross_validation.pyc in _fit_and_score(estimator, X, y, scorer, train, test, verbose, parameters, fit_params, return_train_score, return_parameters, error_score)
   1529             estimator.fit(X_train, **fit_params)
   1530         else:
-> 1531             estimator.fit(X_train, y_train, **fit_params)
   1532 
   1533     except Exception as e:

/home/egor/anaconda2/lib/python2.7/site-packages/sklearn/ensemble/forest.pyc in fit(self, X, y, sample_weight)
    210         """
    211         # Validate or convert input data
--> 212         X = check_array(X, dtype=DTYPE, accept_sparse="csc")
    213         if issparse(X):
    214             # Pre-sort indices to avoid that each individual tree of the

/home/egor/anaconda2/lib/python2.7/site-packages/sklearn/utils/validation.pyc in check_array(array, accept_sparse, dtype, order, copy, force_all_finite, ensure_2d, allow_nd, ensure_min_samples, ensure_min_features, warn_on_dtype, estimator)
    405                              " minimum of %d is required%s."
    406                              % (n_samples, shape_repr, ensure_min_samples,
--> 407                                 context))
    408 
    409     if ensure_min_features > 0 and array.ndim == 2:

ValueError: Found array with 0 sample(s) (shape=(0, 40)) while a minimum of 1 is required.

<小时>

更新因为我得到了 2 减?让批评家有建设性.


UPD For what I got 2 minus? Let critic be constructive.

更新

cotique 发现 y 填写不正确(必须是其他类).如果 y 填写正确,则问题不会发生.在我的例子中,类是错误的,它们的数量是 39774.但理论上这不是一个答案,为什么当我们有 39774 个类并且必须预测它们时会发生错误.

cotique found that y was filled incorrect (must be other classes). And if y fills correct then the problem doesn't happens. In my case classes were wrong and their count were 39774. But in theory it's not an answer, why the error happens when we have 39774 classes and have to predict them.

推荐答案

这是来自 scikit-learn repo (validation.py#L409):

This is the original code from the scikit-learn repo (validation.py#L409):

if ensure_min_samples > 0:
   n_samples = _num_samples(array)
   if n_samples < ensure_min_samples:
      raise ValueError("Found array with %d sample(s) (shape=%s) while a"
                       " minimum of %d is required%s."
                        % (n_samples, shape_repr, ensure_min_samples,
                        context))

所以,n_samples = _num_samples(array).顺便说一下,array要检查/转换的输入对象.

So, the n_samples = _num_samples(array). By the way, array is the input object to check / convert.

接下来,validation.py#L111:

def _num_samples(x):
    """Return number of samples in array-like x."""
    if hasattr(x, 'fit'):
        # stuff
    if not hasattr(x, '__len__') and not hasattr(x, 'shape'):
        # stuff
    if hasattr(x, 'shape'):
        if len(x.shape) == 0:
            # raise TypeError
        return x.shape[0]
    else:
        return len(x)

因此,样本数等于array的第一维长度,即0,因为array.shape = (0, 40).

So, the number of samples equals to the length of first dimension of array, which is 0 since array.shape = (0, 40).

我不知道这一切意味着什么,但我希望它能让事情更清楚.

And I don't know what this all means, but I hope it makes things clearer.

这篇关于找到具有 0 个样本的数组(形状=(0, 40)),而最少需要 1 个的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
相关文章
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆