scikit-learn error: The least populated class in y has only 1 member

Views: 2939  Published: 2020/7/11 18:56:35  Tags: python scikit-learn train-test-split

Problem description

I'm trying to split my dataset into a training set and a test set using the train_test_split function from scikit-learn, but I'm getting this error:

In [1]: y.iloc[:,0].value_counts()
Out[1]: 
M2    38
M1    35
M4    29
M5    15
M0    15
M3    15

In [2]: xtrain, xtest, ytrain, ytest = train_test_split(X, y, test_size=1/3, random_state=85, stratify=y)
Out[2]: 
Traceback (most recent call last):
  File "run_ok.py", line 48, in <module>
    xtrain,xtest,ytrain,ytest = train_test_split(X,y,test_size=1/3,random_state=85,stratify=y)
  File "/home/aurora/.pyenv/versions/3.6.0/lib/python3.6/site-packages/sklearn/model_selection/_split.py", line 1700, in train_test_split
    train, test = next(cv.split(X=arrays[0], y=stratify))
  File "/home/aurora/.pyenv/versions/3.6.0/lib/python3.6/site-packages/sklearn/model_selection/_split.py", line 953, in split
    for train, test in self._iter_indices(X, y, groups):
  File "/home/aurora/.pyenv/versions/3.6.0/lib/python3.6/site-packages/sklearn/model_selection/_split.py", line 1259, in _iter_indices
    raise ValueError("The least populated class in y has only 1"
ValueError: The least populated class in y has only 1 member, which is too few. The minimum number of groups for any class cannot be less than 2.

However, every class has at least 15 samples. Why am I getting this error?

X is a pandas DataFrame that holds the data points; y is a pandas DataFrame with a single column that contains the target variable.

I cannot post the original data because it is proprietary, but the setup is fairly easy to reproduce: create a random pandas DataFrame (X) with 1k rows x 500 columns and a random pandas DataFrame (y) with the same number of rows (1k) that holds, for each row, the target variable (a categorical label). The y DataFrame should contain several distinct categorical labels (e.g. 'class1', 'class2', ...), and each label should occur at least 15 times.
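
The synthetic setup described above could be built roughly as follows. This is a minimal sketch, not the original script: the column name "target", the six labels 'class1' to 'class6', and the NumPy seed are assumptions made for illustration. The split call stratifies on the 1-D label column y.iloc[:, 0] rather than on the one-column DataFrame used in the question. That choice is also an assumption, made so the sketch does not depend on how a given scikit-learn version treats a 2-D stratify argument; it is not a fix confirmed by the post.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(85)  # seed chosen arbitrarily for this sketch
n_rows, n_cols = 1000, 500

# X: random numeric features, 1k rows x 500 columns
X = pd.DataFrame(rng.normal(size=(n_rows, n_cols)))

# y: one-column DataFrame of categorical labels. Cycling through the labels
# before shuffling guarantees that every class occurs far more than 15 times.
labels = np.array([f"class{i}" for i in range(1, 7)])  # hypothetical labels
y = pd.DataFrame({"target": rng.permutation(np.resize(labels, n_rows))})

# Same sanity check as In [1] in the question
counts = y.iloc[:, 0].value_counts()
print(counts)

# Stratified split; stratify receives the 1-D label column (assumption, see above)
xtrain, xtest, ytrain, ytest = train_test_split(
    X, y, test_size=1/3, random_state=85, stratify=y.iloc[:, 0]
)
print(xtrain.shape, xtest.shape)
```

Comparing counts with the class counts the splitter reports is a quick way to see whether the labels reaching train_test_split are the ones you expect.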
