scikit-learn error: The least populated class in y has only 1 member

Views: 54  Published: 2021/12/25 14:40:30  Tags: python, scikit-learn, train-test-split

Problem description

I'm trying to split my dataset into a training and a test set by using the train_test_split function from scikit-learn, but I'm getting this error:

In [1]: y.iloc[:,0].value_counts()
Out[1]: 
M2    38
M1    35
M4    29
M5    15
M0    15
M3    15

In [2]: xtrain, xtest, ytrain, ytest = train_test_split(X, y, test_size=1/3, random_state=85, stratify=y)
Out[2]: 
Traceback (most recent call last):
  File "run_ok.py", line 48, in <module>
    xtrain,xtest,ytrain,ytest = train_test_split(X,y,test_size=1/3,random_state=85,stratify=y)
  File "/home/aurora/.pyenv/versions/3.6.0/lib/python3.6/site-packages/sklearn/model_selection/_split.py", line 1700, in train_test_split
    train, test = next(cv.split(X=arrays[0], y=stratify))
  File "/home/aurora/.pyenv/versions/3.6.0/lib/python3.6/site-packages/sklearn/model_selection/_split.py", line 953, in split
    for train, test in self._iter_indices(X, y, groups):
  File "/home/aurora/.pyenv/versions/3.6.0/lib/python3.6/site-packages/sklearn/model_selection/_split.py", line 1259, in _iter_indices
    raise ValueError("The least populated class in y has only 1"
ValueError: The least populated class in y has only 1 member, which is too few. The minimum number of groups for any class cannot be less than 2.

However, all classes have at least 15 samples. Why am I getting this error?

X is a pandas DataFrame that represents the data points, and y is a pandas DataFrame with a single column that contains the target variable.
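
The value counts shown above are computed on y.iloc[:,0], while the failing call passes the whole y to stratify. One quick check is to count classes on exactly the object handed to stratify. The helper below, stratify_class_counts, is a hypothetical name and a minimal sketch; it assumes a 2-D stratify is treated the way recent scikit-learn versions treat it, where each distinct row becomes one class. The row_id column in the example is also hypothetical (the question states y has one column) and only illustrates how an extra column with unique values would leave every class with a single member.

```python
import numpy as np
import pandas as pd


def stratify_class_counts(labels):
    """Count classes in the exact object that would be passed as `stratify`.

    Assumption: for 2-D input, each distinct row counts as one class,
    mirroring how recent scikit-learn versions handle a 2-D `stratify`.
    """
    arr = np.asarray(labels)
    if arr.ndim == 2:
        # Join each row's values into one string key (one class per row pattern).
        arr = np.array([" ".join(row.astype(str)) for row in arr])
    classes, counts = np.unique(arr, return_counts=True)
    return dict(zip(classes.tolist(), counts.tolist()))


# A one-column DataFrame and its single column give the same counts.
y_demo = pd.DataFrame({"label": ["M0", "M0", "M1", "M1", "M1"]})
print(stratify_class_counts(y_demo))           # {'M0': 2, 'M1': 3}
print(stratify_class_counts(y_demo.iloc[:, 0]))

# Hypothetical: an extra column with unique values makes every row its own class.
y_with_id = y_demo.assign(row_id=range(len(y_demo)))
print(min(stratify_class_counts(y_with_id).values()))  # 1
```

If the minimum count reported for the real y is 1 while y.iloc[:,0].value_counts() shows at least 15 per class, the object passed to stratify differs from the column that was inspected.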

I cannot post the original data because it's proprietary, but it should be fairly reproducible: create a random pandas DataFrame X with 1k rows x 500 columns, and a random pandas DataFrame y with the same number of rows as X (1k) that holds, for each row, the target variable (a categorical label). The y DataFrame should contain several distinct categorical labels (e.g. 'class1', 'class2', ...), and each label should occur at least 15 times.
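
The reproduction recipe described above can be sketched as follows. The seed, the six label names class0 to class5, and the column name target are assumptions chosen for illustration; the split call mirrors the one in the question. This sketch is not claimed to trigger the error: if it runs cleanly on a given scikit-learn version, the difference most likely lies in the real y (its columns or dtypes) rather than in the class counts.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(85)  # assumed seed, for repeatability

# Random feature matrix: 1k rows x 500 columns.
n_rows, n_cols = 1000, 500
X = pd.DataFrame(rng.normal(size=(n_rows, n_cols)))

# Guarantee at least 15 occurrences of each label, then fill the rest at random.
labels = [f"class{i}" for i in range(6)]       # assumed label names
base = np.repeat(labels, 15)                   # 6 * 15 = 90 guaranteed rows
rest = rng.choice(labels, size=n_rows - base.size)
y = pd.DataFrame({"target": rng.permutation(np.concatenate([base, rest]))})

print(y["target"].value_counts())

# Same call as in the question: stratify on the one-column DataFrame y.
xtrain, xtest, ytrain, ytest = train_test_split(
    X, y, test_size=1/3, random_state=85, stratify=y
)
print(xtrain.shape, xtest.shape)
```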
