scikit-learn - multinomial logistic regression with probabilities as a target variable

Views: 103 | Published: 2020/5/4 3:20:58 | Tags: python machine-learning scikit-learn logistic-regression multinomial

Problem description

I'm implementing a multinomial logistic regression model in Python using scikit-learn. However, I'd like to use a probability distribution over the classes as my target variable, rather than a single class label. As an example, let's say this is a 3-class variable that looks as follows:

    class_1 class_2 class_3
0   0.0     0.0     1.0
1   1.0     0.0     0.0
2   0.0     0.5     0.5
3   0.2     0.3     0.5
4   0.5     0.1     0.4

The values in every row sum to 1.

How could I fit a model like this? When I try:

from sklearn.linear_model import LogisticRegression

model = LogisticRegression(solver='saga', multi_class='multinomial')
model.fit(X, probabilities)  # probabilities has shape (n_samples, 3)

I get an error message:

ValueError: bad input shape (10000, 3)

I know this happens because the method expects a vector, not a matrix. But here I can't compress the probabilities matrix into a vector, since the classes are not mutually exclusive.

Recommended answer

You can't use a cross-entropy loss with non-indicator probabilities in scikit-learn; this is not implemented and not supported in the API. It is a limitation of scikit-learn.

For logistic regression, you can approximate it by upsampling instances according to the probabilities of their labels. For example, you can upsample every instance 10x: if for a training instance class 1 has probability 0.2 and class 2 has probability 0.8, generate 10 training instances, 8 with class 2 and 2 with class 1. It won't be as efficient as it could be, but in the limit (as the upsampling factor grows) you'll be optimizing the same objective function.

You can do something like this:

from sklearn.utils import check_random_state
import numpy as np

def expand_dataset(X, y_proba, factor=10, random_state=None):
    """
    Convert a dataset with float multiclass probabilities to a dataset
    with indicator probabilities by duplicating X rows and sampling
    true labels.
    """
    rng = check_random_state(random_state)
    n_classes = y_proba.shape[1]
    classes = np.arange(n_classes, dtype=int)
    for x, probs in zip(X, y_proba):
        for label in rng.choice(classes, size=factor, p=probs):
            yield x, label
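The generator yields `(x, label)` pairs, so they have to be collected into arrays before fitting. The following is a minimal usage sketch, not part of the original answer: the feature matrix is made up, the targets are the five rows from the question, and names such as `X_big` and `y_big` are hypothetical. The function is repeated only so the snippet runs on its own.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.utils import check_random_state


def expand_dataset(X, y_proba, factor=10, random_state=None):
    # Same generator as in the answer, repeated so this snippet is self-contained.
    rng = check_random_state(random_state)
    classes = np.arange(y_proba.shape[1], dtype=int)
    for x, probs in zip(X, y_proba):
        for label in rng.choice(classes, size=factor, p=probs):
            yield x, label


# Toy data (an assumption for illustration): the targets from the question
# paired with made-up features.
y_proba = np.array([
    [0.0, 0.0, 1.0],
    [1.0, 0.0, 0.0],
    [0.0, 0.5, 0.5],
    [0.2, 0.3, 0.5],
    [0.5, 0.1, 0.4],
])
X = np.random.RandomState(0).normal(size=(5, 2))

# Materialize the generator into ordinary arrays with hard labels.
pairs = list(expand_dataset(X, y_proba, factor=10, random_state=0))
X_big = np.array([x for x, _ in pairs])
y_big = np.array([label for _, label in pairs])

# An ordinary classifier can now be fitted on the expanded data.
model = LogisticRegression(max_iter=1000)
model.fit(X_big, y_big)
proba = model.predict_proba(X)  # one row of class probabilities per instance
```

A larger `factor` makes the sampled label frequencies track the target probabilities more closely, at the cost of a proportionally larger training set.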

See a more complete example here: https://github.com/TeamHG-Memex/eli5/blob/8cde96878f14c8f46e10627190abd9eb9e705ed4/eli5/lime/utils.py#L16
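A deterministic variant of the same idea, which is not in the original answer, is to repeat each instance once per class and pass the class probability as `sample_weight` instead of sampling labels. The weighted log-loss over the repeated rows then equals the cross-entropy against the soft targets, so no sampling noise is introduced. The sketch below assumes the estimator's `fit` accepts `sample_weight` (as `LogisticRegression.fit` does); the helper name `weighted_expansion` and the toy data are hypothetical.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression


def weighted_expansion(X, y_proba):
    """Repeat each row once per class; the class probability becomes its weight."""
    X = np.asarray(X)
    y_proba = np.asarray(y_proba, dtype=float)
    n_samples, n_classes = y_proba.shape
    X_rep = np.repeat(X, n_classes, axis=0)            # x0, x0, x0, x1, x1, x1, ...
    y_rep = np.tile(np.arange(n_classes), n_samples)   # 0, 1, 2, 0, 1, 2, ...
    w_rep = y_proba.ravel()                            # p00, p01, p02, p10, ...
    keep = w_rep > 0                                   # zero-weight rows add nothing
    return X_rep[keep], y_rep[keep], w_rep[keep]


# Toy data (an assumption for illustration): the targets from the question
# paired with made-up features.
y_proba = np.array([
    [0.0, 0.0, 1.0],
    [1.0, 0.0, 0.0],
    [0.0, 0.5, 0.5],
    [0.2, 0.3, 0.5],
    [0.5, 0.1, 0.4],
])
X = np.random.RandomState(0).normal(size=(5, 2))

X_rep, y_rep, w_rep = weighted_expansion(X, y_proba)

model = LogisticRegression(max_iter=1000)
model.fit(X_rep, y_rep, sample_weight=w_rep)
proba = model.predict_proba(X)
```

The expanded set has at most `n_samples * n_classes` rows regardless of how fine-grained the probabilities are, which is usually smaller than a sampled expansion with a large factor.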

Alternatively, you can implement your logistic regression using libraries like TensorFlow or PyTorch; unlike scikit-learn, it is easy to define any loss in these frameworks, and cross-entropy is available out of the box.
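To illustrate what "defining the loss yourself" amounts to, here is a minimal sketch of multinomial logistic regression trained directly on soft targets. It uses plain NumPy with hand-written gradient descent rather than TensorFlow or PyTorch, purely to stay dependency-free; the function names, learning rate, iteration count, and synthetic data are all assumptions for illustration, not a reference implementation.

```python
import numpy as np


def softmax(z):
    # Subtract the row-wise max for numerical stability.
    z = z - z.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)


def soft_cross_entropy(Q, P, eps=1e-12):
    # Mean cross-entropy between target distributions P and predictions Q.
    return -np.mean(np.sum(P * np.log(Q + eps), axis=1))


def fit_soft_logreg(X, P, lr=0.5, n_iter=2000):
    """Gradient descent on the soft-label cross-entropy (no regularization)."""
    n_samples, n_features = X.shape
    n_classes = P.shape[1]
    W = np.zeros((n_features, n_classes))
    b = np.zeros(n_classes)
    for _ in range(n_iter):
        Q = softmax(X @ W + b)
        G = (Q - P) / n_samples   # gradient of the mean loss w.r.t. the logits
        W -= lr * (X.T @ G)
        b -= lr * G.sum(axis=0)
    return W, b


# Synthetic data (an assumption): soft targets generated by a known softmax model.
rng = np.random.RandomState(0)
X = rng.normal(size=(200, 2))
W_true = np.array([[1.0, -1.0, 0.0], [0.5, 0.5, -1.0]])
P = softmax(X @ W_true)

initial_loss = soft_cross_entropy(softmax(np.zeros((200, 3))), P)
W, b = fit_soft_logreg(X, P)
Q = softmax(X @ W + b)
final_loss = soft_cross_entropy(Q, P)
```

Because each row of `P` sums to 1, the gradient with respect to the logits reduces to `Q - P`, the same form as with one-hot labels; only the targets differ.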
