Is there a parameter in PySpark equivalent to scikit-learn's sample_weight?


Question

I am currently using the SGDClassifier provided by the scikit-learn library. When I use the fit method, I can set the sample_weight parameter:

Weights applied to individual samples. If not provided, uniform weights are assumed. These weights will be multiplied with class_weight (passed through the constructor) if class_weight is specified.
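
A minimal sketch of that usage, with made-up toy data (the loss name "log_loss" assumes scikit-learn >= 1.1; older versions spell it "log"):

    # Toy data; in practice X and y come from your own dataset.
    import numpy as np
    from sklearn.linear_model import SGDClassifier

    X = np.array([[0.0, 1.0], [1.0, 0.0], [1.0, 1.0], [0.0, 0.0]])
    y = np.array([0, 1, 1, 0])

    # One weight per sample; larger values make that sample count more
    # heavily in the loss. Omitting sample_weight means uniform weights.
    weights = np.array([1.0, 1.0, 3.0, 0.5])

    clf = SGDClassifier(loss="log_loss", random_state=0)
    clf.fit(X, y, sample_weight=weights)
    print(clf.predict(X))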

I want to switch to PySpark and use the LogisticRegression class. However, I cannot find a parameter similar to sample_weight. There is a weightCol parameter, but I think it does something different.

Do you have any suggestions?

Answer

There is a weightCol parameter, but I think it does something different.

On the contrary, the weightCol parameter of Spark ML does exactly that; from the docs (emphasis added):

weightCol = Param(parent='undefined', name='weightCol', doc='weight column name. If this is not set or empty, we treat all instance weights as 1.0.')
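
As a minimal sketch (toy data and column names are made up), the per-row weight lives in a DataFrame column, and weightCol tells the estimator which column to use:

    # The "weight" column of the DataFrame plays the role of sample_weight;
    # weightCol wires it into the estimator.
    from pyspark.sql import SparkSession
    from pyspark.ml.classification import LogisticRegression
    from pyspark.ml.linalg import Vectors

    spark = SparkSession.builder.appName("weightCol-demo").getOrCreate()

    # Hypothetical training data: (features, label, per-row instance weight).
    df = spark.createDataFrame(
        [
            (Vectors.dense([0.0, 1.0]), 0.0, 1.0),
            (Vectors.dense([1.0, 0.0]), 1.0, 1.0),
            (Vectors.dense([1.0, 1.0]), 1.0, 3.0),
            (Vectors.dense([0.0, 0.0]), 0.0, 0.5),
        ],
        ["features", "label", "weight"],
    )

    # Rows with a larger weight contribute more to the fitted model,
    # just like sample_weight in scikit-learn.
    lr = LogisticRegression(featuresCol="features", labelCol="label", weightCol="weight")
    model = lr.fit(df)
    print(model.coefficients)

In other words, weightCol is the Spark ML counterpart of sample_weight: a row with weight 0 contributes nothing to the loss, and a weight of n behaves, as far as the loss is concerned, like repeating that row n times.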
