Is there a parameter in PySpark equivalent to scikit-learn's sample_weight?

Question

I am currently using the SGDClassifier provided by the scikit-learn library. When I use the fit method, I can set the sample_weight parameter:

Weights applied to individual samples. If not provided, uniform weights are assumed. These weights will be multiplied with class_weight (passed through the constructor) if class_weight is specified.
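For reference, here is a minimal sketch of that usage in scikit-learn; the toy data and weight values are made up purely for illustration:

```python
import numpy as np
from sklearn.linear_model import SGDClassifier

# Illustrative toy data: four samples, two classes
X = np.array([[0.0, 1.0], [1.0, 0.0], [1.0, 1.0], [0.0, 0.0]])
y = np.array([0, 1, 1, 0])

# Give the first two samples twice the influence of the others
weights = np.array([2.0, 2.0, 1.0, 1.0])

clf = SGDClassifier(random_state=42)
clf.fit(X, y, sample_weight=weights)
```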

I want to switch to PySpark and use the LogisticRegression class. However, I cannot find a parameter similar to sample_weight. There is a weightCol parameter, but I think it does something different.

Do you have any suggestions?

Answer

There is a weightCol parameter but I think it does something different.

On the contrary, weightCol of Spark ML does exactly that; from the docs:

weightCol = Param(parent='undefined', name='weightCol', doc='weight column name. If this is not set or empty, we treat all instance weights as 1.0.')
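To make this concrete, here is a minimal sketch of passing per-sample weights through weightCol; an active SparkSession is assumed, and the column names and toy data are illustrative:

```python
from pyspark.sql import SparkSession
from pyspark.ml.classification import LogisticRegression
from pyspark.ml.linalg import Vectors

spark = SparkSession.builder.getOrCreate()

# Illustrative toy data: (features, label, per-sample weight)
df = spark.createDataFrame(
    [
        (Vectors.dense([0.0, 1.0]), 0.0, 2.0),
        (Vectors.dense([1.0, 0.0]), 1.0, 2.0),
        (Vectors.dense([1.0, 1.0]), 1.0, 1.0),
        (Vectors.dense([0.0, 0.0]), 0.0, 1.0),
    ],
    ["features", "label", "weight"],
)

# weightCol plays the role of scikit-learn's sample_weight
lr = LogisticRegression(featuresCol="features", labelCol="label", weightCol="weight")
model = lr.fit(df)
```

Rows with a larger value in the weight column contribute proportionally more to the loss being minimized, which matches the semantics of scikit-learn's sample_weight.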
