How to use XGBoost in a PySpark Pipeline


Question

I want to update my PySpark code. In PySpark, the base model has to be placed in a pipeline, and the official pipeline demo uses LogisticRegression as the base model. However, it does not seem possible to use an XGBoost model with the Pipeline API. How can I use PySpark like this?

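from pyspark.ml import Pipeline
from xgboost import XGBClassifier
...
model = XGBClassifier()
model.fit(X_train, y_train)
# XGBClassifier is a scikit-learn-style estimator, not a pyspark.ml
# Estimator or Transformer, so it is not a valid pipeline stage.
pipeline = Pipeline(stages=[..., model, ...])
...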

The Pipeline API is convenient to use, so can anybody offer some advice? Thanks.

Recommended answer

There is an XGBoost implementation for Spark 2.4 and later here:

https://xgboost.readthedocs.io

Note that this is an external library, but it should work easily with Spark.
