How to train a SparkML gradient boosting classifier given an RDD

Views: 107 Posted: 2021/4/8 20:22:27 Tags: apache-spark pyspark apache-spark-ml

Question

Given the following rdd:

from pyspark.sql.functions import col

training_rdd = rdd.select(
    # Categorical features
    col('device_os'),  # 'ios', 'android'

    # Numeric features
    col('30day_click_count'),
    col('30day_impression_count'),
    # Column supports "/" directly, so NumPy is not needed for the ratio
    (col('30day_click_count') / col('30day_impression_count')).alias('30day_click_through_rate'),

    # Label
    col('did_click').alias('label')
)

I am confused about the syntax for training a gradient boosting classifier.

I am following this tutorial: https://spark.apache.org/docs/latest/ml-classification-regression.html#gradient-boosted-tree-classifier

However, I am unsure how to get my 4 feature columns into a vector, because VectorIndexer assumes that all the features are already in one column.

Recommended answer

You can use VectorAssembler to generate the feature vector. Note that you will have to convert your rdd to a DataFrame first.

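from pyspark.ml.feature import VectorAssembler

# VectorAssembler accepts only numeric, boolean, and vector columns, so a
# string column such as device_os must be indexed (e.g. with StringIndexer) first.
vectorizer = VectorAssembler()
vectorizer.setInputCols(["device_os",
                         "30day_click_count",
                         "30day_impression_count",
                         "30day_click_through_rate"])
vectorizer.setOutputCol("features")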

You then need to put vectorizer as the first stage of the Pipeline:

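from pyspark.ml import Pipeline

# Pipeline only accepts keyword arguments, so the stages must be passed as stages=
pipeline = Pipeline(stages=[vectorizer, ...])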
