Extract results from CrossValidator with paramGrid in pySpark

Published: 2020/9/4 4:04:34. Tags: python, apache-spark, pyspark, apache-spark-ml

Question

I train a Random Forest with pySpark. I want a CSV with the results, one row per point in the parameter grid. My code is:

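from pyspark.ml import Pipeline
from pyspark.ml.evaluation import RegressionEvaluator
from pyspark.ml.regression import RandomForestRegressor
from pyspark.ml.tuning import CrossValidator, ParamGridBuilder

estimator = RandomForestRegressor()
evaluator = RegressionEvaluator()  # default metric: rmse
paramGrid = ParamGridBuilder().addGrid(estimator.numTrees, [2,3])\
                              .addGrid(estimator.maxDepth, [2,3])\
                              .addGrid(estimator.impurity, ['variance'])\
                              .addGrid(estimator.featureSubsetStrategy, ['sqrt'])\
                              .build()
pipeline = Pipeline(stages=[estimator])

crossval = CrossValidator(estimator=pipeline,
                          estimatorParamMaps=paramGrid,
                          evaluator=evaluator,
                          numFolds=3)

# `result` is the training DataFrame (assumed to have "features" and "label" columns)
cvModel = crossval.fit(result)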

So I want a CSV with one row per parameter combination and its score:

numTrees | maxDepth | impurityMeasure
2        | 2        | 0.001
2        | 3        | 0.00023
etc.

What is the best way to do this?

Recommended answer

You'll have to combine different bits of data:

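The estimator param maps, extracted with the getEstimatorParamMaps method.
The metrics averaged across the folds, one per param map and in the same order, available from the avgMetrics attribute. They are computed on the held-out validation folds, not on the training data.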
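The two pieces line up by position: avgMetrics[i] is the averaged score of the i-th entry of getEstimatorParamMaps(). CrossValidator picks its best model by taking the maximum or minimum of these scores depending on the evaluator's isLargerBetter(). The following is a minimal sketch of that selection on plain Python lists, so it runs without Spark; the helper names best_param_index and rank_param_indices are hypothetical, and the metric values are placeholders, not real results.

```python
from typing import List, Sequence


def best_param_index(avg_metrics: Sequence[float], larger_is_better: bool) -> int:
    """Return the position of the best-scoring param map.

    avg_metrics[i] is assumed to belong to the i-th entry of
    getEstimatorParamMaps(), so the returned index selects that param map.
    """
    if not avg_metrics:
        raise ValueError("avg_metrics is empty")
    pick = max if larger_is_better else min
    # Compare positions by their metric; on ties the first position wins
    return pick(range(len(avg_metrics)), key=lambda i: avg_metrics[i])


def rank_param_indices(avg_metrics: Sequence[float], larger_is_better: bool) -> List[int]:
    """Return all positions ordered from best to worst score."""
    return sorted(range(len(avg_metrics)),
                  key=lambda i: avg_metrics[i],
                  reverse=larger_is_better)


# With a real model you would pass cvModel.avgMetrics and
# cvModel.getEvaluator().isLargerBetter(); these numbers are placeholders.
placeholder_metrics = [0.9, 0.4, 0.7]
print(best_param_index(placeholder_metrics, larger_is_better=False))  # → 1 (rmse: lower is better)
```

Working with indices rather than values keeps the link back to the matching param map, which is what you need when writing one row per grid point.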

First, get the names and values of all parameters declared in each param map:

params = [{p.name: v for p, v in m.items()} for m in cvModel.getEstimatorParamMaps()]
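The comprehension above keys each column by p.name. In this question the pipeline has a single stage, so names are unique. If a grid covers several pipeline stages that declare a parameter with the same name, keying by name alone silently keeps only the last value. pyspark's Param also carries the owning stage's uid in its parent attribute, so the key can be qualified. The sketch below assumes that; FakeParam is a hypothetical stand-in so the code runs without Spark, and the uids are illustrative.

```python
from collections import namedtuple
from typing import Any, Dict, List, Mapping

# Hypothetical stand-in for pyspark.ml.param.Param, which exposes the
# owning stage's uid as `.parent` and the parameter name as `.name`.
FakeParam = namedtuple("FakeParam", ["parent", "name"])


def param_maps_to_rows(param_maps: List[Mapping[Any, Any]],
                       qualify: bool = False) -> List[Dict[str, Any]]:
    """Turn {Param: value} maps into {column_name: value} dicts.

    With qualify=True the column is "<parent uid>.<name>", which keeps
    same-named params from different pipeline stages apart.
    """
    return [
        {(f"{p.parent}.{p.name}" if qualify else p.name): v for p, v in m.items()}
        for m in param_maps
    ]


# Two stages that both declare a param called "maxIter" (illustrative uids)
first_iter = FakeParam("LogisticRegression_1", "maxIter")
second_iter = FakeParam("SomeStage_2", "maxIter")
maps = [{first_iter: 10, second_iter: 5}]

print(param_maps_to_rows(maps))                # names collide: only maxIter=5 survives
print(param_maps_to_rows(maps, qualify=True))  # both values kept under distinct keys
```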

Then zip them with the metrics and convert the result to a data frame:

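import pandas as pd

# One row per param map: the evaluator's metric (e.g. "rmse") plus each parameter value
results = pd.DataFrame.from_dict([
    {cvModel.getEvaluator().getMetricName(): metric, **ps}
    for ps, metric in zip(params, cvModel.avgMetrics)
])

# Write the table out as CSV (the file name is just an example)
results.to_csv("cv_results.csv", index=False)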
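The answer builds a pandas DataFrame. If you would rather not depend on pandas on the driver, the same rows can be written with the standard library csv module. This is a minimal sketch; write_cv_csv is a hypothetical helper, and it assumes rows[i] and metrics[i] describe the same param map, as params and cvModel.avgMetrics do. It puts the metric column last, matching the layout asked for in the question.

```python
import csv
import io
from typing import Any, Dict, List, Sequence, TextIO


def write_cv_csv(out: TextIO,
                 rows: List[Dict[str, Any]],
                 metrics: Sequence[float],
                 metric_name: str) -> None:
    """Write one CSV line per param map: its parameter values, then its metric."""
    if len(rows) != len(metrics):
        raise ValueError("rows and metrics must have the same length")
    # Union of parameter names in first-seen order, then the metric column
    columns: List[str] = []
    for row in rows:
        for name in row:
            if name not in columns:
                columns.append(name)
    columns.append(metric_name)

    # Missing parameters are written as empty cells (DictWriter's default restval)
    writer = csv.DictWriter(out, fieldnames=columns)
    writer.writeheader()
    for row, metric in zip(rows, metrics):
        writer.writerow({**row, metric_name: metric})


# Placeholder inputs shaped like `params` and cvModel.avgMetrics
buf = io.StringIO()
write_cv_csv(buf,
             [{"numTrees": 2, "maxDepth": 2}, {"numTrees": 2, "maxDepth": 3}],
             [0.5, 0.25],
             "rmse")
print(buf.getvalue())
```

With a fitted model you would open a real file with newline="" (as the csv docs recommend) and call write_cv_csv(f, params, cvModel.avgMetrics, cvModel.getEvaluator().getMetricName()).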
