How do I visualise / plot a decision tree in Apache Spark (PySpark 1.4.1)?

Question

I am using Apache Spark MLlib 1.4.1 (PySpark, the Python API for Spark) to generate a decision tree from LabeledPoint data I have. The tree generates correctly and I can print it to the terminal (or "extract the rules", as the user in the question "How to extract rules from decision tree spark MLlib" calls it) using:

from pyspark.mllib.tree import DecisionTree

model = DecisionTree.trainClassifier( ... )
print(model.toDebugString())

But what I want to do is visualize or plot the decision tree rather than printing it to the terminal. Is there any way I can plot the decision tree in PySpark or maybe I can save the decision tree data and use R to plot it? Thanks!

Recommended answer

There is a project, Decision-Tree-Visualization-Spark, for visualizing decision tree models.

It works in two steps:

  • Parse Spark Decision Tree output to a JSON format.
  • Use the JSON file as an input to a D3.js visualization.

For the parser, check Dt.py.

The input to the function def tree_json(tree) is the string returned by your model's toDebugString().
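To make the parsing step more concrete, here is a minimal sketch of how debug-string text could be turned into nested JSON. It is not the project's Dt.py: the function name parse_debug_string, the "name"/"children" keys (a layout commonly used for D3 tree examples), and the sample string are all assumptions made for illustration. It also assumes the output follows the usual If / Else / Predict layout, so check it against what your own model prints.

```python
import json


def parse_debug_string(debug_string):
    """Turn toDebugString()-style text into a nested dict (hypothetical helper)."""
    # Keep only rule lines; the header line and blank lines are dropped.
    stripped = [ln.strip() for ln in debug_string.splitlines()]
    rules = iter(ln for ln in stripped
                 if ln.startswith(("If ", "Else ", "Predict")))

    def parse_node():
        line = next(rules)
        if line.startswith("Predict"):
            return {"name": line}  # leaf node
        # "If (condition)": keep the text between the outer parentheses.
        condition = line[line.index("(") + 1:line.rindex(")")]
        left = parse_node()   # subtree where the condition holds
        next(rules)           # skip the matching "Else (...)" line
        right = parse_node()  # subtree where the condition fails
        return {"name": condition, "children": [left, right]}

    return parse_node()


# Sample text written by hand for illustration; in practice this would be
# model.toDebugString().
sample = """DecisionTreeModel classifier of depth 2 with 5 nodes
  If (feature 0 <= 0.5)
   If (feature 1 <= 1.0)
    Predict: 0.0
   Else (feature 1 > 1.0)
    Predict: 1.0
  Else (feature 0 > 0.5)
   Predict: 1.0
"""

tree = parse_debug_string(sample)
print(json.dumps(tree, indent=2))
```

The printed JSON can be saved to a file and loaded by a D3.js tree layout, which is the second of the two steps above.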
