How to extract rules from a decision tree in Spark MLlib
Question
I am using Spark MLlib 1.4.1 to create a decision tree model. Now I want to extract the rules from the decision tree.
How can I extract the rules?
Recommended answer
You can get the full model as a string by calling model.toDebugString(), or save it by calling model.save(sc, filePath), which stores the model on disk (the metadata is written as JSON).
The documentation contains an example with a small sample dataset, so you can inspect the output format on the command line. Below is that script, formatted so that you can paste and run it directly.
from pyspark.mllib.regression import LabeledPoint
from pyspark.mllib.tree import DecisionTree

# sc is the SparkContext that the pyspark shell provides
data = [
    LabeledPoint(0.0, [0.0]),
    LabeledPoint(1.0, [1.0]),
    LabeledPoint(1.0, [2.0]),
    LabeledPoint(1.0, [3.0])
]
# 2 classes, no categorical features
model = DecisionTree.trainClassifier(sc.parallelize(data), 2, {})
print(model)
print(model.toDebugString())
The output is:
DecisionTreeModel classifier of depth 1 with 3 nodes
DecisionTreeModel classifier of depth 1 with 3 nodes
  If (feature 0 <= 0.0)
   Predict: 0.0
  Else (feature 0 > 0.0)
   Predict: 1.0
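The text returned by toDebugString() lists the tree in pre-order: each "If (...)" line is followed by its subtree, then by the matching "Else (...)" line and its subtree, and every leaf is a "Predict:" line. Assuming that layout holds, a minimal sketch like the following could turn the string into one rule per leaf. The function name extract_rules and the sample string are illustrative only and are not part of Spark.

```python
def extract_rules(debug_string):
    """Turn toDebugString() text into a list of (conditions, prediction) pairs."""
    stripped = (ln.strip() for ln in debug_string.splitlines())
    # Keep only tree lines; the "DecisionTreeModel ..." summary line is dropped.
    tokens = iter([ln for ln in stripped
                   if ln.startswith(("If (", "Else (", "Predict:"))])
    rules = []

    def walk(path):
        line = next(tokens)
        if line.startswith("Predict:"):
            # A leaf: the conditions collected so far form one complete rule.
            rules.append((path, float(line.split(":", 1)[1])))
            return
        # "If (cond)" -> left subtree -> "Else (cond)" -> right subtree
        walk(path + [line[len("If ("):-1]])
        else_line = next(tokens)
        walk(path + [else_line[len("Else ("):-1]])

    walk([])
    return rules


# Hypothetical input: the small model output shown above.
sample = """DecisionTreeModel classifier of depth 1 with 3 nodes
  If (feature 0 <= 0.0)
   Predict: 0.0
  Else (feature 0 > 0.0)
   Predict: 1.0"""

for conditions, prediction in extract_rules(sample):
    print(" AND ".join(conditions), "=>", prediction)
```

Because the sketch follows the If/Else ordering rather than the indentation, it still works on text whose leading spaces were lost. It expects the complete string, though: a dump that has been cut short will run out of lines and raise StopIteration.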
In a real application the model can be very large, with many lines. Printing dtModel.toDebugString() directly can then cause the IPython notebook to hang, so I suggest writing it out to a text file instead.
Here is example code for exporting a model dtModel to a text file. Suppose we obtain dtModel like this:
import os

dtModel = DecisionTree.trainClassifier(parsedTrainData, numClasses=7, categoricalFeaturesInfo={}, impurity='gini', maxDepth=20, maxBins=24)
# open() does not expand "~", so resolve the home directory explicitly
modelFile = os.path.expanduser("~/decisionTreeModel.txt")
with open(modelFile, "w") as f:
    f.write(dtModel.toDebugString())
Here is an example of the output of the above script for my dtModel:
DecisionTreeModel classifier of depth 20 with 20031 nodes
  If (feature 0 <= -35.0)
   If (feature 24 <= 176.0)
    If (feature 0 <= -200.0)
     If (feature 29 <= 109.0)
      If (feature 6 <= -156.0)
       If (feature 9 <= 0.0)
        If (feature 20 <= -116.0)
         If (feature 16 <= 203.0)
          If (feature 11 <= 163.0)
           If (feature 5 <= 384.0)
            If (feature 15 <= 325.0)
             If (feature 13 <= -248.0)
              If (feature 20 <= -146.0)
               Predict: 0.0
              Else (feature 20 > -146.0)
               If (feature 19 <= -58.0)
                Predict: 6.0
               Else (feature 19 > -58.0)
                Predict: 0.0
             Else (feature 13 > -248.0)
              If (feature 9 <= -26.0)
               Predict: 0.0
              Else (feature 9 > -26.0)
               If (feature 10 <= 218.0)
...
...
...
...
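Once the tree is in a text file, it can be inspected a few lines at a time instead of being loaded into the notebook in one piece. The sketch below shows one way to do that with itertools.islice. The helper name preview and the temporary demo file are assumptions made for illustration; in practice the path would be the file the model was exported to.

```python
import os
import tempfile
from itertools import islice


def preview(path, n=10):
    """Return the first n lines of a text file without reading the rest."""
    with open(os.path.expanduser(path)) as f:
        return [line.rstrip("\n") for line in islice(f, n)]


# Stand-in for an exported model file, so the sketch runs without Spark.
demo_path = os.path.join(tempfile.mkdtemp(), "decisionTreeModel.txt")
with open(demo_path, "w") as f:
    f.write("DecisionTreeModel classifier of depth 1 with 3 nodes\n"
            "  If (feature 0 <= 0.0)\n"
            "   Predict: 0.0\n"
            "  Else (feature 0 > 0.0)\n"
            "   Predict: 1.0\n")

for line in preview(demo_path, n=3):
    print(line)
```

Reading lazily keeps memory use and notebook output bounded no matter how many nodes the tree has.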