How to extract rules from a decision tree in Spark MLlib


Question

I am using Spark MLlib 1.4.1 to create a decision tree model. Now I want to extract the rules from the decision tree.

How can I extract the rules?

Recommended answer

You can get the full model as a string by calling model.toDebugString(), or save it to disk by calling model.save(sc, filePath), which writes the model metadata as JSON (the tree nodes themselves are stored as Parquet).

The documentation contains an example with a small sample dataset, so you can inspect the output format on the command line. Here is that script, formatted so that you can paste and run it directly.

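from pyspark.mllib.regression import LabeledPoint
from pyspark.mllib.tree import DecisionTree

# `sc` is the SparkContext that the pyspark shell or notebook provides.
data = [
    LabeledPoint(0.0, [0.0]),
    LabeledPoint(1.0, [1.0]),
    LabeledPoint(1.0, [2.0]),
    LabeledPoint(1.0, [3.0])
]

# Train a binary classifier; {} means there are no categorical features.
model = DecisionTree.trainClassifier(sc.parallelize(data), 2, {})
print(model)  # one-line summary

print(model.toDebugString())  # summary line plus the full if/else tree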

The output is:

DecisionTreeModel classifier of depth 1 with 3 nodes
DecisionTreeModel classifier of depth 1 with 3 nodes
  If (feature 0 <= 0.0)
   Predict: 0.0
  Else (feature 0 > 0.0)
   Predict: 1.0 
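
Because the debug string is plain indented text, one way to turn it into flat if-then rules is to walk it line by line and keep a stack of the conditions on the current path. The following is a minimal pure-Python sketch, not part of MLlib: the function name extract_rules is hypothetical, and it assumes the If (...) / Else (...) / Predict: layout shown above, which may differ between Spark versions.

```python
def extract_rules(debug_string):
    """Parse toDebugString() text into a list of (conditions, prediction) pairs."""
    rules = []
    stack = []  # (indent, condition) pairs on the path from the root to the current line
    for line in debug_string.splitlines():
        stripped = line.strip()
        indent = len(line) - len(line.lstrip(" "))
        is_branch = stripped.startswith(("If (", "Else ("))
        is_leaf = stripped.startswith("Predict:")
        if not (is_branch or is_leaf):
            continue  # header line, blank line, or elided "..." line
        # Drop conditions that belong to siblings or deeper levels.
        while stack and stack[-1][0] >= indent:
            stack.pop()
        if is_branch:
            condition = stripped[stripped.index("(") + 1:stripped.rindex(")")]
            stack.append((indent, condition))
        else:
            prediction = float(stripped.split(":", 1)[1])
            rules.append(([cond for _, cond in stack], prediction))
    return rules


sample = """DecisionTreeModel classifier of depth 1 with 3 nodes
  If (feature 0 <= 0.0)
   Predict: 0.0
  Else (feature 0 > 0.0)
   Predict: 1.0
"""
for conditions, prediction in extract_rules(sample):
    print("IF " + " AND ".join(conditions) + " THEN " + str(prediction))
```

Each rule is the list of conditions from the root down to one leaf, so the number of rules equals the number of leaves in the tree.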

In a real application, the model can be very large and span many lines, so printing dtModel.toDebugString() directly can cause an IPython notebook to hang. I suggest writing it to a text file instead.

Here is example code for exporting a model dtModel to a text file. Suppose we obtained dtModel like this:

dtModel = DecisionTree.trainClassifier(parsedTrainData, numClasses=7, categoricalFeaturesInfo={},impurity='gini', maxDepth=20, maxBins=24)



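import os

# open() does not expand "~", so resolve the home directory explicitly.
modelFile = os.path.expanduser("~/decisionTreeModel.txt")
with open(modelFile, "w") as f:
    f.write(dtModel.toDebugString())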

Here is sample output of the above script from my dtModel:

DecisionTreeModel classifier of depth 20 with 20031 nodes
  If (feature 0 <= -35.0)
   If (feature 24 <= 176.0)
    If (feature 0 <= -200.0)
     If (feature 29 <= 109.0)
      If (feature 6 <= -156.0)
       If (feature 9 <= 0.0)
        If (feature 20 <= -116.0)
         If (feature 16 <= 203.0)
          If (feature 11 <= 163.0)
           If (feature 5 <= 384.0)
            If (feature 15 <= 325.0)
             If (feature 13 <= -248.0)
              If (feature 20 <= -146.0)
               Predict: 0.0
              Else (feature 20 > -146.0)
               If (feature 19 <= -58.0)
                Predict: 6.0
               Else (feature 19 > -58.0)
                Predict: 0.0
             Else (feature 13 > -248.0)
              If (feature 9 <= -26.0)
               Predict: 0.0
              Else (feature 9 > -26.0)
               If (feature 10 <= 218.0)
...
...
...
...
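
Rules written as feature 24 <= 176.0 are hard to read if you know your columns by name. A small post-processing step can substitute names into the exported text. This is only a sketch: it assumes you keep your own list of column names in the same order as the feature vector, and the helper name_features and the names in the example are made up for illustration.

```python
import re


def name_features(debug_string, feature_names):
    """Replace 'feature N' with feature_names[N] wherever N is a valid index."""
    def substitute(match):
        index = int(match.group(1))
        if index < len(feature_names):
            return feature_names[index]
        return match.group(0)  # leave indices without a known name untouched

    return re.sub(r"\bfeature (\d+)\b", substitute, debug_string)


# Hypothetical column names, for illustration only.
names = ["x_offset", "y_offset", "width"]
text = "  If (feature 0 <= -35.0)\n   If (feature 2 <= 176.0)"
print(name_features(text, names))
```

Indices that have no entry in the list are left as they are, so a partial list of names still produces valid output.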
