给定真值表需要创建二叉树的帮助 [英] Help Needed Creating a Binary Tree Given Truth Table

查看:62
本文介绍了给定真值表需要创建二叉树的帮助的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

首先,为了提供完整的公开信息,我想指出的是,这与机器学习课程中的作业有关.这个问题不是作业问题,而是我需要解决的问题,以解决创建ID3决策树算法这一更大的问题.

First, in order to provide full disclosure, I want to point out that this is related to homework in a Machine Learning class. This question is not the homework question and instead is something I need to figure out in order to complete the bigger problem of creating an ID3 Decision Tree Algorithm.

在提供真值表时,我需要生成类似于以下内容的树

I need to generate tree similar to the following when given a truth table

let learnedTree = Node(0,"A0", Node(2,"A2", Leaf(0), Leaf(1)), Node(1,"A1", Node(2,"A2", Leaf(0), Leaf(1)), Leaf(0)))

learnedTree的类型为BinaryTree,其定义如下:

learnedTree is of type BinaryTree which I've defined as follows:

type BinaryTree =
    | Leaf of int
    | Node of int * string * BinaryTree * BinaryTree

ID3算法考虑了各种方程式来确定在哪里拆分树,而我已经弄清了所有内容,只是在从真值表创建学习到的树时遇到了麻烦.例如,如果我有下表

ID3 algorithms take into account various equations to determine where to split the tree, and I've got all that figured out, I'm just having trouble creating the learned tree from my truth table. For example if I have the following table

A1 | A2 | A3 | Class
1     0    0      1
0     1    0      1
0     0    0      0
1     0    1      0
0     0    0      0
1     1    0      1
0     1    1      0

我决定拆分属性A1,最后得到以下结果:

And I decide to split on attribute A1 I would end up with the following:

              (A1 = 1)  A1   (A1 = 0)
   A2 | A3 | Class                A2 | A3 | Class
   0     0      1                1      0      1
   0     1      0                0      0      0
   1     0      1                0      0      0
                                 0      1      1

然后,我将拆分左侧并拆分右侧,并继续递归模式,直到叶节点是纯净的,然后基于拆分,我得到一棵类似于以下内容的树.

Then I would split the left side and split the right side, and continue the recursive pattern until the leaf nodes are pure and I end up with a tree similar to the following based on the splitting.

let learnedTree = Node(0,"A0", Node(2,"A2", Leaf(0), Leaf(1)), Node(1,"A1", Node(2,"A2", Leaf(0), Leaf(1)), Leaf(0)))

到目前为止,这是我迄今为止被黑客入侵"的一种方式,但我认为我可能会遥遥无期:

Here is what I've kind of "hacked" together thus far, but I think I might be way off:

let rec createTree (listToSplit : list<list<float>>) index =
    let leftSideSplit =
        listToSplit |> List.choose (fun x -> if x.Item(index) = 1. then Some(x) else None)
    let rightSideSplit =
        listToSplit |> List.choose (fun x -> if x.Item(index) = 0. then Some(x) else None)
    if leftSideSplit.Length > 0 then
        let pureCheck = isListPure leftSideSplit
        if pureCheck = 0 then
            printfn "%s" "Pure left node class 0"
            createTree leftSideSplit (index + 1)
        else if pureCheck = 1 then
            printfn "%s" "Pure left node class 1"
            createTree leftSideSplit (index + 1)
        else
            printfn "%s - %A" "Recursing Left" leftSideSplit
            createTree leftSideSplit (index + 1)
    else printfn "%s" "Pure left node class 0"

我应该改用模式匹配吗?有任何提示/想法/帮助吗?谢谢一堆!

Should I be using pattern matching instead? Any tips/ideas/help? Thanks a bunch!

推荐答案

编辑:此后,我在我的博客上发布了ID3的实现: http://blogs.msdn.com/chrsmith

Edit: I've since posted an implementation of ID3 on my blog at: http://blogs.msdn.com/chrsmith

嘿,吉姆,我一直想写一篇博客文章,用F#实现ID3-感谢您给我执行程序.尽管此代码并未完全(或正确)地实现算法,但对于您入门来说应该足够了.

Hey Jim, I've been wanting to write a blog post implementing ID3 in F# for a while - thanks for giving me an execute. While this code doesn't implement the algorithm full (or correctly), it should be sufficient for getting you started.

通常,您采用正确的方法-将每个分支表示为有区别的联合案例是好的.就像Brian所说的那样,List.partition绝对是一个方便的函数.正确执行此操作的诀窍在于确定要分割的最佳属性/值对,然后执行此操作需要通过熵等来计算信息增益.

In general you have the right approach - representing each branch as a discriminated union case is good. And like Brian said, List.partition is definitely a handy function. The trick to making this work correctly is all in determining the optimal attribute/value pair to split on - and to do that you'll need to calculate information gain via entropy, etc.

type Attribute = string
type Value = string

type Record = 
    {
        Weather : string
        Temperature : string
        PlayTennis : bool 
    }
    override this.ToString() =
        sprintf
            "{Weather = %s, Temp = %s, PlayTennis = %b}" 
            this.Weather 
            this.Temperature 
            this.PlayTennis

type Decision = Attribute * Value

type DecisionTreeNode =
    | Branch of Decision * DecisionTreeNode * DecisionTreeNode
    | Leaf of Record list

// ------------------------------------

// Splits a record list into an optimal split and the left / right branches.
// (This is where you use the entropy function to maxamize information gain.)
// Record list -> Decision * Record list * Record list
let bestSplit data = 
    // Just group by weather, then by temperature
    let uniqueWeathers = 
        List.fold 
            (fun acc item -> Set.add item.Weather acc) 
            Set.empty
            data

    let uniqueTemperatures = 
        List.fold
            (fun acc item -> Set.add item.Temperature acc) 
            Set.empty
            data

    if uniqueWeathers.Count = 1 then
        let bestSplit = ("Temperature", uniqueTemperatures.MinimumElement)
        let left, right = 
            List.partition
                (fun item -> item.Temperature = uniqueTemperatures.MinimumElement) 
                data
        (bestSplit, left, right)
    else
        let bestSplit = ("Weather", uniqueWeathers.MinimumElement)
        let left, right =
            List.partition
                (fun item -> item.Weather = uniqueWeathers.MinimumElement)
                data
        (bestSplit, left, right)

let rec determineBranch data =
    if List.length data < 4 then
        Leaf(data)
    else
        // Use the entropy function to break the dataset on
        // the category / value that best splits the data
        let bestDecision, leftBranch, rightBranch = bestSplit data
        Branch(
            bestDecision, 
            determineBranch leftBranch, 
            determineBranch rightBranch)

// ------------------------------------    

let rec printID3Result indent branch =
    let padding = new System.String(' ', indent)
    match branch with
    | Leaf(data) ->
        data |> List.iter (fun item -> printfn "%s%s" padding <| item.ToString())
    | Branch(decision, lhs, rhs) ->
        printfn "%sBranch predicate [%A]" padding decision
        printfn "%sWhere predicate is true:" padding
        printID3Result (indent + 4) lhs
        printfn "%sWhere predicate is false:" padding
        printID3Result (indent + 4) rhs


// ------------------------------------    

let dataset =
    [
        { Weather = "windy"; Temperature = "hot";  PlayTennis = false }
        { Weather = "windy"; Temperature = "cool"; PlayTennis = false }
        { Weather = "nice";  Temperature = "cool"; PlayTennis = true }
        { Weather = "nice";  Temperature = "cold"; PlayTennis = true }
        { Weather = "humid"; Temperature = "hot";  PlayTennis = false }
    ]

printfn "Given input list:"
dataset |> List.iter (printfn "%A")

printfn "ID3 split resulted in:"
let id3Result = determineBranch dataset
printID3Result 0 id3Result

这篇关于给定真值表需要创建二叉树的帮助的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆