R: recursively evaluating a binary tree stored as a list


Question

I have a tree called mytree, which I have stored as a list:

mytree <- list(list(structure(list(y = c(-10, 7, 8, -7), x = c(10, 20, 
25, 35), grad = c(-10.5, 6.5, 7.5, -7.5), sim_score = c(4, 4, 
4, 4), value = c(-1, -1, -1, -1)), row.names = c(NA, -4L), class = "data.frame")), 
    list(structure(list(y = -10, x = 10, grad = -10.5, sim_score = 110.25, 
        value = -10.5, gain = 120.333333333333, criterion = "x < 15"), row.names = 1L, class = "data.frame"), 
        structure(list(y = c(7, 8, -7), x = c(20, 25, 35), grad = c(6.5, 
        7.5, -7.5), sim_score = c(14.0833333333333, 14.0833333333333, 
        14.0833333333333), value = c(2.16666666666667, 2.16666666666667, 
        2.16666666666667), gain = c(120.333333333333, 120.333333333333, 
        120.333333333333), criterion = c("x >= 15", "x >= 15", 
        "x >= 15")), row.names = 2:4, class = "data.frame")), 
    list(NULL, NULL, structure(list(y = c(7, 8), x = c(20, 25
    ), grad = c(6.5, 7.5), sim_score = c(98, 98), value = c(7, 
    7), gain = c(140.166666666667, 140.166666666667), criterion = c("x < 30", 
    "x < 30")), row.names = 2:3, class = "data.frame"), structure(list(
        y = -7, x = 35, grad = -7.5, sim_score = 56.25, value = -7.5, 
        gain = 140.166666666667, criterion = "x >= 30"), row.names = 4L, class = "data.frame")), 
    list(NULL, NULL, NULL, NULL, structure(list(y = 7, x = 20, 
        grad = 6.5, sim_score = 42.25, value = 6.5, gain = 0.5, 
        criterion = "x < 22.5"), row.names = 2L, class = "data.frame"), 
        structure(list(y = 8, x = 25, grad = 7.5, sim_score = 56.25, 
            value = 7.5, gain = 0.5, criterion = "x >= 22.5"), row.names = 3L, class = "data.frame"), 
        NULL, NULL))

When printed, it looks like this:

[[1]]
[[1]][[1]]
    y  x  grad sim_score value
1 -10 10 -10.5         4    -1
2   7 20   6.5         4    -1
3   8 25   7.5         4    -1
4  -7 35  -7.5         4    -1


[[2]]
[[2]][[1]]
    y  x  grad sim_score value     gain criterion
1 -10 10 -10.5    110.25 -10.5 120.3333    x < 15

[[2]][[2]]
   y  x grad sim_score    value     gain criterion
2  7 20  6.5  14.08333 2.166667 120.3333   x >= 15
3  8 25  7.5  14.08333 2.166667 120.3333   x >= 15
4 -7 35 -7.5  14.08333 2.166667 120.3333   x >= 15


[[3]]
[[3]][[1]]
NULL

[[3]][[2]]
NULL

[[3]][[3]]
  y  x grad sim_score value     gain criterion
2 7 20  6.5        98     7 140.1667    x < 30
3 8 25  7.5        98     7 140.1667    x < 30

[[3]][[4]]
   y  x grad sim_score value     gain criterion
4 -7 35 -7.5     56.25  -7.5 140.1667   x >= 30


[[4]]
[[4]][[1]]
NULL

[[4]][[2]]
NULL

[[4]][[3]]
NULL

[[4]][[4]]
NULL

[[4]][[5]]
  y  x grad sim_score value gain criterion
2 7 20  6.5     42.25   6.5  0.5  x < 22.5

[[4]][[6]]
  y  x grad sim_score value gain criterion
3 8 25  7.5     56.25   7.5  0.5 x >= 22.5

[[4]][[7]]
NULL

[[4]][[8]]
NULL

The first index of the list (1, 2, 3, 4) corresponds to the level, or depth, of the tree. The second index is the position of the node within that level. For example, mytree[[1]][[1]] contains the root, whose child nodes are in mytree[[2]][[1]] and mytree[[2]][[2]].

Given a parent node stored in mytree[[i]][[j]], its children are stored in mytree[[i + 1]][[2 * j - 1]] and mytree[[i + 1]][[2 * j]].
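The index rule above can be written down as a pair of small helpers. This is only an illustrative sketch: child_positions and parent_position are hypothetical names that do not appear in the original post.

```r
# Hypothetical helpers illustrating the level/index arithmetic.

# Positions (level, index) of the two children of node (i, j).
child_positions <- function(i, j) {
  list(left  = c(level = i + 1, index = 2 * j - 1),
       right = c(level = i + 1, index = 2 * j))
}

# Position of the parent of node (i, j); the root (1, 1) has no parent.
parent_position <- function(i, j) {
  if (i == 1) return(NULL)
  c(level = i - 1, index = ceiling(j / 2))
}

child_positions(1, 1)   # children of the root: (2, 1) and (2, 2)
child_positions(2, 2)   # children of node (2, 2): (3, 3) and (3, 4)
parent_position(4, 5)   # parent of node (4, 5): (3, 3)
```

A NULL entry at a child position, or a level beyond length(mytree), means that child does not exist.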

I want to write a function called eval_tree that, given a new instance x, finds the leaf node that x falls into by checking the criterion of each split, and then returns the value of that leaf, which is stored in the value column. Here is an example of how I'd like eval_tree to work:

newdata <- data.frame(x = c(10, 20, 25, 35))
> eval_tree(tree = mytree, newdata = newdata)
[1] -10.5
[2] 6.5
[3] 7.5
[4] -7.5
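Each criterion is stored as a string such as "x < 15", so checking a split means evaluating that string with x bound to the new value. A minimal sketch of that step, where check_criterion is a hypothetical helper name and not part of the original post:

```r
# Hypothetical helper: evaluate a criterion string for a single value of x.
# The string is parsed into an expression and evaluated in a small
# environment (a list) that contains only the binding for x.
check_criterion <- function(criterion, x) {
  isTRUE(eval(parse(text = criterion), envir = list(x = x)))
}

check_criterion("x < 15", 10)      # TRUE
check_criterion("x >= 22.5", 20)   # FALSE
```

Without the second argument to eval, the expression is evaluated in the calling frame, so it only works if a variable named x happens to exist there. Note that eval(parse()) runs arbitrary code, so it is only reasonable for criterion strings you generated yourself.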

Here is what I have so far. Unfortunately, it does not work, and I think I may need to implement the function recursively to make it more efficient. Can anyone point me in the right direction?

eval_tree <- function(tree, newdata){
  if(length(tree) == 1){
    # If tree only has a root, return value of root
    return(tree[[1]][[1]]$value[1])
  }else if(length(tree) > 1){
    for (level in 2:length(tree)){
      for(ind in 1:length(tree[[level]]))
        if(eval(parse(text = tree[[level]][[ind]][["criterion"]]))){
          # Criterion is true, then go to child node
          # Check if there is child node
          if(is.null(tree[[level + 1]][[ind * 2]]) && is.null(tree[[level + 1]][[ind * 2 - 1]])){
            return(tree[[level]][[ind]]$value[1])
          }else if(eval(parse(text = tree[[level + 1]][[ind * 2]][["criterion"]]))){
            # Criterion is true, then go to child node
            # I think this is where recursion would be more appropriate than all these nested loops
          }

        }
    }
  }
}


Recommended answer

You could try the following:

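index <- function(x, tree, e, i = 1, j = 1) {
  node <- tree[[i]][[j]]
  if (nrow(node) == 1) {
    # Single-row node: treat it as a leaf and test its criterion against x
    if (eval(parse(text = node$criterion), list(x = x))) {
      if (is.null(e$a) || e$a > i) {
        # Keep the shallowest matching leaf: its level (a) and value (b)
        e$a <- i
        e$b <- node$value
      }
      TRUE
    } else {
      FALSE
    }
  } else {
    # Multi-row node: search both subtrees.
    # `|` does not short-circuit, so both children are always visited.
    index(x, tree, e, i + 1, 2 * j - 1) | index(x, tree, e, i + 1, 2 * j)
  }
}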

pred_tree <- function(tree, newdata) {
  # Predict each x separately; a fresh environment collects what index() finds
  pred <- sapply(newdata$x, function(x) {
    e <- new.env()
    index(x, tree, e)
    e$b
  })
  cbind(newdata, pred = pred)
}

pred_tree(mytree,data.frame(x = c(10,20,25,30,25)))
   x  pred
1 10 -10.5
2 20   6.5
3 25   7.5
4 30  -7.5
5 25   7.5
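The answer above visits every leaf and keeps the shallowest one whose own criterion matches. An alternative, shown here only as a sketch and not as the answerer's method, is to descend from the root and follow whichever child's criterion x satisfies. The names get_node, satisfies, descend, eval_tree_sketch, make_node and the toy tree are all assumptions made for illustration; the toy tree only mimics the layout of mytree (a value column and a criterion column per node).

```r
# Hypothetical helper: fetch node j of level i, or NULL if it does not exist.
get_node <- function(tree, i, j) {
  if (i > length(tree) || j > length(tree[[i]])) return(NULL)
  tree[[i]][[j]]
}

# Does x satisfy the split criterion stored on a (possibly missing) node?
satisfies <- function(node, x) {
  !is.null(node) &&
    isTRUE(eval(parse(text = node$criterion[1]), list(x = x)))
}

# Descend from node (i, j) and return the value of the leaf that x reaches.
descend <- function(tree, x, i = 1, j = 1) {
  current <- get_node(tree, i, j)
  left    <- get_node(tree, i + 1, 2 * j - 1)
  right   <- get_node(tree, i + 1, 2 * j)
  if (is.null(left) && is.null(right)) return(current$value[1])  # leaf
  if (satisfies(left, x))  return(descend(tree, x, i + 1, 2 * j - 1))
  if (satisfies(right, x)) return(descend(tree, x, i + 1, 2 * j))
  NA_real_  # neither child criterion matched
}

eval_tree_sketch <- function(tree, newdata) {
  vapply(newdata$x, function(x) descend(tree, x), numeric(1))
}

# Toy tree with the same layout as mytree (illustrative values only).
make_node <- function(value, criterion = NA_character_) {
  data.frame(value = value, criterion = criterion, stringsAsFactors = FALSE)
}
toy <- list(
  list(make_node(-1)),
  list(make_node(-10.5, "x < 15"), make_node(2.17, "x >= 15")),
  list(NULL, NULL, make_node(7, "x < 30"), make_node(-7.5, "x >= 30"))
)

eval_tree_sketch(toy, data.frame(x = c(10, 20, 35)))
```

Because each step checks only the two children of the current node, this visits one node per level instead of the whole tree, and it does not depend on leaves having exactly one row.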
