Search for corresponding node in a regression tree using rpart


Problem description

I'm pretty new to R and I'm stuck with a pretty dumb problem.

I'm calibrating a regression tree using the rpart package in order to do some classification and some forecasting.

Thanks to R the calibration part is easy to do and easy to control.

# the rpart package is needed
library(rpart)

# Loading of a big data file used for calibration
my_data <- read.csv("my_file.csv", sep=",", header=TRUE)

# Regression tree calibration
tree <- rpart(Ratio ~ Attribute1 + Attribute2 + Attribute3 + 
                      Attribute4 + Attribute5, 
                      method="anova", data=my_data, 
                      control=rpart.control(minsplit=100, cp=0.0001))

After calibrating a big decision tree, I want to find, for a given data sample, the cluster (leaf) that each new observation belongs to, and thus its forecasted value.
The predict function seems perfect for this.

# read validation data
validationData <-read.csv("my_sample.csv", sep=",", header=TRUE)

# look up the predicted value for each new row in the tree
predictions <- predict(tree, newdata=validationData, class="prob")

# dump them in a file
write.table(predictions, file="dump.txt")

However, with the predict method I only get the forecasted ratio of my new elements, and I can't find a way to get the leaf of the decision tree that each new element belongs to.

I think it should be pretty easy to get, since the predict method must have found that leaf in order to return the ratio.
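For the rows used to fit the tree, rpart does keep that information: the fitted object's `where` component gives, for each training row, the row of `frame` (the tree's one-row-per-node table) that the observation ended up in. A minimal sketch on the built-in `mtcars` data; the dataset, formula, and control settings are illustrative assumptions, not the question's data:

```r
library(rpart)

# Illustrative regression tree on a built-in dataset (not the question's data)
fit <- rpart(mpg ~ wt + hp + disp, data = mtcars, method = "anova",
             control = rpart.control(minsplit = 10, cp = 0.01))

# fit$where: for each training row, the index of its leaf's row in fit$frame
leaf_row <- fit$where

# The row names of fit$frame are the node numbers shown by print(fit)
train_nodes <- as.numeric(rownames(fit$frame))[leaf_row]

head(data.frame(car = rownames(mtcars), node = train_nodes))
```

This only covers the training data; `where` is not recomputed for `newdata`, which is why the question remains for new observations.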

There are several parameters that can be given to the predict method through the class= argument, but for a regression tree all of them seem to return the same thing (the predicted value of the target attribute).
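One likely explanation for this, not stated in the question: `predict.rpart()` selects its output with a `type` argument (`"vector"`, `"prob"`, `"class"`, `"matrix"`) and has no `class` argument, so `class = "prob"` is absorbed by `...` and the default `type = "vector"` is used. A quick check, again on the illustrative `mtcars` tree (an assumption, not the asker's data):

```r
library(rpart)

# Illustrative regression tree (not the question's data)
fit <- rpart(mpg ~ wt + hp + disp, data = mtcars, method = "anova",
             control = rpart.control(minsplit = 10, cp = 0.01))
new_cars <- mtcars[1:5, c("wt", "hp", "disp")]

p_default <- predict(fit, newdata = new_cars)
# class= is not a formal argument of predict.rpart(), so it is silently ignored
p_class <- predict(fit, newdata = new_cars, class = "prob")
# type = "vector" is what a regression tree uses by default
p_vector <- predict(fit, newdata = new_cars, type = "vector")

stopifnot(identical(p_default, p_class), identical(p_default, p_vector))
```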

Does anyone know how to get the corresponding node in the decision tree?

Analyzing that node with the path.rpart method would help me understand the results.
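`path.rpart()` works on node numbers, which are the row names of the fitted tree's `frame`: the root is node 1 and the children of node k are 2k and 2k + 1. A sketch on the illustrative `mtcars` tree (an assumption, not the question's data) that lists the terminal nodes and prints the splits leading to each:

```r
library(rpart)

# Illustrative regression tree (not the question's data)
fit <- rpart(mpg ~ wt + hp + disp, data = mtcars, method = "anova",
             control = rpart.control(minsplit = 10, cp = 0.01))

# One row of fit$frame per node; the row names are the node numbers
node_ids <- as.numeric(rownames(fit$frame))
is_leaf <- fit$frame$var == "<leaf>"

# Every non-root node's parent, floor(k / 2), is itself a node of the tree
stopifnot(node_ids[1] == 1, all(floor(node_ids[-1] / 2) %in% node_ids))

# path.rpart() prints the splits from the root down to each requested node
# and invisibly returns them as a list, one element per node
paths <- path.rpart(fit, nodes = node_ids[is_leaf])
```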

Recommended answer

Benjamin's answer unfortunately doesn't work: type="vector" still returns the predicted values.

My solution is pretty kludgy, but I don't think there's a better way. The trick is to replace the predicted y values (`yval`) in the tree's `frame` with the corresponding node numbers.

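# copy the fitted tree so the original model is left untouched
tree2 <- tree
# overwrite each node's fitted value with its node number (the frame's row names)
tree2$frame$yval <- as.numeric(rownames(tree2$frame))
# predict() now returns the node number of the leaf each new row falls into
node_ids <- predict(tree2, newdata=validationData)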

Now the output of predict will be node numbers as opposed to predicted y values.
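The same trick can be wrapped in a small helper and checked against `where` on the training rows, where the answer should agree with what rpart recorded during fitting. A sketch on the illustrative `mtcars` tree; `predict_node()` is a hypothetical name, not part of rpart:

```r
library(rpart)

# Hypothetical helper wrapping the answer's trick (regression trees):
# returns the node number of the leaf that each row of newdata falls into
predict_node <- function(fit, newdata) {
  # R copies on modification, so the caller's tree is not changed
  fit$frame$yval <- as.numeric(rownames(fit$frame))
  predict(fit, newdata = newdata)
}

# Illustrative regression tree (not the question's data)
fit <- rpart(mpg ~ wt + hp + disp, data = mtcars, method = "anova",
             control = rpart.control(minsplit = 10, cp = 0.01))

# Pretend a few rows are new observations
new_cars <- mtcars[c(1, 15, 20), c("wt", "hp", "disp")]
nodes <- predict_node(fit, new_cars)

# Look each node number up among the frame's row names to get its fitted value
frame_row <- match(nodes, as.numeric(rownames(fit$frame)))
leaf_value <- fit$frame$yval[frame_row]

# On the training data the helper should agree with fit$where
train_nodes <- as.numeric(rownames(fit$frame))[fit$where]
stopifnot(all(predict_node(fit, mtcars) == train_nodes))
```

The node numbers in `nodes` can then be passed straight to `path.rpart(fit, nodes = ...)`.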

(One note: the above worked in my case where tree was a regression tree, not a classification tree. In the case of a classification tree, you probably need to omit as.numeric or replace it with as.factor.)
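For classification trees, one option to consider, based on `predict.rpart()`'s documented `type = "vector"` behavior (it returns the predicted class as a number, which is read from `frame$yval`), is to keep `as.numeric` and request `type = "vector"` explicitly; the default `type` for classification trees returns class probabilities, which do not come from `yval`. A sketch on the built-in `iris` data, an illustrative assumption not tested on the asker's data:

```r
library(rpart)

# Illustrative classification tree on a built-in dataset
fit_class <- rpart(Species ~ ., data = iris, method = "class")

# Same trick as above, but with type = "vector": that type returns frame$yval,
# which we overwrite with node numbers (the default "prob" type ignores yval)
fit_nodes <- fit_class
fit_nodes$frame$yval <- as.numeric(rownames(fit_nodes$frame))
class_nodes <- predict(fit_nodes, newdata = iris[, 1:4], type = "vector")

# On the training data this should match the leaves recorded in fit_class$where
train_nodes <- as.numeric(rownames(fit_class$frame))[fit_class$where]
stopifnot(all(class_nodes == train_nodes))
```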
