Data Prediction using Decision Tree of rpart
Problem description
I am using R to classify a data frame called "d" containing data structured like below:
The data has 576666 rows, and the column "classLabel" is a factor with 3 levels: ONE, TWO, THREE.
I am making a decision tree using rpart:
fitTree = rpart(d$classLabel ~ d$tripduration + d$from_station_id + d$gender + d$birthday)
And I want to predict the values of "classLabel" for newdata:
newdata = data.frame( tripduration=c(345,244,543,311),
from_station_id=c(60,28,100,56),
gender=c("Male","Female","Male","Male"),
birthday=c(1972,1955,1964,1967) )
p <- predict(fitTree, newdata)
I expect my result to be a matrix of 4 rows, each holding the probabilities of the three possible values of "classLabel" for the corresponding row of newdata. But what I get in p is a data frame of 576666 rows like below:
I also get the following warning when running the predict function:
Warning message:
'newdata' had 4 rows but variables found have 576666 rows
Where am I going wrong?
Recommended answer
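I think there are two problems. The warning comes from the model formula rather than from predict itself: because the tree was fitted with terms such as d$tripduration, predict looks those variables up in d and ignores the columns of newdata, so you get 576666 rows back. Fit the model with bare column names and a data argument instead, as in the iris example below. Then, if you want predicted class labels rather than class probabilities, add type="class" to the prediction call: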
predict(fitTree,newdata,type="class")
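A minimal sketch of that refit follows. The original data frame "d" is not available, so the one built here is a hypothetical stand-in with made-up values and a made-up labelling rule; only the column names and the newdata values come from the question. The point is the shape of the rpart call: bare column names plus data = d.

```r
library(rpart)

# Hypothetical stand-in for the question's data frame 'd' (all values are made up).
set.seed(1)
n <- 300
d <- data.frame(
  tripduration    = sample(100:900, n, replace = TRUE),
  from_station_id = sample(1:100, n, replace = TRUE),
  gender          = factor(sample(c("Male", "Female"), n, replace = TRUE)),
  birthday        = sample(1950:1995, n, replace = TRUE)
)
# Made-up labelling rule, only so that the tree has something to learn.
d$classLabel <- cut(d$tripduration, breaks = c(-Inf, 300, 600, Inf),
                    labels = c("ONE", "TWO", "THREE"))

# Bare column names plus 'data = d', instead of d$... terms in the formula.
fitTree <- rpart(classLabel ~ tripduration + from_station_id + gender + birthday,
                 data = d, method = "class")

# The newdata from the question; gender is given the same factor levels as in d.
newdata <- data.frame(
  tripduration    = c(345, 244, 543, 311),
  from_station_id = c(60, 28, 100, 56),
  gender          = factor(c("Male", "Female", "Male", "Male"),
                           levels = levels(d$gender)),
  birthday        = c(1972, 1955, 1964, 1967)
)

# Without a 'type' argument, a classification tree returns class probabilities,
# one row per row of newdata.
p <- predict(fitTree, newdata)
dim(p)
```

With the formula written this way, predict can match the formula's variable names against the columns of newdata, so the result should have one row per row of newdata rather than one per row of d.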
Try the following code. I use the "iris" dataset in this example.
> library(rpart)
> data(iris)
> head(iris)
  Sepal.Length Sepal.Width Petal.Length Petal.Width Species
1          5.1         3.5          1.4         0.2  setosa
2          4.9         3.0          1.4         0.2  setosa
3          4.7         3.2          1.3         0.2  setosa
4          4.6         3.1          1.5         0.2  setosa
5          5.0         3.6          1.4         0.2  setosa
6          5.4         3.9          1.7         0.4  setosa
# model fitting
> fitTree<-rpart(Species~Sepal.Length+Sepal.Width+Petal.Length+Petal.Width,iris)
#prediction-one row data
> newdata<-data.frame(Sepal.Length=7,Sepal.Width=4,Petal.Length=6,Petal.Width=2)
> newdata
  Sepal.Length Sepal.Width Petal.Length Petal.Width
1            7           4            6           2
# perform prediction
> predict(fitTree, newdata,type="class")
        1 
virginica 
Levels: setosa versicolor virginica
#prediction-multiple-row data
> newdata2<-data.frame(Sepal.Length=c(7,8,6,5),
+ Sepal.Width=c(4,3,2,4),
+ Petal.Length=c(6,3.4,5.6,6.3),
+ Petal.Width=c(2,3,4,2.3))
> newdata2
  Sepal.Length Sepal.Width Petal.Length Petal.Width
1            7           4          6.0         2.0
2            8           3          3.4         3.0
3            6           2          5.6         4.0
4            5           4          6.3         2.3
# perform prediction
> predict(fitTree,newdata2,type="class")
        1         2         3         4 
virginica virginica virginica virginica 
Levels: setosa versicolor virginica
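The question asked for a matrix of class probabilities rather than class labels. As a short sketch on the same iris model, passing type="prob" instead of type="class" should return one row per row of newdata2 and one column per level of Species. The variable name probs is only illustrative.

```r
library(rpart)

# Same model as above: bare column names with a 'data' argument.
fitTree <- rpart(Species ~ Sepal.Length + Sepal.Width + Petal.Length + Petal.Width,
                 data = iris)

newdata2 <- data.frame(Sepal.Length = c(7, 8, 6, 5),
                       Sepal.Width  = c(4, 3, 2, 4),
                       Petal.Length = c(6, 3.4, 5.6, 6.3),
                       Petal.Width  = c(2, 3, 4, 2.3))

# type = "prob" returns a matrix of class probabilities instead of labels.
probs <- predict(fitTree, newdata2, type = "prob")
probs
```

Each row of probs sums to 1, and the column with the largest value corresponds to the label that type="class" reports for that row.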