Data Prediction using Decision Tree of rpart


Question

I am using R to classify a data frame called 'd' containing data structured like below:

The data has 576666 rows, and the column "classLabel" is a factor with 3 levels: ONE, TWO, THREE.

I am building a decision tree using rpart:

fitTree = rpart(d$classLabel ~ d$tripduration + d$from_station_id + d$gender +  d$birthday)

And I want to predict the values of "classLabel" for newdata:

newdata = data.frame( tripduration=c(345,244,543,311), 
                      from_station_id=c(60,28,100,56),
                      gender=c("Male","Female","Male","Male"),  
                      birthday=c(1972,1955,1964,1967) )

 p <- predict(fitTree, newdata)

I expect the result to be a matrix of 4 rows, each holding the probabilities of the three possible values of "classLabel" for newdata. But what I get in p is a data frame of 576666 rows like below:

I also get the following warning when running the predict function:

Warning message:
'newdata' had 4 rows but variables found have 576666 rows 

Where am I going wrong?

Recommended answer

I think the problem is that you should add type="class" to the prediction call:

    predict(fitTree,newdata,type="class")
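A note on the warning itself: it most likely comes from the formula rather than from the type argument. When the formula is written with `d$` prefixes, predict() cannot match those terms to the columns of newdata, so it falls back to the original vectors and returns one prediction per training row. The following is a minimal sketch of that behaviour, using iris as a stand-in because 'd' is not available here; the names fit_ok, fit_bad and new_rows are illustrative only.

```r
library(rpart)

# Fit with bare column names plus a data argument, so that predict()
# can look the variables up in the columns of the new data frame.
fit_ok <- rpart(Species ~ Sepal.Length + Sepal.Width + Petal.Length + Petal.Width,
                data = iris)

# Fit the way the question does, with `iris$` inside the formula.
fit_bad <- rpart(iris$Species ~ iris$Sepal.Length + iris$Sepal.Width +
                   iris$Petal.Length + iris$Petal.Width)

new_rows <- data.frame(Sepal.Length = c(7, 5),
                       Sepal.Width  = c(4, 3),
                       Petal.Length = c(6, 1.5),
                       Petal.Width  = c(2, 0.2))

# One prediction row per row of new_rows.
p_ok <- predict(fit_ok, new_rows)

# The terms named `iris$...` are not columns of new_rows, so the training
# vectors are used instead; R warns about the row mismatch.
p_bad <- suppressWarnings(predict(fit_bad, new_rows))

nrow(p_ok)
nrow(p_bad)
```

Applied to the question's data, this would suggest fitting with rpart(classLabel ~ tripduration + from_station_id + gender + birthday, data = d), assuming those column names match the ones in 'd'.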

Try the following code. This example uses the "iris" dataset.

    > library(rpart)
    > data(iris)
    > head(iris)
      Sepal.Length Sepal.Width Petal.Length Petal.Width Species
    1          5.1         3.5          1.4         0.2  setosa
    2          4.9         3.0          1.4         0.2  setosa
    3          4.7         3.2          1.3         0.2  setosa
    4          4.6         3.1          1.5         0.2  setosa
    5          5.0         3.6          1.4         0.2  setosa
    6          5.4         3.9          1.7         0.4  setosa

    # model fitting: bare column names, data passed as the second argument
    > fitTree <- rpart(Species ~ Sepal.Length + Sepal.Width + Petal.Length + Petal.Width, iris)

    # prediction: one-row data
    > newdata <- data.frame(Sepal.Length=7, Sepal.Width=4, Petal.Length=6, Petal.Width=2)
    > newdata
      Sepal.Length Sepal.Width Petal.Length Petal.Width
    1            7           4            6           2

    # perform prediction
    > predict(fitTree, newdata, type="class")
            1 
    virginica 
    Levels: setosa versicolor virginica

    # prediction: multiple-row data
    > newdata2 <- data.frame(Sepal.Length=c(7,8,6,5),
    +                        Sepal.Width=c(4,3,2,4),
    +                        Petal.Length=c(6,3.4,5.6,6.3),
    +                        Petal.Width=c(2,3,4,2.3))

    > newdata2
      Sepal.Length Sepal.Width Petal.Length Petal.Width
    1            7           4          6.0         2.0
    2            8           3          3.4         3.0
    3            6           2          5.6         4.0
    4            5           4          6.3         2.3

    # perform prediction
    > predict(fitTree, newdata2, type="class")
            1         2         3         4 
    virginica virginica virginica virginica 
    Levels: setosa versicolor virginica
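Since the question asks for a matrix of probabilities rather than labels, type="prob" may be closer to what is wanted than type="class". The sketch below reuses the iris example; the names probs and labels are illustrative, and the same call should apply to the question's tree once it is fitted with a data argument.

```r
library(rpart)

# Same model as in the iris example, fitted with a data argument.
fitTree <- rpart(Species ~ Sepal.Length + Sepal.Width + Petal.Length + Petal.Width,
                 data = iris)

newdata2 <- data.frame(Sepal.Length = c(7, 8, 6, 5),
                       Sepal.Width  = c(4, 3, 2, 4),
                       Petal.Length = c(6, 3.4, 5.6, 6.3),
                       Petal.Width  = c(2, 3, 4, 2.3))

# type = "prob" returns one row per row of newdata2 and one column per class.
probs <- predict(fitTree, newdata2, type = "prob")

# type = "class" returns only the predicted label for each row.
labels <- predict(fitTree, newdata2, type = "class")

dim(probs)
colnames(probs)
```

For a classification tree, predict() on an rpart object returns class probabilities when type is left unspecified, so type="class" is only needed when the labels themselves are wanted.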
