Predict/estimate values using randomForest in R

Views: 100 Published: 2020/10/17 22:07:45 Tags: r data-modeling random-forest prediction


Problem description

I want to predict values for the Pop_avg field in my unsurveyed areas based on my surveyed areas. I am using randomForest, following a suggestion made on my earlier question.

My surveyed areas:

> surveyed <- read.csv("summer_surveyed.csv", header = T)
> surveyed_1 <- surveyed[, -c(1,2,3,5,6,7,9,10,11,12,13,15)]
> head(surveyed_1, n=1)
  VEGETATION                                        Pop_avg    Acres_1
1 Acer rubrum-Vaccinium corymbosum-Amelanchier spp.       0   27.68884

My unsurveyed areas:

> unsurveyed <- read.csv("summer_unsurveyed.csv", header = T)
> unsurveyed_1 <- unsurveyed[, -c(2,3,5,6,7,9,10,11,12,13,15)]
> head(unsurveyed_1, n=1)
OBJECTID                                       VEGETATION  Pop_avg   Acres_1
      13 Acer rubrum-Vaccinium corymbosum-Amelanchier spp.       0  4.787381

I then removed the rows of unsurveyed_1 whose vegetation types are not found in surveyed_1, and dropped the unused factor levels.

> setdiff(unsurveyed_1$VEGETATION, surveyed_1$VEGETATION) 

> unsurveyed_1 <- unsurveyed_1[!unsurveyed_1$VEGETATION == "Typha (angustifolia, latifolia) - (Schoenoplectus spp.) Eastern Herbaceous Vegetation", ]
> unsurveyed_1 <- unsurveyed_1[!unsurveyed_1$VEGETATION == "Acer rubrum- Nyssa sylvatica saturated forest alliance",]
> unsurveyed_1 <- unsurveyed_1[!unsurveyed_1$VEGETATION == "Prunus serotina",]

> unsurveyed_drop <- droplevels(unsurveyed_1)
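
The three manual filters above can also be written as a single `%in%` test, which avoids retyping each vegetation name. This is only a minimal sketch: the data frames `surveyed_toy` and `unsurveyed_toy` and their values are made up for illustration and stand in for `surveyed_1` and `unsurveyed_1`.

```r
# Toy stand-ins for surveyed_1 and unsurveyed_1 (illustrative values only).
surveyed_toy <- data.frame(
  VEGETATION = factor(c("Type A", "Type B", "Type A")),
  Acres_1    = c(27.7, 10.2, 5.1)
)
unsurveyed_toy <- data.frame(
  OBJECTID   = c(13, 14, 15),
  VEGETATION = factor(c("Type A", "Type C", "Type B")),
  Acres_1    = c(4.8, 3.3, 9.9)
)

# Vegetation types the model will never have seen during training.
unseen <- setdiff(as.character(unsurveyed_toy$VEGETATION),
                  as.character(surveyed_toy$VEGETATION))

# Keep only rows whose vegetation type occurs in the surveyed data,
# then drop the factor levels that are no longer used.
keep <- unsurveyed_toy$VEGETATION %in% surveyed_toy$VEGETATION
unsurveyed_keep <- droplevels(unsurveyed_toy[keep, ])
```

With the real data, the same two lines (`keep <- ...` and `droplevels(...)`) would apply to `unsurveyed_1` and `surveyed_1` directly.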

Next I ran randomForest and predict, and added the output to unsurveyed_drop:

> surveyed_pred <- randomForest(Pop_avg ~ 
+ VEGETATION+Acres_1,
+ data = surveyed_1,
+ importance = TRUE)

> summer_results <- predict(surveyed_pred, unsurveyed_drop,type="response",
+ norm.votes=TRUE, predict.all=F, proximity=FALSE, nodes=FALSE)

> summer_all <- cbind(unsurveyed_drop, summer_results)
> head(summer_all, n=1)
OBJECTID                                        VEGETATION Pop_avg   Acres_1 summer_results
      13 Acer rubrum-Vaccinium corymbosum-Amelanchier spp.       0  4.787381       0.120077

I would like to estimate values for the Pop_avg column in summer_all. I am assuming that I need to use the proportions generated in summer_results, but I'm unsure how to do this. Thanks for any help or further suggestions.

More information:
I am looking to get predicted count data for Pop_avg based on VEGETATION and Acres_1. I am not sure whether or how to use the probabilities in my output summer_results to achieve this, or whether I need to alter my model or try a different method.
The reason I didn't think the output was right is that Pop_avg ranges from .333 and up (where deer were seen), which is Population divided by 3, and Population ranges from 1 and up (i.e. 10, 20...). When I ran the model trying to predict either one, I got similar numbers ranging from .9xx to 2 or 3.xxx, especially when I ran it with Population, which didn't seem right.
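
One possible explanation for the low numbers, sketched with made-up counts: a regression forest predicts by averaging training responses, so areas with zero sightings pull the estimate toward zero. The vector `population` below is hypothetical and is not taken from the survey data.

```r
# Made-up counts for one vegetation type: mostly zeros, two sightings.
population <- c(0, 0, 0, 0, 0, 0, 0, 10, 20, 0)
pop_avg    <- population / 3   # Pop_avg is Population divided by 3

mean_all      <- mean(population)                  # zeros included: 3
mean_positive <- mean(population[population > 0])  # sightings only: 15
```

An average near 3 can therefore come from data in which every actual sighting was 10 or more.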
Data:

summer_surveyed_sample
summer_unsurveyed_sample
Recommended answer
My problem lay in my training model. I figured out that I needed to train on the subset of my surveyed data where Population > 0 to get more accurate predictions.
> surveyed_1 <- surveyed_1[c(surveyed_1$Population > 0),]
> surveyed_drop <- droplevels(surveyed_1)
> surveyed_pred <- randomForest(Population ~ 
                VEGETATION+Acres_1,
                data = surveyed_drop,
                importance = TRUE)
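
For completeness, the retrained model still has to be applied to the unsurveyed areas. The following is a hedged sketch of that step on simulated data, not the author's actual run: `train`, `new_data`, and all values are invented, and the division by 3 relies on the question's statement that Pop_avg is Population divided by 3.

```r
library(randomForest)

set.seed(1)  # random forests are stochastic; fix the seed for repeatability

# Simulated stand-in for the surveyed areas (values are made up).
train <- data.frame(
  VEGETATION = factor(rep(c("Type A", "Type B", "Type C"), each = 20)),
  Acres_1    = runif(60, 1, 50)
)
train$Population <- rpois(60, lambda = ifelse(train$VEGETATION == "Type A", 1, 6))

# Keep only areas where animals were seen, as in the answer above.
train_pos <- droplevels(train[train$Population > 0, ])

fit <- randomForest(Population ~ VEGETATION + Acres_1,
                    data = train_pos, importance = TRUE)

# Simulated stand-in for the unsurveyed areas.
new_data <- data.frame(
  VEGETATION = c("Type A", "Type B", "Type C", "Type B"),
  Acres_1    = c(4.8, 12.0, 30.5, 2.2)
)
# Factor levels in the new data must match the training data exactly.
new_data$VEGETATION <- factor(new_data$VEGETATION,
                              levels = levels(train_pos$VEGETATION))

# A numeric response gives a regression forest, so predict() returns
# estimates on the scale of Population, not class probabilities.
new_data$Population_pred <- predict(fit, newdata = new_data)
new_data$Pop_avg_pred    <- new_data$Population_pred / 3  # Pop_avg = Population / 3
```

Because each prediction is an average of training responses, training only on rows with Population > 0 means no estimate can fall below the smallest observed positive count. Whether that is appropriate depends on whether the unsurveyed areas are expected to hold animals at all.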

