Merge two regression prediction models (with subsets of a data frame) back into the data frame (one column)


Problem description

I am building on a similar question that was asked and answered on SO a year ago. It relates to this post: "how to merge two linear regression prediction models (each per data frame's subset) into one column of the data frame".

I will use the same data as was used there, but with a new column. I create the data:

dat = read.table(text = " cats birds    wolfs     snakes     trees
0        3        8         7        2
1        3        8         7        3
1        1        2         3        2
0        1        2         3        1
0        1        2         3        2
1        6        1         1        3
0        6        1         1        1
1        6        1         1        1   " ,header = TRUE) 

Model the number of wolves, using two subsets of the data to distinguish between conditions. The equations are different for each subset.

f0 = lm(wolfs~snakes,data = dat,subset=dat$cats==0)
f1 = lm(wolfs~snakes + trees,data = dat,subset=dat$cats==1)

Predict the number of wolves for each subset.

f0_predict = predict(f0,data = dat,subset=dat$cats==1,type='response')
f1_predict = predict(f1,data = dat,subset=dat$cats==0,type='response')
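One caveat about the two calls above: `predict()` for `lm` objects takes new rows through `newdata`; `data` and `subset` are not among its arguments, so they fall into `...` and are silently ignored. A minimal sketch of the difference, using the same data (the names `p_ignored`, `p_cats0` and `p_cats1` are made up here for illustration):

```r
dat <- read.table(text = "cats birds wolfs snakes trees
0 3 8 7 2
1 3 8 7 3
1 1 2 3 2
0 1 2 3 1
0 1 2 3 2
1 6 1 1 3
0 6 1 1 1
1 6 1 1 1", header = TRUE)

f0 <- lm(wolfs ~ snakes, data = dat, subset = cats == 0)

# `data` and `subset` are ignored here, so this is just the fitted values
# for the four rows the model was trained on (cats == 0).
p_ignored <- predict(f0, data = dat, subset = dat$cats == 1, type = "response")

# Passing rows through `newdata` predicts for exactly those rows.
p_cats0 <- predict(f0, newdata = dat[dat$cats == 0, ], type = "response")
p_cats1 <- predict(f0, newdata = dat[dat$cats == 1, ], type = "response")
```

Here `p_ignored` and `p_cats0` hold the same values, while `p_cats1` applies the `cats == 0` model to the `cats == 1` rows.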

Then (again, per the 2015 post) I split the data by the cats variable.

dat.l = split(dat, dat$cats)
dat.l 

... Here is where it gets a little tricky. The 2015 post suggested using lapply to attach the two sets of predictions to the data set. But here the respondent's function would not work, as it assumed both regression equations were essentially the same. Here's my attempt (it's close to the original, just tweaked):

dat.l = lapply(dat.l, function(x){
    mod = ifelse(dat$cats==0, lm(wolfs~snakes,data=x), lm(wolfs~snakes+trees,data=x))
    x$full_prediction = predict(mod,data=x,type='response')
    return(x)
})
unsplit(dat.l, dat$cats)

Any ideas regarding the last couple of steps? I am still relatively new to S.O. and at an intermediate level with R, so please go gently if I have not posted precisely as the community prefers.

Solution

Here's a dplyr solution, building off of the previous post you cited:

library(dplyr)

# create a new column defining the lm formula for each level of cats
dat <- dat %>% mutate(formula = ifelse(cats==0, "wolfs ~ snakes", 
        "wolfs ~ snakes + trees"))

# build model and find predicted values for each value of cats
dat <- dat %>% group_by(cats) %>%
    do({
        mod <- lm(as.formula(.$formula[1]), data = .)
        pred <- predict(mod)
        data.frame(., pred)
    })

> dat
Source: local data frame [8 x 7]
Groups: cats [2]
   cats birds wolfs snakes trees                formula      pred
  (int) (int) (int)  (int) (int)                  (chr)     (dbl)
1     0     3     8      7     2         wolfs ~ snakes 7.5789474
2     0     1     2      3     1         wolfs ~ snakes 2.6315789
3     0     1     2      3     2         wolfs ~ snakes 2.6315789
4     0     6     1      1     1         wolfs ~ snakes 0.1578947
5     1     3     8      7     3 wolfs ~ snakes + trees 7.6800000
6     1     1     2      3     2 wolfs ~ snakes + trees 2.9600000
7     1     6     1      1     3 wolfs ~ snakes + trees 0.8400000
8     1     6     1      1     1 wolfs ~ snakes + trees 0.5200000
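
For comparison, here is a base R sketch that stays closer to the `split()`/`lapply()` attempt in the question. That attempt fails because `ifelse()` is vectorised: it returns a value shaped like its test (`dat$cats == 0`, length 8) rather than one of the two fitted models, so `mod` is not an `lm` object. The sketch instead looks up the formula by group level. The names `formulas` and `result` are introduced here for illustration and do not come from the original posts; `Map()` stands in for `lapply()` so that each piece arrives together with its level name.

```r
dat <- read.table(text = "cats birds wolfs snakes trees
0 3 8 7 2
1 3 8 7 3
1 1 2 3 2
0 1 2 3 1
0 1 2 3 2
1 6 1 1 3
0 6 1 1 1
1 6 1 1 1", header = TRUE)

# One formula per level of cats, keyed by the level as a character string
# (hypothetical helper list, not part of the original answer).
formulas <- list("0" = wolfs ~ snakes,
                 "1" = wolfs ~ snakes + trees)

# Split by cats, fit the matching model on each piece, attach predictions.
dat.l <- split(dat, dat$cats)
dat.l <- Map(function(x, level) {
  mod <- lm(formulas[[level]], data = x)
  x$full_prediction <- predict(mod, newdata = x, type = "response")
  x
}, dat.l, names(dat.l))

# Reassemble the pieces in the original row order.
result <- unsplit(dat.l, dat$cats)
result
```

`result$full_prediction` should hold the same values as the `pred` column above, but in the original row order rather than grouped by `cats`.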
