Combining dplyr::do() with dplyr::mutate?

Question

I would like to achieve the following: for each subgroup of a dataset, I would like to carry out a regression, and the residuals of that regression should be saved as a new variable in the original dataframe. For instance,

 group_by(mtcars, gear) %>% mutate(res = residuals(lm(mpg~carb, .)))

shows what I think should work, but it does not (would anyone care to explain why?). One way to get the residuals is the following:

 group_by(mtcars, gear) %>% do(res = residuals(lm(mpg~carb, .)))

which gives me a data frame whose res column holds one dbl vector per group, i.e. the residuals for that group. However, these vectors do not seem to carry the original row names, which would help me merge them back into the original data.

So, my question is: how can I achieve what I want to do in a dplyr-kind of way?

Obviously, it can be achieved in other ways. To give you an example, the following works just fine:

 dat <- mtcars
 dat$res <- NA
 for(i in unique(mtcars$gear)){
   dat[dat$gear==i, "res"]  <- residuals(lm(mpg ~ disp, data=dat[dat$gear==i,]))
 }

However, my understanding is that dplyr is made for this purpose, so there should be a dplyr-style way?

Any hints / tips / comments are appreciated.

Remark: this question is very similar to lm() called within mutate() except that in that question, only one parameter per group is retained, which makes a merge-approach easy. I have an entire vector with no rownames, so that I would have to rely on the ordering of the vector to do that, and that seems troublesome to me.
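
The ordering concern in the remark can be sidestepped by carrying the row names along as an explicit key column. The following is a minimal sketch, not part of the original question or answer: the column name car and the object names are made up for illustration, and it assumes the tibble package is installed alongside dplyr. It uses do() to return one small data frame per group, then joins on the key instead of relying on row order.

```r
library(dplyr)
library(tibble)

# Keep the row names as an ordinary column so they survive grouping.
# "car" is a hypothetical column name chosen for this sketch.
cars_keyed <- rownames_to_column(mtcars, var = "car")

# One row per car: the key plus its residual from the per-gear regression.
res_by_car <- cars_keyed %>%
  group_by(gear) %>%
  do(data.frame(car = .$car,
                res = unname(residuals(lm(mpg ~ disp, data = .))),
                stringsAsFactors = FALSE)) %>%
  ungroup()

# Merge on the key (and the grouping column) rather than on row position.
merged <- left_join(cars_keyed, res_by_car, by = c("car", "gear"))
```

Because left_join() keeps the rows of its first argument in place, merged follows the same row order as mtcars.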

Solution

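library(dplyr)     # %>%, group_by(), mutate()
library(lazyeval)  # interp()
eq <- "y ~ x"      # formula template, kept as a string
dat <- mtcars
dat %>% 
    group_by(gear) %>% 
    # fill the template's y and x with mpg and disp, then fit per gear group
    mutate(res=residuals(lm(interp(eq, y = mpg, x = disp))))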

Or, without lazyeval:

# dat is mtcars, as defined above; deparse() hands lm() the formula as a character string
dat %>% 
    group_by(gear) %>% 
    mutate(res=residuals(lm(deparse(substitute(mpg~disp)))))
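
Another option, offered here as a hedged sketch rather than as part of the answer above, is to hand each group's rows to lm() explicitly through its data argument. This assumes a dplyr version that provides group_modify(); the helper name add_group_residuals and its argument names are hypothetical.

```r
library(dplyr)

# Hypothetical helper: fit mpg ~ disp on one group's rows and attach the residuals.
# group_modify() calls it with the group's rows and a one-row data frame of group keys.
add_group_residuals <- function(group_rows, group_key) {
  fit <- lm(mpg ~ disp, data = group_rows)
  group_rows$res <- unname(residuals(fit))
  group_rows
}

with_res <- mtcars %>%
  group_by(gear) %>%
  group_modify(add_group_residuals) %>%
  ungroup()
```

Note that the rows of with_res come back grouped by gear, so they no longer follow the original row order of mtcars.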
