Using dplyr's mutate function to return relative values within a grouped data frame


Problem description

I am trying to use dplyr's mutate function to create a new variable that pulls in relative values of an existing variable based on the value of an ifelse statement. Here is an example of what I'm trying to achieve, which will hopefully better illustrate the problem:

id  from_date fobs     to_date
 a 1999-01-05    0  1999-01-10
 a 1999-01-10    0  1999-02-14
 a 1999-02-14    1  2013-12-31
 b 1999-03-19    0  1999-03-25
 b 1999-03-25    1  2013-12-31
 c 1999-02-14    0  1999-02-15
 c 1999-02-15    1  2013-12-31

The dataset is grouped by id, and I'm trying to set each observation's to_date to the next observation's from_date where fobs is equal to 0, and to 2013-12-31 where fobs is equal to 1.

This is the code I most recently tried, which isn't working for me, but I hope adequately expresses what I'm trying to accomplish with the dplyr package:

qdat %>% group_by(id) %>% mutate(to_date = ifelse(fobs == 1,as.Date("2013-12-31"),as.Date(lead(qdat$date)))) 

For what it's worth, these are the results of running that code:

  id  from_date val fobs to_date
1  a 1999-01-05   5    0      NA
2  a 1999-01-10   9    0      NA
3  a 1999-02-14   4    1   16070
4  b 1999-03-19   7    0      NA
5  b 1999-03-25  14    1   16070
6  c 1999-02-14  10    0      NA
7  c 1999-02-15  11    1   16070

I have reviewed the "Hands on dplyr tutorial for faster data manipulation in R" (R-Bloggers), R-Studio's presentation on "The Grammar and Graphics of Data Science," which features additional information on dplyr, and other stackoverflow questions about the dplyr package and relative cell references in general, but I have not yet found a way to solve this problem. For the record, I'm also very new to R, so I apologize in advance if I'm overlooking something that seems perfectly obvious to anyone else.

Solution

Try:

library(dplyr)

qdat %>% group_by(id) %>%
         mutate(to_date = lead(from_date, default = as.Date("2013-12-31")))

You can take out the ifelse, because in your example the last row of each group is always the one that gets the default. If that is not the case, see below.

You might have to run qdat$from_date <- as.Date(qdat$from_date) first.
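As a rough, self-contained sketch of the approach above: the data frame below is a hypothetical reconstruction of the question's sample data (the val column is left out), and the name result is introduced only for illustration.

```r
library(dplyr)

# Hypothetical reconstruction of the question's sample data.
# from_date is converted with as.Date() up front so lead() returns Dates.
qdat <- data.frame(
  id = c("a", "a", "a", "b", "b", "c", "c"),
  from_date = as.Date(c("1999-01-05", "1999-01-10", "1999-02-14",
                        "1999-03-19", "1999-03-25",
                        "1999-02-14", "1999-02-15")),
  fobs = c(0, 0, 1, 0, 1, 0, 1),
  stringsAsFactors = FALSE
)

# Within each id, take the next row's from_date; the last row of each
# group has no next row, so it receives the default end date instead.
result <- qdat %>%
  group_by(id) %>%
  mutate(to_date = lead(from_date, default = as.Date("2013-12-31"))) %>%
  ungroup()

print(result)
```

Because lead() is evaluated inside group_by(id), it never reaches into the next id's rows, which is what makes the default land on the last row of every group.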

Note: You were getting this result because of a quirk in ifelse. From ?ifelse:

ifelse() strips attributes

This is important when working with Dates and factors

So we need to restore the class after the ifelse call.
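A minimal base R sketch of that behaviour, using only the end date from the question (the variable names are just for illustration):

```r
end_date <- as.Date("2013-12-31")

# ifelse() builds its result from the test vector, so the Date class of
# the "yes" value is dropped and only the underlying number survives.
stripped <- ifelse(TRUE, end_date, NA)
print(class(stripped))  # "numeric"
print(stripped)         # 16070, i.e. days since 1970-01-01

# Putting the class back turns the number into a Date again.
restored <- stripped
class(restored) <- "Date"
print(restored)         # "2013-12-31"
```

That 16070 is the same value that showed up in the to_date column of the question's output.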

First fix your original code by correcting the ifelse call (use lead(from_date) instead of lead(qdat$date), since qdat has no date column):

newqdat <- qdat %>% group_by(id) %>%
                    mutate(to_date = ifelse(fobs == 1,
                                            as.Date("2013-12-31"),
                                            as.Date(lead(from_date))))

And then change the class back to Date:

class(newqdat$to_date) <- "Date"
newqdat
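As a possible alternative that is not part of the answer above, dplyr's own if_else() keeps the Date class, so the separate class-restoring step should not be needed. The sketch below assumes the same hypothetical reconstruction of the sample data (val column omitted); result is an illustrative name.

```r
library(dplyr)

# Hypothetical reconstruction of the question's sample data.
qdat <- data.frame(
  id = c("a", "a", "a", "b", "b", "c", "c"),
  from_date = as.Date(c("1999-01-05", "1999-01-10", "1999-02-14",
                        "1999-03-19", "1999-03-25",
                        "1999-02-14", "1999-02-15")),
  fobs = c(0, 0, 1, 0, 1, 0, 1),
  stringsAsFactors = FALSE
)

# if_else() requires both branches to have compatible types and
# preserves them, so to_date stays a Date column.
result <- qdat %>%
  group_by(id) %>%
  mutate(to_date = if_else(fobs == 1,
                           as.Date("2013-12-31"),
                           lead(from_date))) %>%
  ungroup()

print(result)
```

Unlike the lead(..., default = ...) version, this one keys off fobs directly, so it would also cover data where a row with fobs equal to 1 is not the last row of its group.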
