Select first observed data and utilize mutate

Problem description


I am running into an issue with my data where I want to take the first observed score for each individual id and subtract the last observed score from it.

The problem with asking for the first observation minus the last observation is that sometimes the first observation's data is missing.

Is there any way to ask for the first observed score for each individual, thus skipping any missing data?

I built the below df to illustrate my problem.

help <- data.frame(id = c(5,5,5,5,5,12,12,12,17,17,20,20,20),
                   ob = c(1,2,3,4,5,1,2,3,1,2,1,2,3),
                   score = c(NA, 2, 3, 4, 3, 7, 3, 4, 3, 4, NA, 1, 4))

   id ob score
1   5  1    NA
2   5  2     2
3   5  3     3
4   5  4     4
5   5  5     3
6  12  1     7
7  12  2     3
8  12  3     4
9  17  1     3
10 17  2     4
11 20  1    NA
12 20  2     1
13 20  3     4

And what I am hoping to run is code that will give me...

   id ob score  es
1   5  1    NA  -1
2   5  2     2  -1
3   5  3     3  -1
4   5  4     4  -1
5   5  5     3  -1
6  12  1     7   3
7  12  2     3   3
8  12  3     4   3
9  17  1     3  -1
10 17  2     4  -1
11 20  1    NA  -3
12 20  2     1  -3
13 20  3     4  -3

I am attempting to work in dplyr, and I understand the use of the group_by command; however, I am not sure how to 'select' only the first observed scores and then mutate to create es.
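To make the failure concrete, here is a minimal base R sketch (an illustration, not code from the post) of the naive approach: take the literal first and last rows per id with tapply(). It assumes the `help` data frame above, already ordered by ob within id; the name `naive` is hypothetical. Because ids 5 and 20 start with NA, their differences come out as NA.

```r
# The example data from the question.
help <- data.frame(id = c(5,5,5,5,5,12,12,12,17,17,20,20,20),
                   ob = c(1,2,3,4,5,1,2,3,1,2,1,2,3),
                   score = c(NA, 2, 3, 4, 3, 7, 3, 4, 3, 4, NA, 1, 4))

# Naive version: literal first row minus literal last row within each id.
# Any NA in the first position propagates into the result.
naive <- tapply(help$score, help$id, function(s) s[1] - s[length(s)])
naive
```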

Solution

I would use first() and last() (both dplyr functions) and na.omit() (from the default stats package).
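As a small illustration of these building blocks, the sketch below applies na.omit() to the scores for id 5 and then takes the first and last remaining values. It uses base R's head() and tail() as stand-ins for dplyr's first() and last() so it runs without dplyr; the variable names are hypothetical.

```r
# Scores for id 5 in observation order; the first observation is missing.
s <- c(NA, 2, 3, 4, 3)

kept <- na.omit(s)          # drops the NA; dropped positions are recorded in attr(kept, "na.action")
first_obs <- head(kept, 1)  # first non-missing score
last_obs  <- tail(kept, 1)  # last non-missing score
es <- as.numeric(first_obs - last_obs)
es
```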

First, I would make sure your score column is a numeric column with proper NA values (not strings, as in your example).

help <- data.frame(id = c(5,5,5,5,5,12,12,12,17,17,20,20,20),
       ob = c(1,2,3,4,5,1,2,3,1,2,1,2,3),
       score = c(NA, 2, 3, 4, 3, 7, 3, 4, 3, 4, NA, 1, 4))
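If the scores arrive as text instead, a minimal base R conversion is sketched below. The exact strings in the original example are not shown here, so the character vector `raw_score` (with the literal text "NA" for missing values) is a hypothetical stand-in.

```r
# Hypothetical character version of the score column, as it might arrive from a file.
raw_score <- c("NA", "2", "3", "4", "3", "7", "3", "4", "3", "4", "NA", "1", "4")

# as.numeric() turns numeric strings into numbers; "NA" and anything it
# cannot parse become a real NA. suppressWarnings() silences the coercion
# warning R emits for unparseable entries.
score <- suppressWarnings(as.numeric(raw_score))
str(score)
```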

Then you can do:

library(dplyr)
help %>%
  arrange(id, ob) %>%      # order observations within each id
  group_by(id) %>%
  mutate(es = first(na.omit(score)) - last(na.omit(score))) %>%
  ungroup()
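For comparison, here is a base R sketch that builds the same es column without dplyr. It swaps in ave() for group_by() + mutate(), which is an alternative technique rather than part of the answer above, and it drops missing scores within each id before taking the first and last values (the same idea as na.omit()). The helper name `first_minus_last` is hypothetical.

```r
help <- data.frame(id = c(5,5,5,5,5,12,12,12,17,17,20,20,20),
                   ob = c(1,2,3,4,5,1,2,3,1,2,1,2,3),
                   score = c(NA, 2, 3, 4, 3, 7, 3, 4, 3, 4, NA, 1, 4))

# Order rows so "first" and "last" follow ob within each id.
help <- help[order(help$id, help$ob), ]

# For one id's scores: first non-missing value minus last non-missing value.
first_minus_last <- function(s) {
  s <- s[!is.na(s)]
  if (length(s) == 0) return(NA_real_)  # every score missing for this id
  s[1] - s[length(s)]
}

# ave() applies the function per id and recycles the single result
# to every row of that id, keeping the original number of rows.
help$es <- ave(help$score, help$id, FUN = first_minus_last)
help
```

Like mutate(), ave() returns one value per row, which is why es repeats within each id; tapply() would instead return one value per id.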
