How to subtract first entry from last entry in grouped data


Problem description


I would appreciate some help with the following task. From the data frame below (C), for each id I would like to subtract the first entry under column d_2 from the last entry, and then store the results in another data frame containing the same ids. I can then merge this with my initial data frame. Please note that the subtraction has to be in this order (last entry minus first entry for each id).

Here is the code:

id <- c("A1", "A1", "B10","B10", "B500", "B500", "C100", "C100", "C100", "D40", "D40", "G100", "G100")

d_1 <- c( rep(1.15, 2), rep(1.44, 2), rep(1.34, 2), rep(1.50, 3), rep(1.90, 2), rep(1.59, 2))

set.seed(2)

d_2 <- round(runif(13, -1, 1), 2)

C <- data.frame(id, d_1, d_2)

id   d_1   d_2
A1   1.15 -0.63
A1   1.15  0.40
B10  1.44  0.15
B10  1.44 -0.66
B500 1.34  0.89
B500 1.34  0.89
C100 1.50 -0.74
C100 1.50  0.67
C100 1.50 -0.06
D40  1.90  0.10
D40  1.90  0.11
G100 1.59 -0.52
G100 1.59  0.52

Desired result:

id2 <- c("A1", "B10", "B500", "C100", "D40", "G100")

difference <- c(1.03, -0.81, 0, 0.68, 0.01, 1.04)

diff_df <- data.frame(id2, difference)

id2    difference
A1        1.03
B10      -0.81
B500      0.00
C100      0.68
D40       0.01
G100      1.04

I attempted this by using ddply to obtain the first and last entries but I'm really struggling with indexing the "function argument" in the second code (below) to get the desired outcome.

C_1 <- ddply(C, .(id), function(x) x[c(1, nrow(x)), ])

ddply(C_1, .(patient), function )

To be honest, I'm not very familiar with ddply (from the plyr package); I got the code above from another post on Stack Exchange.

My original data is a groupedData and I believe another way of approaching this is using gapply but again I'm struggling with the third argument here (usually a function)

grouped_C <- groupedData(d_1 ~ d_2 | id, data = C, FUN = mean, labels = list( x = "", y = ""), units = list(""))

x1 <- gapply(grouped_C, "d_2", first_entry)

x2 <- gapply(grouped_C, "d_2", last_entry)

where first_entry and last_entry are functions to help me get the first and last entries. I can then get the difference with: x2 - x1. However, I'm not sure what to use for first_entry and last_entry in the above code (perhaps something to do with head or tail?).

Any help would be much appreciated.

Solution

This can be done easily with dplyr. The last and first functions are very helpful for this task.

library(dplyr)   # load dplyr (install it once with install.packages("dplyr") if needed)

# %>% chains the operations together, so the data frame C is named only once
# and each step receives the result of the previous one.
diff_df <- C %>%                                   # store the final result in a new data frame, diff_df
  group_by(id) %>%                                 # group the rows of C by id
  summarize(difference = last(d_2) - first(d_2))   # one row per id: last entry of d_2 minus first entry of d_2

#    id difference             #this is the result stored in diff_df
#1   A1       1.03
#2  B10      -0.81
#3 B500       0.00
#4 C100       0.68
#5  D40       0.01
#6 G100       1.04
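The question also asks what first_entry and last_entry could look like if built from head or tail. As a rough sketch only (the two helper names come from the question, but their definitions here are an assumption, and tapply is swapped in for gapply), the same "last minus first" result can be obtained in base R without any extra packages. The d_2 values are typed in from the table in the question rather than regenerated with runif.

```r
# Rebuild the relevant columns of the example data from the question.
id  <- c("A1", "A1", "B10", "B10", "B500", "B500",
         "C100", "C100", "C100", "D40", "D40", "G100", "G100")
d_2 <- c(-0.63, 0.40, 0.15, -0.66, 0.89, 0.89,
         -0.74, 0.67, -0.06, 0.10, 0.11, -0.52, 0.52)
C <- data.frame(id, d_2)

# Helper names taken from the question; these definitions are an assumption.
first_entry <- function(x) head(x, 1)   # first element of a vector
last_entry  <- function(x) tail(x, 1)   # last element of a vector

# Apply "last minus first" to d_2 within each id.
difference <- tapply(C$d_2, C$id, function(x) last_entry(x) - first_entry(x))

# Turn the named result into a data frame with one row per id.
diff_df <- data.frame(id = names(difference),
                      difference = round(as.vector(difference), 2),
                      row.names = NULL)
print(diff_df)
```

Rounding to two decimals only hides floating-point noise from the subtraction; it matches the precision of d_2 in the question.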


Edit note: updated the post to use %>% instead of %.%, which is deprecated.
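The question mentions merging the differences back into the initial data frame. A minimal sketch of that step, assuming the join key is the id column and using base R's aggregate as a stand-in for the dplyr pipeline so the snippet runs on its own (C_merged is a name made up for this example):

```r
# Rebuild the example data from the question (d_2 typed in from the table).
id  <- c("A1", "A1", "B10", "B10", "B500", "B500",
         "C100", "C100", "C100", "D40", "D40", "G100", "G100")
d_1 <- c(rep(1.15, 2), rep(1.44, 2), rep(1.34, 2),
         rep(1.50, 3), rep(1.90, 2), rep(1.59, 2))
d_2 <- c(-0.63, 0.40, 0.15, -0.66, 0.89, 0.89,
         -0.74, 0.67, -0.06, 0.10, 0.11, -0.52, 0.52)
C <- data.frame(id, d_1, d_2)

# One row per id: last d_2 minus first d_2.
diff_df <- aggregate(d_2 ~ id, data = C,
                     FUN = function(x) x[length(x)] - x[1])
names(diff_df)[2] <- "difference"

# Attach each id's difference to every row of the original data.
C_merged <- merge(C, diff_df, by = "id")
print(C_merged)
```

Every row of C keeps its own d_1 and d_2, and the per-id difference is repeated across the rows of that id.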
