Using spread with duplicate identifiers for rows
I have a long-format data frame that has multiple entries for the same date and person.
jj <- data.frame(month=rep(1:3,4),
student=rep(c("Amy", "Bob"), each=6),
A=c(9, 7, 6, 8, 6, 9, 3, 2, 1, 5, 6, 5),
B=c(6, 7, 8, 5, 6, 7, 5, 4, 6, 3, 1, 5))
I want to convert it to wide form and make it like this:
month Amy.A Bob.A Amy.B Bob.B
1
2
3
1
2
3
1
2
3
1
2
3
My question is very similar to this one. I used the code given in its answer:
kk <- jj %>%
gather(variable, value, -(month:student)) %>%
unite(temp, student, variable) %>%
spread(temp, value)
but it gives the following error:
Error: Duplicate identifiers for rows (1, 4), (2, 5), (3, 6), (13, 16), (14, 17), (15, 18), (7, 10), (8, 11), (9, 12), (19, 22), (20, 23), (21, 24)
Thanks in advance. Note: I don't want to delete multiple entries.
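To see where the error comes from, it can help to count how many rows share each (month, student) combination. This is a minimal sketch in base R only; the names key_counts and all_duplicated are introduced here for illustration and are not part of the original code.

```r
# Rebuild the example data from the question.
jj <- data.frame(month = rep(1:3, 4),
                 student = rep(c("Amy", "Bob"), each = 6),
                 A = c(9, 7, 6, 8, 6, 9, 3, 2, 1, 5, 6, 5),
                 B = c(6, 7, 8, 5, 6, 7, 5, 4, 6, 3, 1, 5))

# Count how many rows share each (month, student) key.
key_counts <- as.data.frame(table(month = jj$month, student = jj$student))
print(key_counts)

# Every key occurs twice, so month and student alone cannot
# identify a single row once the data is gathered and spread.
all_duplicated <- all(key_counts$Freq == 2)
print(all_duplicated)
```

Because each key occurs more than once, spread has no way to decide which of the two values belongs in a given cell. The values either need to be aggregated or given an extra identifier column.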
The issue is the two columns for both A and B. If we can make that one value column, we can spread the data as you would like. Take a look at the output for jj_melt when you use the code below.
library(reshape2)
jj_melt <- melt(jj, id=c("month", "student"))
jj_spread <- dcast(jj_melt, month ~ student + variable, value.var="value", fun=sum)
# month Amy_A Amy_B Bob_A Bob_B
# 1 1 17 11 8 8
# 2 2 13 13 8 5
# 3 3 15 15 6 11
I won't mark this as a duplicate since the other question did not summarize by sum, but the data.table answer could help with one additional argument, fun=sum:
library(data.table)
dcast(setDT(jj), month ~ student, value.var=c("A", "B"), fun=sum)
# month A_sum_Amy A_sum_Bob B_sum_Amy B_sum_Bob
# 1: 1 17 8 11 8
# 2: 2 13 8 13 5
# 3: 3 15 6 15 11
If you would like to use the tidyr solution, combine it with dcast to summarize by sum.
# setDT() above turned jj into a data.table by reference; convert it back
jj <- as.data.frame(jj)
library(tidyr)
jj %>%
gather(variable, value, -(month:student)) %>%
unite(temp, student, variable) %>%
dcast(month ~ temp, fun=sum)
# month Amy_A Amy_B Bob_A Bob_B
# 1 1 17 11 8 8
# 2 2 13 13 8 5
# 3 3 15 15 6 11
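For readers on a newer tidyr, the same sum can probably be done without reshape2. The sketch below assumes tidyr 1.1.0 or later, where pivot_longer and pivot_wider are available and values_fn accepts a single function. The names long and wide_sum are illustrative only.

```r
library(tidyr)

# Rebuild the example data from the question.
jj <- data.frame(month = rep(1:3, 4),
                 student = rep(c("Amy", "Bob"), each = 6),
                 A = c(9, 7, 6, 8, 6, 9, 3, 2, 1, 5, 6, 5),
                 B = c(6, 7, 8, 5, 6, 7, 5, 4, 6, 3, 1, 5))

# Stack A and B into a single value column (the role of gather/melt).
long <- pivot_longer(jj, cols = c(A, B),
                     names_to = "variable", values_to = "value")

# Spread by student and variable, summing the duplicated cells
# (the role of dcast(..., fun = sum)).
wide_sum <- pivot_wider(long,
                        names_from = c(student, variable),
                        values_from = value,
                        values_fn = sum)
print(wide_sum)
```

The values_fn argument plays the part of fun=sum in the dcast calls above: it tells pivot_wider how to collapse cells that receive more than one value.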
Edit
Based on your new requirements, I have added an activity column.
library(dplyr)
jj %>% group_by(month, student) %>%
mutate(id=1:n()) %>%
melt(id=c("month", "id", "student")) %>%
dcast(... ~ student + variable, value.var="value")
# month id Amy_A Amy_B Bob_A Bob_B
# 1 1 1 9 6 3 5
# 2 1 2 8 5 5 3
# 3 2 1 7 7 2 4
# 4 2 2 6 6 6 1
# 5 3 1 6 8 1 6
# 6 3 2 9 7 5 5
The other solutions can also be used. Here I added an optional expression to arrange the final output by activity number:
library(tidyr)
jj %>%
gather(variable, value, -(month:student)) %>%
unite(temp, student, variable) %>%
group_by(temp) %>%
mutate(id=1:n()) %>%
dcast(... ~ temp) %>%
arrange(id)
# month id Amy_A Amy_B Bob_A Bob_B
# 1 1 1 9 6 3 5
# 2 2 2 7 7 2 4
# 3 3 3 6 8 1 6
# 4 1 4 8 5 5 3
# 5 2 5 6 6 6 1
# 6 3 6 9 7 5 5
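The no-aggregation variant can also be sketched with pivot_wider alone, assuming tidyr 1.0.0 or later. The id column is built here with base R's ave as a stand-in for the group_by/mutate step above, and the names jj_id and wide_id are illustrative.

```r
library(tidyr)

# Rebuild the example data from the question.
jj_id <- data.frame(month = rep(1:3, 4),
                    student = rep(c("Amy", "Bob"), each = 6),
                    A = c(9, 7, 6, 8, 6, 9, 3, 2, 1, 5, 6, 5),
                    B = c(6, 7, 8, 5, 6, 7, 5, 4, 6, 3, 1, 5))

# Number the repeated entries within each (month, student) pair: 1, 2.
jj_id$id <- ave(seq_len(nrow(jj_id)), jj_id$month, jj_id$student,
                FUN = seq_along)

# With id in place every (month, id, student) key is unique,
# so the values can be spread without any aggregation.
wide_id <- pivot_wider(jj_id, names_from = student, values_from = c(A, B))

# Sort by month and entry number for readability.
wide_id <- wide_id[order(wide_id$month, wide_id$id), ]
print(wide_id)
```

Note that the column names come out as A_Amy, A_Bob, and so on (value first), as in the data.table output, rather than Amy_A as in the reshape2 output.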
The data.table syntax is compact because it allows for multiple value.var columns and will take care of the spread for us. We can then skip the melt -> cast process.
library(data.table)
setDT(jj)[, activityID := rowid(student)]
dcast(jj, ... ~ student, value.var=c("A", "B"))
# month activityID A_Amy A_Bob B_Amy B_Bob
# 1: 1 1 9 3 6 5
# 2: 1 4 8 5 5 3
# 3: 2 2 7 2 7 4
# 4: 2 5 6 6 6 1
# 5: 3 3 6 1 8 6
# 6: 3 6 9 5 7 5
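If you would rather have the entry number restart within each month, as in the id column of the dplyr version, rowid can take both grouping columns. This is a small sketch, not part of the original answer; the names id and wide_dt are chosen here for illustration.

```r
library(data.table)

# Rebuild the example data from the question as a data.table.
jj <- data.table(month = rep(1:3, 4),
                 student = rep(c("Amy", "Bob"), each = 6),
                 A = c(9, 7, 6, 8, 6, 9, 3, 2, 1, 5, 6, 5),
                 B = c(6, 7, 8, 5, 6, 7, 5, 4, 6, 3, 1, 5))

# Number the entries within each (month, student) pair: 1, 2.
jj[, id := rowid(month, student)]

# Spread both value columns at once, one row per (month, id).
wide_dt <- dcast(jj, month + id ~ student, value.var = c("A", "B"))
print(wide_dt)
```

The values are the same as with activityID. Only the identifier changes: it runs 1 to 2 within each month instead of 1 to 6 per student.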