Using spread with duplicate identifiers for rows
Question
I have a long-format data frame that has multiple entries for the same month and student.
jj <- data.frame(month=rep(1:3,4),
student=rep(c("Amy", "Bob"), each=6),
A=c(9, 7, 6, 8, 6, 9, 3, 2, 1, 5, 6, 5),
B=c(6, 7, 8, 5, 6, 7, 5, 4, 6, 3, 1, 5))
I want to convert it to wide form and make it like this:
month Amy.A Bob.A Amy.B Bob.B
1
2
3
1
2
3
1
2
3
1
2
3
My question is very similar to this one. I used the code given in its answer:
library(tidyr)  # gather(), unite(), spread(); also provides the %>% pipe
kk <- jj %>%
  gather(variable, value, -(month:student)) %>%
  unite(temp, student, variable) %>%
  spread(temp, value)
But it gives the following error:
Error: Duplicate identifiers for rows (1, 4), (2, 5), (3, 6), (13, 16), (14, 17), (15, 18), (7, 10), (8, 11), (9, 12), (19, 22), (20, 23), (21, 24)
Thanks in advance. Note: I don't want to delete the duplicate entries.
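The cause of the error can be checked directly. After gather() and unite(), spread() uses month as the only row identifier, and in this data every month/student pair occurs twice. A minimal base R check, assuming the jj data frame from the question:

```r
jj <- data.frame(month = rep(1:3, 4),
                 student = rep(c("Amy", "Bob"), each = 6),
                 A = c(9, 7, 6, 8, 6, 9, 3, 2, 1, 5, 6, 5),
                 B = c(6, 7, 8, 5, 6, 7, 5, 4, 6, 3, 1, 5))

# Rows whose month/student pair has already appeared earlier in the data
key_dup <- duplicated(jj[, c("month", "student")])
which(key_dup)
# [1]  4  5  6 10 11 12

# Every month/student pair occurs twice, so spread() cannot pick one value per cell
counts <- table(jj$month, jj$student)
counts
```

Rows 4, 5 and 6 repeat rows 1, 2 and 3, which is where the pairs (1, 4), (2, 5), (3, 6) at the start of the error message come from.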
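Answer
The issue is the two value columns, A and B, together with the repeated month/student pairs: after gather() and unite(), each month/temp combination occurs twice, so spread() cannot decide which value belongs in a cell. If we melt A and B into one value column, we can cast the data as you would like and summarize the duplicates with sum. Take a look at the output of jj_melt when you run the code below.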
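library(reshape2)
# Stack A and B into one "value" column, identified by month, student and variable
jj_melt <- melt(jj, id.vars = c("month", "student"))
# Cast to wide form; the two entries per month/student are summed
jj_spread <- dcast(jj_melt, month ~ student + variable, value.var = "value", fun.aggregate = sum)
jj_spread
# month Amy_A Amy_B Bob_A Bob_B
# 1 1 17 11 8 8
# 2 2 13 13 8 5
# 3 3 15 15 6 11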
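The same summation can be sketched without reshape2. This is a base R substitute, not part of the original answer: aggregate() sums the duplicate month/student rows, and stats::reshape() then widens the result. Note that reshape() names the columns A.Amy, B.Amy, A.Bob and B.Bob rather than Amy_A and so on; the data is copied into a hypothetical dat so that jj is left untouched.

```r
# Same data as the question, named dat so that jj is not modified
dat <- data.frame(month = rep(1:3, 4),
                  student = rep(c("Amy", "Bob"), each = 6),
                  A = c(9, 7, 6, 8, 6, 9, 3, 2, 1, 5, 6, 5),
                  B = c(6, 7, 8, 5, 6, 7, 5, 4, 6, 3, 1, 5))

# Collapse the duplicate month/student rows by summing A and B
summed <- aggregate(cbind(A, B) ~ month + student, data = dat, FUN = sum)

# Widen: one row per month, one A and one B column per student
wide <- reshape(summed, idvar = "month", timevar = "student",
                v.names = c("A", "B"), direction = "wide")
wide
```

The sums match the dcast() output above (for example 17 and 11 for Amy in month 1).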
I won't mark this as a duplicate, since the other question did not summarize by sum, but the data.table answer there works here with one additional argument, fun = sum:
library(data.table)
# setDT() converts jj to a data.table by reference
dcast(setDT(jj), month ~ student, value.var = c("A", "B"), fun = sum)
# month A_sum_Amy A_sum_Bob B_sum_Amy B_sum_Bob
# 1: 1 17 8 11 8
# 2: 2 13 8 13 5
# 3: 3 15 6 15 11
If you would like to use the tidyr solution, combine it with dcast to summarize by sum.
jj <- as.data.frame(jj)  # setDT() above turned jj into a data.table; convert it back
library(tidyr)
jj %>%
  gather(variable, value, -(month:student)) %>%
  unite(temp, student, variable) %>%
  reshape2::dcast(month ~ temp, value.var = "value", fun.aggregate = sum)
# month Amy_A Amy_B Bob_A Bob_B
# 1 1 17 11 8 8
# 2 2 13 13 8 5
# 3 3 15 15 6 11
Edit
Based on your new requirements, I have added an activity column, id, which numbers the repeated entries within each month and student so that every row has a unique identifier.
library(dplyr)
library(reshape2)
jj %>%
  group_by(month, student) %>%
  mutate(id = row_number()) %>%  # 1, 2 for the repeated entries of each month/student
  ungroup() %>%
  as.data.frame() %>%
  reshape2::melt(id.vars = c("month", "id", "student")) %>%
  reshape2::dcast(... ~ student + variable, value.var = "value")
# month id Amy_A Amy_B Bob_A Bob_B
# 1 1 1 9 6 3 5
# 2 1 2 8 5 5 3
# 3 2 1 7 7 2 4
# 4 2 2 6 6 6 1
# 5 3 1 6 8 1 6
# 6 3 2 9 7 5 5
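The same number-then-widen idea can be sketched in base R as well. This is a substitute technique, not the answer's code: ave() with seq_along() numbers the entries within each month/student pair, and reshape() uses month and id together as the row identifier. The data is again copied into a hypothetical dat so that adding the id column does not change jj for the code that follows.

```r
dat <- data.frame(month = rep(1:3, 4),
                  student = rep(c("Amy", "Bob"), each = 6),
                  A = c(9, 7, 6, 8, 6, 9, 3, 2, 1, 5, 6, 5),
                  B = c(6, 7, 8, 5, 6, 7, 5, 4, 6, 3, 1, 5))

# Number the repeated entries within each month/student pair: 1, 2, ...
dat$id <- ave(seq_len(nrow(dat)), dat$month, dat$student, FUN = seq_along)

# month + id now identifies each row uniquely, so nothing has to be summed
wide_id <- reshape(dat, idvar = c("month", "id"), timevar = "student",
                   v.names = c("A", "B"), direction = "wide")

# Order the rows like the dcast() output: by month, then id
wide_id <- wide_id[order(wide_id$month, wide_id$id), ]
wide_id
```

The values match the dcast() table above, with columns named A.Amy, B.Amy, A.Bob and B.Bob.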
The other solutions can be adapted in the same way. In this tidyr version, id numbers the entries within each student_variable column, and an optional arrange() call orders the final output by activity number:
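library(dplyr)
library(tidyr)
jj %>%
  gather(variable, value, -(month:student)) %>%
  unite(temp, student, variable) %>%
  group_by(temp) %>%
  mutate(id = row_number()) %>%  # running number within each student_variable column
  ungroup() %>%
  as.data.frame() %>%
  reshape2::dcast(... ~ temp, value.var = "value") %>%
  arrange(id)  # optional: order the rows by activity number
# month id Amy_A Amy_B Bob_A Bob_B
# 1 1 1 9 6 3 5
# 2 2 2 7 7 2 4
# 3 3 3 6 8 1 6
# 4 1 4 8 5 5 3
# 5 2 5 6 6 6 1
# 6 3 6 9 7 5 5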
The data.table syntax is compact because dcast() accepts multiple value.var columns and takes care of the spread for us, so we can skip the separate melt -> cast step.
library(data.table)
# rowid(student) numbers each student's rows 1, 2, 3, ...;
# setDT() converts jj to a data.table by reference
setDT(jj)[, activityID := rowid(student)]
dcast(jj, ... ~ student, value.var = c("A", "B"))
# month activityID A_Amy A_Bob B_Amy B_Bob
# 1: 1 1 9 3 6 5
# 2: 1 4 8 5 5 3
# 3: 2 2 7 2 7 4
# 4: 2 5 6 6 6 1
# 5: 3 3 6 1 8 6
# 6: 3 6 9 5 7 5
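rowid(student) is a running count of each student's rows in their current order. The same numbering can be reproduced in base R with ave(), which may help when checking what the activityID column contains. This sketch uses stand-alone vectors (hypothetical names student, month and activityID) so that the data.table jj is not modified.

```r
student <- rep(c("Amy", "Bob"), each = 6)
month <- rep(1:3, 4)

# Running count within each student, the same numbering rowid(student) gives
activityID <- ave(seq_along(student), student, FUN = seq_along)
activityID
# [1] 1 2 3 4 5 6 1 2 3 4 5 6

# month + activityID + student is unique, so dcast() has one value per cell
anyDuplicated(data.frame(month, activityID, student))
# [1] 0
```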