Using spread with duplicate identifiers for rows

Problem description

I have a long-form data frame that has multiple entries for the same month and student.

jj <- data.frame(month=rep(1:3,4),
             student=rep(c("Amy", "Bob"), each=6),
             A=c(9, 7, 6, 8, 6, 9, 3, 2, 1, 5, 6, 5),
             B=c(6, 7, 8, 5, 6, 7, 5, 4, 6, 3, 1, 5))

I want to convert it to wide form and make it like this:

month Amy.A Bob.A Amy.B Bob.B
1     
2     
3
1
2
3
1
2
3
1
2
3

My question is very similar to this one. I used the code given in the answer:

library(tidyr)  # provides gather(), unite(), spread() and re-exports %>%
kk <- jj %>% 
  gather(variable, value, -(month:student)) %>% 
  unite(temp, student, variable) %>% 
  spread(temp, value)

but it gives the following error:

Error: Duplicate identifiers for rows (1, 4), (2, 5), (3, 6), (13, 16), (14, 17), (15, 18), (7, 10), (8, 11), (9, 12), (19, 22), (20, 23), (21, 24)

Thanks in advance. Note: I don't want to delete the duplicate entries.

Solution

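The issue is that the values sit in two columns, A and B. If we can make that one value column, we can spread the data as you would like. Because each month/student pair occurs twice, the cast also needs an aggregation function, and here the duplicates are summed. Take a look at the output for jj_melt when you use the code below.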

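library(reshape2)
# Stack A and B into a single "value" column, keyed by month and student
jj_melt <- melt(jj, id.vars=c("month", "student"))
# Cast back to wide form, summing the duplicate month/student entries
jj_spread <- dcast(jj_melt, month ~ student + variable, value.var="value", fun.aggregate=sum)
jj_spread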
#   month Amy_A Amy_B Bob_A Bob_B
# 1     1    17    11     8     8
# 2     2    13    13     8     5
# 3     3    15    15     6    11
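
To see why spread() complains in the first place, it can help to count how often each month/student pair occurs. The following is a minimal sketch that assumes only the jj data frame from the question; key_counts is an illustrative name, not something from the original answer.

```r
library(reshape2)

# Example data from the question
jj <- data.frame(month = rep(1:3, 4),
                 student = rep(c("Amy", "Bob"), each = 6),
                 A = c(9, 7, 6, 8, 6, 9, 3, 2, 1, 5, 6, 5),
                 B = c(6, 7, 8, 5, 6, 7, 5, 4, 6, 3, 1, 5))

# Count rows per month/student pair. A count above 1 means the pair
# does not identify a single row, which is what spread() objects to.
key_counts <- aggregate(A ~ month + student, data = jj, FUN = length)
names(key_counts)[3] <- "n"
print(key_counts)

# Long form: one row per month/student/variable, all values in one column
jj_melt <- melt(jj, id.vars = c("month", "student"))
print(head(jj_melt))
```

Every pair shows up more than once, so a plain spread has no single value to put in each cell. Either the duplicates are aggregated, or an extra identifier column is needed to tell them apart.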

I won't mark this as a duplicate since the other question did not summarize by sum, but the data.table answer there could help with one additional argument, fun.aggregate=sum (fun=sum also works through partial argument matching):

library(data.table)
# setDT() converts jj to a data.table by reference; both value columns are cast at once
dcast(setDT(jj), month ~ student, value.var=c("A", "B"), fun.aggregate=sum)
#    month A_sum_Amy A_sum_Bob B_sum_Amy B_sum_Bob
# 1:     1        17         8        11         8
# 2:     2        13         8        13         5
# 3:     3        15         6        15        11

If you would like to use the tidyr solution, combine it with dcast to summarize by sum.

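jj <- as.data.frame(jj)  # undo the earlier setDT() so jj is a plain data frame again
library(tidyr)
jj %>% 
  gather(variable, value, -(month:student)) %>%
  unite(temp, student, variable) %>%
  # call reshape2's dcast explicitly, since data.table also exports a dcast
  reshape2::dcast(month ~ temp, value.var="value", fun.aggregate=sum)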
#   month Amy_A Amy_B Bob_A Bob_B
# 1     1    17    11     8     8
# 2     2    13    13     8     5
# 3     3    15    15     6    11
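
For readers on a newer tidyr, the same summary can be sketched without reshape2. This is not part of the original answer. It assumes tidyr 1.1.0 or later, where pivot_wider() accepts a function for values_fn, and jj_wide_sum is an illustrative name. Note that the resulting columns are named value-first (A_Amy rather than Amy_A).

```r
library(tidyr)

# Example data from the question
jj <- data.frame(month = rep(1:3, 4),
                 student = rep(c("Amy", "Bob"), each = 6),
                 A = c(9, 7, 6, 8, 6, 9, 3, 2, 1, 5, 6, 5),
                 B = c(6, 7, 8, 5, 6, 7, 5, 4, 6, 3, 1, 5))

# Spread both value columns across students, summing the duplicate
# month/student entries instead of raising a duplicate warning
jj_wide_sum <- pivot_wider(jj,
                           names_from = student,
                           values_from = c(A, B),
                           values_fn = sum)
print(jj_wide_sum)
```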

Edit

Based on your new requirements, I have added an activity column.

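library(dplyr)
jj %>% group_by(month, student) %>% 
  mutate(id=row_number()) %>%   # running index within each month/student pair
  as.data.frame() %>%           # drop the grouping before reshaping
  reshape2::melt(id.vars=c("month", "id", "student")) %>%
  reshape2::dcast(... ~ student + variable, value.var="value")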
#   month id Amy_A Amy_B Bob_A Bob_B
# 1     1  1     9     6     3     5
# 2     1  2     8     5     5     3
# 3     2  1     7     7     2     4
# 4     2  2     6     6     6     1
# 5     3  1     6     8     1     6
# 6     3  2     9     7     5     5

The other solutions can also be used. Here I added an optional expression to arrange the final output by activity number:

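library(tidyr)
jj %>% 
  gather(variable, value, -(month:student)) %>%
  unite(temp, student, variable) %>%
  group_by(temp) %>%
  mutate(id=row_number()) %>%   # activity number within each student/variable group
  as.data.frame() %>%
  reshape2::dcast(... ~ temp, value.var="value") %>%
  arrange(id)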
#   month id Amy_A Amy_B Bob_A Bob_B
# 1     1  1     9     6     3     5
# 2     2  2     7     7     2     4
# 3     3  3     6     8     1     6
# 4     1  4     8     5     5     3
# 5     2  5     6     6     6     1
# 6     3  6     9     7     5     5
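
The idea of adding an id column so that no entries are dropped can also be sketched with pivot_wider(). This is an alternative and not the original answer's code. It assumes tidyr 1.0.0 or later, and jj_wide is an illustrative name.

```r
library(dplyr)
library(tidyr)

# Example data from the question
jj <- data.frame(month = rep(1:3, 4),
                 student = rep(c("Amy", "Bob"), each = 6),
                 A = c(9, 7, 6, 8, 6, 9, 3, 2, 1, 5, 6, 5),
                 B = c(6, 7, 8, 5, 6, 7, 5, 4, 6, 3, 1, 5))

jj_wide <- jj %>%
  group_by(month, student) %>%
  mutate(id = row_number()) %>%   # 1, 2 within each month/student pair
  ungroup() %>%
  # month and id together now identify each row uniquely
  pivot_wider(names_from = student, values_from = c(A, B)) %>%
  arrange(month, id)
print(jj_wide)
```

Because month and id together are unique, nothing needs to be aggregated and all twelve original entries survive in the six wide rows.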

The data.table syntax is compact because it allows for multiple value.var columns and will take care of the spread for us. We can then skip the melt -> cast process.

library(data.table)
setDT(jj)[, activityID := rowid(student)]
dcast(jj, ... ~ student, value.var=c("A", "B"))
#    month activityID A_Amy A_Bob B_Amy B_Bob
# 1:     1          1     9     3     6     5
# 2:     1          4     8     5     5     3
# 3:     2          2     7     2     7     4
# 4:     2          5     6     6     6     1
# 5:     3          3     6     1     8     6
# 6:     3          6     9     5     7     5
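
If a counter that restarts within each month is preferred over one that runs across the whole student, rowid() can take both grouping columns. This is a small variation on the code above, offered as a sketch. dt, id and wide are illustrative names, and the data is copied with as.data.table() so the original jj is left untouched.

```r
library(data.table)

# Example data from the question
jj <- data.frame(month = rep(1:3, 4),
                 student = rep(c("Amy", "Bob"), each = 6),
                 A = c(9, 7, 6, 8, 6, 9, 3, 2, 1, 5, 6, 5),
                 B = c(6, 7, 8, 5, 6, 7, 5, 4, 6, 3, 1, 5))

dt <- as.data.table(jj)              # copy, rather than converting jj in place
dt[, id := rowid(month, student)]    # 1, 2 within each month/student pair
# Cast both value columns at once; month + id identifies each output row
wide <- dcast(dt, month + id ~ student, value.var = c("A", "B"))
print(wide)
```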
