Expand data.frame by creating duplicates based on group condition
Problem description
Here is an example of my data.frame:
df = read.table(text = 'ID Day Count Count_group
1001 1933 6 11
1002 1933 6 11
1003 1933 6 11
1004 1933 6 11
1005 1933 6 11
1006 1933 6 11
1007 1932 5 8
1008 1932 5 8
1009 1932 5 8
1010 1932 5 8
1011 1932 5 8
1012 1931 3 4
1013 1931 3 4
1014 1931 3 4
1015 1930 1 1
1016 1800 6 10
1017 1800 6 10
1018 1800 6 10
1019 1800 6 10
1020 1800 6 10
1021 1800 6 10
1022 1799 4 6
1023 1799 4 6
1024 1799 4 6
1025 1799 4 6
1026 1798 2 2
1027 1798 2 2
1028 888 4 6
1029 888 4 6
1030 888 4 6
1031 888 4 6
1032 887 2 3
1033 887 2 3
1034 886 1 2
1035 885 1 1', header = TRUE)
The Count column shows the number of ID values for each Day, and the Count_group column shows the number of ID values for each Day and Day - 1 combined.
For example, Day 1933 has Count_group = 11 because Count = 6 (1933) + Count = 5 (1932), and so on.
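The relationship between the two columns can be checked with a few lines of base R. This is only a sketch: it rebuilds the example from its ID and Day columns, and the names example_days, rows_per_day, uniq_days, n_per_day and count_of are made up for illustration and are not part of the original question.

```r
# Rebuild the ID and Day columns of the example data
example_days <- c(1933, 1932, 1931, 1930, 1800, 1799, 1798, 888, 887, 886, 885)
rows_per_day <- c(6, 5, 3, 1, 6, 4, 2, 4, 2, 1, 1)
df <- data.frame(ID = 1001:1035, Day = rep(example_days, rows_per_day))

# Number of IDs observed on each distinct Day
uniq_days <- unique(df$Day)
n_per_day <- tabulate(match(df$Day, uniq_days), nbins = length(uniq_days))

# Look up the count for a vector of days; days that never occur count as 0
count_of <- function(d) {
  n <- n_per_day[match(d, uniq_days)]
  n[is.na(n)] <- 0L
  n
}

df$Count <- count_of(df$Day)                               # IDs on Day
df$Count_group <- count_of(df$Day) + count_of(df$Day - 1)  # IDs on Day and Day - 1
```

With this data, the rows for Day 1933 get Count = 6 and Count_group = 11, matching the table above.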
What I need to do is duplicate observations so that each Count_group contains the rows for both its Day and Day - 1.
For example, Count_group = 11 is made up of the Count values of Day 1933 and Day 1932, so both days need to be included in Count_group = 11.
The next one is Count_group = 8, made up of Days 1932 and 1931, and so on.
Expected output:
ID Day Count Count_group
1001 1933 6 11
1002 1933 6 11
1003 1933 6 11
1004 1933 6 11
1005 1933 6 11
1006 1933 6 11
1007 1932 5 11
1008 1932 5 11
1009 1932 5 11
1010 1932 5 11
1011 1932 5 11
1007 1932 5 8
1008 1932 5 8
1009 1932 5 8
1010 1932 5 8
1011 1932 5 8
1012 1931 3 8
1013 1931 3 8
1014 1931 3 8
1012 1931 3 4
1013 1931 3 4
1014 1931 3 4
1015 1930 1 4
1015 1930 1 1
1016 1800 6 10
1017 1800 6 10
1018 1800 6 10
1019 1800 6 10
1020 1800 6 10
1021 1800 6 10
1022 1799 4 10
1023 1799 4 10
1024 1799 4 10
1025 1799 4 10
1022 1799 4 6
1023 1799 4 6
1024 1799 4 6
1025 1799 4 6
1026 1798 2 6
1027 1798 2 6
1026 1798 2 2
1027 1798 2 2
1028 888 4 6
1029 888 4 6
1030 888 4 6
1031 888 4 6
1032 887 2 6
1033 887 2 6
1032 887 2 3
1033 887 2 3
1034 886 1 3
1034 886 1 2
1035 885 1 2
1035 885 1 1
Do you have any suggestions?
Solution
I think this does what you need...
# first add a grouping variable: a new group starts whenever Day changes
df$daygroup <- c(0, cumsum(sapply(2:nrow(df), function(i) df$Day[i] != df$Day[i - 1])))
# split df into a list of data frames, loop through them to add extra rows,
# and bind them back together
df2 <- do.call(rbind, lapply(split(df, df$daygroup), function(x) {
  n <- nrow(x)
  m <- x$Count_group[1]  # number of rows needed for Day
  if (m > n) {
    # note: this relies on the IDs being consecutive from one Day to the next
    y <- rbind(x, data.frame(ID = (x$ID[n] + 1):(x$ID[n] + m - n),  # continue numbering
                             Day = x$Day[1] - 1,                    # previous day
                             Count = m - x$Count[1],                # difference in count
                             Count_group = m,
                             daygroup = x$daygroup[1]))
  } else {
    y <- x  # no extra rows needed
  }
  return(y)
}))
df2$daygroup <- NULL  # remove grouping variable
head(df2, 20)  # ignore the rownames!
ID Day Count Count_group
0.1 1001 1933 6 11
0.2 1002 1933 6 11
0.3 1003 1933 6 11
0.4 1004 1933 6 11
0.5 1005 1933 6 11
0.6 1006 1933 6 11
0.7 1007 1932 5 11
0.8 1008 1932 5 11
0.9 1009 1932 5 11
0.10 1010 1932 5 11
0.11 1011 1932 5 11
1.7 1007 1932 5 8
1.8 1008 1932 5 8
1.9 1009 1932 5 8
1.10 1010 1932 5 8
1.11 1011 1932 5 8
1.1 1012 1931 3 8
1.2 1013 1931 3 8
1.3 1014 1931 3 8
2.12 1012 1931 3 4
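The answer above generates the extra rows by continuing the ID numbering, so it depends on the IDs being consecutive. As an alternative sketch (not the answerer's method), the same expansion can be written as a self-join with base R's merge, which copies the actual rows of Day - 1 instead of reconstructing them. The names day_groups, shifted, prev_rows, group_day and expanded are hypothetical, and the example data is rebuilt compactly so the block runs on its own.

```r
# Rebuild the example data from the question
example_days <- c(1933, 1932, 1931, 1930, 1800, 1799, 1798, 888, 887, 886, 885)
rows_per_day <- c(6, 5, 3, 1, 6, 4, 2, 4, 2, 1, 1)
group_totals <- c(11, 8, 4, 1, 10, 6, 2, 6, 3, 2, 1)
df <- data.frame(ID = 1001:1035,
                 Day = rep(example_days, rows_per_day),
                 Count = rep(rows_per_day, rows_per_day),
                 Count_group = rep(group_totals, rows_per_day))

# One row per Day with the Count_group that belongs to that Day
day_groups <- unique(df[, c("Day", "Count_group")])

# Shift the key by one day so that rows of Day d - 1 pick up the
# Count_group of Day d when joined on Day
shifted <- data.frame(Day = day_groups$Day - 1,
                      Count_group = day_groups$Count_group)
prev_rows <- merge(df[, c("ID", "Day", "Count")], shifted, by = "Day")

# Helper column used only for ordering: the Day whose group a row sits in
df$group_day <- df$Day
prev_rows$group_day <- prev_rows$Day + 1

# Stack original and duplicated rows, then order as in the expected output
expanded <- rbind(df, prev_rows[, names(df)])
expanded <- expanded[order(-expanded$group_day, -expanded$Day, expanded$ID), ]
expanded$group_day <- NULL
rownames(expanded) <- NULL
```

On the example data this yields 54 rows: the 35 original rows plus 19 duplicated rows for days that have a following day in the data. Days with no following day, such as 1933, are not duplicated.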