Expand data.frame by creating duplicates based on group condition (2)
Problem description
Starting from @AndrewGustar's answer/code in "Expand data.frame by creating duplicates based on group condition":
1) What if the input data.frame has ID values that are not in sequence and that can also repeat?
Example data.frame:
df = read.table(text = 'ID Day Count Count_group
18 1933 6 11
33 1933 6 11
37 1933 6 11
18 1933 6 11
16 1933 6 11
11 1933 6 11
111 1932 5 8
34 1932 5 8
60 1932 5 8
88 1932 5 8
18 1932 5 8
33 1931 3 4
13 1931 3 4
56 1931 3 4
23 1930 1 1
6 1800 6 10
37 1800 6 10
98 1800 6 10
52 1800 6 10
18 1800 6 10
76 1800 6 10
55 1799 4 6
6 1799 4 6
52 1799 4 6
133 1799 4 6
112 1798 2 2
677 1798 2 2
778 888 4 6
111 888 4 6
88 888 4 6
10 888 4 6
37 887 2 3
26 887 2 3
8 886 1 2
56 885 1 1', header = TRUE)
The Count col shows the total number of ID values per each Day, and the Count_group col shows the total number of ID values for each Day and Day - 1 combined.
e.g. 1933 = Count_group 11 because Count 6 (1933) + Count 5 (1932), and so on.
What I need to do is create duplicated observations for each Count_group and add them to it, so that each Count_group shows both its Day AND Day - 1.
e.g. Count_group = 11 is composed of the Count values of Day 1933 and 1932, so both days need to be included in Count_group = 11. The next one will be Count_group = 8, composed of 1932 and 1931, etc.
Desired output:
ID Day Count Count_group
18 1933 6 11
33 1933 6 11
37 1933 6 11
18 1933 6 11
16 1933 6 11
11 1933 6 11
111 1932 5 11
34 1932 5 11
60 1932 5 11
88 1932 5 11
18 1932 5 11
111 1932 5 8
34 1932 5 8
60 1932 5 8
88 1932 5 8
18 1932 5 8
33 1931 3 8
13 1931 3 8
56 1931 3 8
33 1931 3 4
13 1931 3 4
56 1931 3 4
23 1930 1 4
23 1930 1 1
6 1800 6 10
37 1800 6 10
98 1800 6 10
52 1800 6 10
18 1800 6 10
76 1800 6 10
55 1799 4 10
6 1799 4 10
52 1799 4 10
133 1799 4 10
55 1799 4 6
6 1799 4 6
52 1799 4 6
133 1799 4 6
112 1798 2 6
677 1798 2 6
112 1798 2 2
677 1798 2 2
778 888 4 6
111 888 4 6
88 888 4 6
10 888 4 6
37 887 2 6
26 887 2 6
37 887 2 3
26 887 2 3
8 886 1 3
8 886 1 2
56 885 1 2
56 885 1 1
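As a quick check of how Count_group relates to Count in the example, here is a minimal sketch. It assumes one summary row per Day; the names `days` and `prev_count` are made up for illustration and do not appear in the question.

```r
# One row per Day with its Count (values copied from the example data)
days <- data.frame(Day   = c(1933, 1932, 1931, 1930, 1800, 1799, 1798),
                   Count = c(6, 5, 3, 1, 6, 4, 2))

# Count of the previous day (Day - 1); match() returns NA when that day is absent
prev_count <- days$Count[match(days$Day - 1, days$Day)]
prev_count[is.na(prev_count)] <- 0

# Count_group is the Count of the day plus the Count of Day - 1
days$Count_group <- days$Count + prev_count
print(days)
```

Under these assumptions the computed Count_group column should agree with the one in the example data, including the days (1930, 1798) whose previous day is missing.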
Solution
Here is a solution that keeps the ID values as above.
#first add grouping variables
df$smalldaygroup <- c(0,cumsum(sapply(2:nrow(df),function(i) df$Day[i]!=df$Day[i-1]))) #individual days
df$bigdaygroup <- c(0,cumsum(sapply(2:nrow(df),function(i) df$Day[i]<df$Day[i-1]-1))) #blocks of consecutive days

#duplicate individual days except the first in each big group
df2 <- lapply(split(df,df$bigdaygroup),function(x)
  split(x,x$smalldaygroup)[c(1,rep(2:length(split(x,x$smalldaygroup)),each=2))])

#change the Count_group to previous value in alternate entries
df2 <- lapply(df2,function(L) lapply(1:length(L),function(i) {
  x <- L[[i]]
  if(!(i%%2)) x$Count_group <- L[[i-1]]$Count_group[1]
  return(x)
}))

df2 <- do.call(rbind,unlist(df2,recursive=FALSE)) #bind back together

head(df2,20) #ignore rownames!
       ID  Day Count Count_group
01.1   18 1933     6          11
01.2   33 1933     6          11
01.3   37 1933     6          11
01.4   18 1933     6          11
01.5   16 1933     6          11
01.6   11 1933     6          11
02.7  111 1932     5          11
02.8   34 1932     5          11
02.9   60 1932     5          11
02.10  88 1932     5          11
02.11  18 1932     5          11
03.7  111 1932     5           8
03.8   34 1932     5           8
03.9   60 1932     5           8
03.10  88 1932     5           8
03.11  18 1932     5           8
04.12  33 1931     3           8
04.13  13 1931     3           8
04.14  56 1931     3           8
05.12  33 1931     3           4
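For comparison, the same expansion can be sketched without nested lists by looking up, for every row, the Count_group of the following day (Day + 1) and stacking relabelled copies on top of the originals. This is not the answer's method, only an alternative sketch. It assumes each Day occupies one contiguous run of rows, and it uses a small subset of the example data; the helper names (`day_groups`, `next_group`, `copies`, `out`) are hypothetical.

```r
# Subset of the example data (days 1931-1930 and 887-885)
df <- read.table(text = "ID Day Count Count_group
33 1931 3 4
13 1931 3 4
56 1931 3 4
23 1930 1 1
37 887 2 3
26 887 2 3
8 886 1 2
56 885 1 1", header = TRUE)

# One Count_group per Day, taken from the first row of each day
day_groups <- df[!duplicated(df$Day), c("Day", "Count_group")]

# Count_group of the following day (Day + 1) for each row; NA if that day is absent
next_group <- day_groups$Count_group[match(df$Day + 1, day_groups$Day)]

# Copies of the rows whose following day exists, relabelled with that day's group
has_next <- !is.na(next_group)
copies <- df[has_next, ]
copies$Count_group <- next_group[has_next]

# Stack copies and originals, then order by day of appearance,
# copies (0) before originals (1), and original row position
out <- rbind(copies, df)
is_original <- rep(c(0, 1), c(nrow(copies), nrow(df)))
row_pos <- c(which(has_next), seq_len(nrow(df)))
day_pos <- match(out$Day, unique(df$Day))
out <- out[order(day_pos, is_original, row_pos), ]
rownames(out) <- NULL
print(out)
```

On this subset, days 1930, 886 and 885 each appear twice in `out`: once under the Count_group of the day above them and once under their own. That is the pattern shown in the desired output.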