Expand data.frame by creating duplicates based on group condition

Views: 120 | Published: 2017/7/21 19:06:13 | Tags: r dataframe duplicates grouping add


Problem description

Here is an example of my data.frame:
df = read.table(text = 'ID  Day Count Count_group
1001    1933    6   11
1002    1933    6   11
1003    1933    6   11
1004    1933    6   11
1005    1933    6   11
1006    1933    6   11
1007    1932    5   8
1008    1932    5   8
1009    1932    5   8
1010    1932    5   8
1011    1932    5   8
1012    1931    3   4
1013    1931    3   4
1014    1931    3   4
1015    1930    1   1
1016    1800    6   10
1017    1800    6   10
1018    1800    6   10
1019    1800    6   10
1020    1800    6   10
1021    1800    6   10
1022    1799    4   6
1023    1799    4   6
1024    1799    4   6
1025    1799    4   6
1026    1798    2   2
1027    1798    2   2
1028    888     4   6
1029    888     4   6
1030    888     4   6
1031    888     4   6
1032    887     2   3
1033    887     2   3
1034    886     1   2
1035    885     1   1', header = TRUE)
The Count column shows the number of ID values for each Day, and the Count_group column shows the sum of the Count values for each Day and the day before it (Day - 1).

e.g. Day 1933 has Count_group 11 because Count 6 (1933) + Count 5 (1932) = 11, and so on.
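
To make the relationship between the two columns concrete, here is a minimal sketch that recomputes Count and Count_group from the ID and Day columns alone. The names `per_day` and `prev_count` are illustrative and not part of the original post, and the sketch assumes that a missing Day - 1 contributes 0, which is what the example data implies (e.g. Day 1930 has Count_group 1).

```r
# Rebuild only the ID and Day columns of the example (same values as the table above).
days   <- c(1933, 1932, 1931, 1930, 1800, 1799, 1798, 888, 887, 886, 885)
counts <- c(6, 5, 3, 1, 6, 4, 2, 4, 2, 1, 1)
df <- data.frame(ID = 1001:1035, Day = rep(days, times = counts))

# Count: number of IDs observed on each Day.
per_day <- aggregate(ID ~ Day, data = df, FUN = length)
names(per_day)[names(per_day) == "ID"] <- "Count"

# Count of the preceding day (Day - 1); assumed to be 0 when that day is absent.
prev_count <- per_day$Count[match(per_day$Day - 1, per_day$Day)]
prev_count[is.na(prev_count)] <- 0L

# Count_group: Count of the day itself plus Count of the day before it.
per_day$Count_group <- per_day$Count + prev_count
per_day[order(-per_day$Day), ]
```

The resulting per-day Count_group values can be compared against the Count_group column in the table above.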

What I need to do is duplicate observations so that each Count_group contains the rows of its own Day AND the rows of Day - 1.

e.g. Count_group = 11 is composed of the Count values of Day 1933 and Day 1932, so both days need to be included in Count_group = 11.
The next one is Count_group = 8, composed of Day 1932 and Day 1931, and so on.

Expected output:
ID      Day  Count  Count_group
1001    1933    6   11
1002    1933    6   11
1003    1933    6   11
1004    1933    6   11
1005    1933    6   11
1006    1933    6   11
1007    1932    5   11
1008    1932    5   11
1009    1932    5   11
1010    1932    5   11
1011    1932    5   11
1007    1932    5   8
1008    1932    5   8
1009    1932    5   8
1010    1932    5   8
1011    1932    5   8
1012    1931    3   8
1013    1931    3   8
1014    1931    3   8
1012    1931    3   4
1013    1931    3   4
1014    1931    3   4
1015    1930    1   4
1015    1930    1   1
1016    1800    6   10
1017    1800    6   10
1018    1800    6   10
1019    1800    6   10
1020    1800    6   10
1021    1800    6   10
1022    1799    4   10
1023    1799    4   10
1024    1799    4   10
1025    1799    4   10
1022    1799    4   6
1023    1799    4   6
1024    1799    4   6
1025    1799    4   6
1026    1798    2   6
1027    1798    2   6
1026    1798    2   2
1027    1798    2   2
1028    888    4    6
1029    888    4    6
1030    888    4    6
1031    888    4    6
1032    887    2    6
1033    887    2    6
1032    887    2    3
1033    887    2    3
1034    886    1    3
1034    886    1    2
1035    885    1    2
1035    885    1    1
Do you have any suggestions?
Solution
I think this does what you need...
#first add a grouping variable
df$daygroup <- c(0,cumsum(sapply(2:nrow(df),function(i) df$Day[i]!=df$Day[i-1])))

#split df into a list of data frames, loop through them to add extra rows, 
#and bind them back together
df2 <- do.call(rbind,lapply(split(df,df$daygroup),function(x){ 
  n <- nrow(x)
  m <- x$Count_group[1] #number of rows needed for Day
  if(m>n){
    y <- rbind(x,data.frame(ID=(x$ID[n]+1):(x$ID[n]+m-n), #continue numbering
                            Day=x$Day[1]-1, #previous day
                            Count=m-x$Count[1], #difference in count
                            Count_group=m,
                            daygroup=x$daygroup[1]))
  } else {
    y <- x #no extra rows needed
  }
  return(y)
}
))
df2$daygroup <- NULL #remove grouping variable


head(df2,20) #ignore the rownames!
       ID  Day Count Count_group
0.1  1001 1933     6          11
0.2  1002 1933     6          11
0.3  1003 1933     6          11
0.4  1004 1933     6          11
0.5  1005 1933     6          11
0.6  1006 1933     6          11
0.7  1007 1932     5          11
0.8  1008 1932     5          11
0.9  1009 1932     5          11
0.10 1010 1932     5          11
0.11 1011 1932     5          11
1.7  1007 1932     5           8
1.8  1008 1932     5           8
1.9  1009 1932     5           8
1.10 1010 1932     5           8
1.11 1011 1932     5           8
1.1  1012 1931     3           8
1.2  1013 1931     3           8
1.3  1014 1931     3           8
2.12 1012 1931     3           4
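
The code above builds the extra rows by continuing the ID numbering, so it relies on the IDs being consecutive and the rows being sorted by descending Day. As a hedged alternative, the sketch below copies the actual rows of Day - 1 with `merge()` instead. It is not the original answerer's method; the names `shifted`, `extra`, `own`, `df_alt` and the helper column `group_day` are assumptions introduced only for illustration.

```r
# Rebuild the example data compactly (same values as the question's table).
days   <- c(1933, 1932, 1931, 1930, 1800, 1799, 1798, 888, 887, 886, 885)
counts <- c(6, 5, 3, 1, 6, 4, 2, 4, 2, 1, 1)
groups <- c(11, 8, 4, 1, 10, 6, 2, 6, 3, 2, 1)
df <- data.frame(ID          = 1001:1035,
                 Day         = rep(days, times = counts),
                 Count       = rep(counts, times = counts),
                 Count_group = rep(groups, times = counts))

# One row per Day with its Count_group, re-keyed on the previous day.
shifted <- unique(df[, c("Day", "Count_group")])
shifted$Day <- shifted$Day - 1

# Rows of Day - 1, relabelled with the Count_group of the following Day.
# Days whose following day is not in the data find no match and are dropped.
extra <- merge(df[, c("ID", "Day", "Count")], shifted, by = "Day")
extra$group_day <- extra$Day + 1   # hypothetical helper column, used for sorting only

own <- df
own$group_day <- own$Day

# Stack original and duplicated rows, then sort by group, Day and ID.
cols <- c("ID", "Day", "Count", "Count_group", "group_day")
df_alt <- rbind(own[, cols], extra[, cols])
df_alt <- df_alt[order(-df_alt$group_day, -df_alt$Day, df_alt$ID), ]
df_alt$group_day <- NULL
rownames(df_alt) <- NULL

head(df_alt, 20)
```

Because the duplicated rows are taken from the data itself, this variant keeps the real ID and Count values even when IDs are not consecutive. It still assumes that "previous day" means exactly Day - 1.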


                        