Expand data.frame by creating duplicates based on group condition (2)

Views: 92 · Published: 2017/7/21 0:14:35 · r dataframe duplicates grouping add


Problem description

Starting from @AndrewGustar's answer/code in "Expand data.frame by creating duplicates based on group condition":

1)
What if the input data.frame has ID values that are not in sequence and that can also repeat themselves?

Example data.frame:
df = read.table(text = 'ID  Day Count   Count_group
18  1933    6   11
33  1933    6   11
37  1933    6   11
18  1933    6   11
16  1933    6   11
11  1933    6   11
111 1932    5   8
34  1932    5   8
60  1932    5   8
88  1932    5   8
18  1932    5   8
33  1931    3   4
13  1931    3   4
56  1931    3   4
23  1930    1   1
6   1800    6   10
37  1800    6   10
98  1800    6   10
52  1800    6   10
18  1800    6   10
76  1800    6   10
55  1799    4   6
6   1799    4   6
52  1799    4   6
133 1799    4   6
112 1798    2   2
677 1798    2   2
778 888     4   6
111 888     4   6
88  888     4   6
10  888     4   6
37  887     2   3
26  887     2   3
8   886     1   2
56  885     1   1', header = TRUE)
The Count column shows the total number of ID values for each Day, and the Count_group column shows the total number of ID values for each Day and Day - 1 combined.

e.g. Day 1933 has Count_group = 11 because Count 6 (1933) + Count 5 (1932) = 11, and so on.
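
The relationship between the two columns can be checked with a few lines of base R. This is a minimal sketch that rebuilds only the Day column from the example (IDs are left out); `day_col`, `count_of`, and `summary_df` are names made up for illustration and are not part of the original code.

```r
# Distinct days from the example and how many rows each one has
days   <- c(1933, 1932, 1931, 1930, 1800, 1799, 1798, 888, 887, 886, 885)
counts <- c(6, 5, 3, 1, 6, 4, 2, 4, 2, 1, 1)

# One entry per row of the example data.frame (IDs omitted)
day_col <- rep(days, times = counts)

# Number of rows belonging to a given day (0 if the day is absent)
count_of <- function(d) sum(day_col == d)

summary_df <- data.frame(Day = days)
summary_df$Count <- sapply(days, count_of)
# Count_group = Count of the Day + Count of Day - 1
summary_df$Count_group <- summary_df$Count + sapply(days - 1, count_of)

print(summary_df)
```

A day whose predecessor is missing (1930, 1798, 885) simply gets its own Count as Count_group, which matches the example data.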

What I need to do is create duplicated observations for each Count_group, so that each Count_group contains the rows of both its Day and Day - 1.

e.g. Count_group = 11 is composed of the Count values of Days 1933 and 1932, so both days need to be included in Count_group = 11. The next one is Count_group = 8, composed of 1932 and 1931, etc.

Desired output:  
    ID  Day   Count Count_group
    18  1933    6   11
    33  1933    6   11
    37  1933    6   11
    18  1933    6   11
    16  1933    6   11
    11  1933    6   11
    111 1932    5   11
    34  1932    5   11
    60  1932    5   11
    88  1932    5   11
    18  1932    5   11
    111 1932    5   8
    34  1932    5   8
    60  1932    5   8
    88  1932    5   8
    18  1932    5   8
    33  1931    3   8
    13  1931    3   8
    56  1931    3   8
    33  1931    3   4
    13  1931    3   4
    56  1931    3   4
    23  1930    1   4
    23  1930    1   1
    6   1800    6   10
    37  1800    6   10
    98  1800    6   10
    52  1800    6   10
    18  1800    6   10
    76  1800    6   10
    55  1799    4   10
    6   1799    4   10
    52  1799    4   10
    133 1799    4   10
    55  1799    4   6
    6   1799    4   6
    52  1799    4   6
    133 1799    4   6
    112 1798    2   6
    677 1798    2   6
    112 1798    2   2
    677 1798    2   2
    778 888     4   6
    111 888     4   6
    88  888     4   6
    10  888     4   6
    37  887     2   6
    26  887     2   6
    37  887     2   3
    26  887     2   3
    8   886     1   3
    8   886     1   2
    56  885     1   2
    56  885     1   1
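
As a rough sanity check on the desired output, its size can be derived from the per-day counts alone: every day is kept once, and every day whose following day (Day + 1) is also present appears a second time under that later day's Count_group. The sketch below assumes the per-day counts of the example; `has_next`, `duplicated_rows`, and `expected_rows` are illustrative names.

```r
days   <- c(1933, 1932, 1931, 1930, 1800, 1799, 1798, 888, 887, 886, 885)
counts <- c(6, 5, 3, 1, 6, 4, 2, 4, 2, 1, 1)

# A day gets an extra copy when Day + 1 is also present in the data
has_next <- (days + 1) %in% days

original_rows   <- sum(counts)            # rows in the input
duplicated_rows <- sum(counts[has_next])  # extra copies to be created
expected_rows   <- original_rows + duplicated_rows

cat(original_rows, duplicated_rows, expected_rows, "\n")
```

For the example this gives 35 input rows plus 19 duplicates, i.e. the 54 rows listed in the desired output.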

Solution
Here is a solution that keeps the ID values as above.
#first add grouping variables
df$smalldaygroup <- c(0,cumsum(sapply(2:nrow(df),function(i) df$Day[i]!=df$Day[i-1]))) #individual days
df$bigdaygroup <- c(0,cumsum(sapply(2:nrow(df),function(i) df$Day[i]<df$Day[i-1]-1))) #blocks of consecutive days

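#duplicate individual days except the first in each big group
df2 <- lapply(split(df,df$bigdaygroup),function(x) {
  days <- split(x,x$smalldaygroup)
  #seq_along(days)[-1] is empty for a one-day block, whereas 2:length(days) would count down to c(2,1)
  days[c(1,rep(seq_along(days)[-1],each=2))]
})

#change the Count_group to previous value in alternate entries
df2 <- lapply(df2,function(L) lapply(seq_along(L),function(i) {
  x <- L[[i]]
  #even positions are the duplicated copies: they take the Count_group of the entry before them
  if(!(i%%2)) x$Count_group <- L[[i-1]]$Count_group[1]
  return(x)
}))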

df2 <- do.call(rbind,unlist(df2,recursive=FALSE)) #bind back together

head(df2,20) #ignore rownames!
       ID  Day Count Count_group
01.1   18 1933     6          11
01.2   33 1933     6          11
01.3   37 1933     6          11
01.4   18 1933     6          11
01.5   16 1933     6          11
01.6   11 1933     6          11
02.7  111 1932     5          11
02.8   34 1932     5          11
02.9   60 1932     5          11
02.10  88 1932     5          11
02.11  18 1932     5          11
03.7  111 1932     5           8
03.8   34 1932     5           8
03.9   60 1932     5           8
03.10  88 1932     5           8
03.11  18 1932     5           8
04.12  33 1931     3           8
04.13  13 1931     3           8
04.14  56 1931     3           8
05.12  33 1931     3           4
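
For comparison, the same expansion can be sketched more directly: for each distinct Day, take its own rows and then append the rows of Day - 1 relabelled with that day's Count_group. This is not the method used in the answer above, only a base R alternative written under the assumption that days appear in descending order, as in the example. It runs on a subset of the example rows (Days 1931, 1930, 887, 886, 885) to keep it short, and `expand_day` and `out` are hypothetical names.

```r
# Subset of the example data: two blocks of consecutive days
df <- read.table(text = 'ID Day Count Count_group
33 1931 3 4
13 1931 3 4
56 1931 3 4
23 1930 1 1
37 887 2 3
26 887 2 3
8 886 1 2
56 885 1 1', header = TRUE)

# Rows of day d, followed by the rows of day d - 1 moved into d's group
expand_day <- function(d) {
  own  <- df[df$Day == d, ]
  prev <- df[df$Day == d - 1, ]
  # only relabel when Day - 1 actually exists in the data
  if (nrow(prev) > 0) prev$Count_group <- own$Count_group[1]
  rbind(own, prev)
}

# unique() keeps the order of first appearance, so the day order is preserved
out <- do.call(rbind, lapply(unique(df$Day), expand_day))
rownames(out) <- NULL
print(out)
```

Because each day looks up its predecessor by value, this version needs no helper grouping columns and leaves a block consisting of a single day unchanged.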


                        