Conditional insertion of a new row in a data.frame with R

Problem description

I have a data frame like this:

ID  Exp1 Exp2 Value1
AAA 5    6    7
AAA 4    8    8
BBB 3    5    9
BBB 6    7    4
CCC 2    5    6
....

and I would like to add a new row after each group of repeated IDs, holding the column sums of that group's rows, like this:

ID      Exp1 Exp2 Value1
AAA     5    6    7
AAA     4    8    8
AAA.1   9    14   15
BBB     3    5    9
BBB     6    7    4
BBB.1   9    12   13
CCC     2    5    6
...

My problem is that I cannot write code that inserts a new row right after each group of identical IDs.

    for (i in 1:nrow(Data)) {
      temp1 <- Data[Data$ID == Data$ID[i], ]   # all rows sharing the i-th row's ID
      # ... not sure how to continue from here
    }

But I don't know how to proceed from here. Any ideas?

Update: this is what the original data looks like:

 GeneNames  Original    ID2          Com.   Ratio   Cyt     Nuc
 YWHAB  CL84Contig6     1433B_HUMAN  -0.2   0.6    1063.3   671.3
 YWHAB  CL84Contig4     1433B_HUMAN  -0.3   0.5    59.0     30.5
 YWHAE  CL1665Contig1   1433E_HUMAN  -0.3   0.5    2784.6   1490.1
 YWHAE  CL1665Contig4   1433E_HUMAN   0.1   1.2    2.1      4.8
 YWHAH  dsrrswapns      1433F_HUMAN   0.0   0.0    0.0      0.0
 YWHAG  CL2762Contig2   1433G_HUMAN  -0.3   0.4    39.5     17.7
 YWHAG  CL2762Contig3   1433G_HUMAN   0.0   0.0    0.0      0.0

What I would like to get:

GeneNames   Original    ID2          Com.   Ratio   Cyt     Nuc
 YWHAB  CL84Contig6     1433B_HUMAN  -0.2   0.6    1063.3   671.3
 YWHAB  CL84Contig4     1433B_HUMAN  -0.3   0.5    59.0     30.5
YWHAB.1 CL84Contig6     1433B_HUMAN  -0.2   0.6    1122.3   701.8
 YWHAE  CL1665Contig1   1433E_HUMAN  -0.3   0.5    2784.6   1490.1
 YWHAE  CL1665Contig4   1433E_HUMAN   0.1   1.2    2.1      4.8
YWHAE.1 CL1665Contig1   1433E_HUMAN  -0.3   0.5    2786.7   1494.9

My data.frame has 13044 obs. of 94 variables, a mix of num and chr columns. I would like to sum only the Cyt and Nuc values within each GeneNames group and write the sums into a new row whose GeneNames value is "GeneName.1". The remaining columns differ between the rows of a gene; for the new row I would prefer to either leave them empty or copy the values from that gene's first row, as in the example.

Recommended answer

You could do this with data.table. Convert the data.frame to a data.table (setDT) and, grouping by "ID", append one extra row to each group with .SD[1:(.N+1)]: indexing one row past the end returns a row of NAs. Then, again by "ID", replace those NA elements with the column sums (lapply(.SD, ...)).

library(data.table)
setDT(df1)[, .SD[1:(.N+1)], ID][, lapply(.SD, function(x)
        replace(x, is.na(x), sum(x, na.rm=TRUE))) , ID]
#      ID Exp1 Exp2 Value1
#1: AAA    5    6      7
#2: AAA    4    8      8
#3: AAA    9   14     15
#4: BBB    3    5      9
#5: BBB    6    7      4
#6: BBB    9   12     13
#7: CCC    2    5      6
#8: CCC    2    5      6
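
The extra row comes from indexing past the end of each group: a data.table indexed one position beyond its last row returns a row of NAs, which the replace() step then fills. A minimal sketch on a small hypothetical table `dt` (not from the question):

```r
library(data.table)

dt <- data.table(x = c(5L, 4L), y = c("a", "b"))

# Row 3 does not exist, so it comes back with NA in every column
padded <- dt[1:(nrow(dt) + 1L)]
print(padded)
```

One consequence of this design: the fill step replaces every NA in a group, so an NA already present in the data would also be overwritten by the group sum (computed with na.rm=TRUE). The rbind variant leaves existing values untouched.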

Alternatively, rbind each "ID" group with a row of its column sums (lapply(.SD, sum)). The groups keep the order in which each ID first appears.

 setDT(df1)[, rbind(.SD,lapply(.SD, sum)), ID]
 #    ID Exp1 Exp2 Value1
 #1: AAA    5    6      7
 #2: AAA    4    8      8
 #3: AAA    9   14     15
 #4: BBB    3    5      9
 #5: BBB    6    7      4
 #6: BBB    9   12     13
 #7: CCC    2    5      6
 #8: CCC    2    5      6
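
Neither variant relabels the summary row as "AAA.1", which the question asked for. A minimal sketch that does this on top of the rbind approach, assuming the example data from the question; `res` and `last_rows` are illustrative names:

```r
library(data.table)

# Example data from the question
df1 <- data.frame(ID = c("AAA", "AAA", "BBB", "BBB", "CCC"),
                  Exp1 = c(5L, 4L, 3L, 6L, 2L),
                  Exp2 = c(6L, 8L, 5L, 7L, 5L),
                  Value1 = c(7L, 8L, 9L, 4L, 6L),
                  stringsAsFactors = FALSE)

# Append a row of column sums to every ID group (the rbind approach)
res <- setDT(df1)[, rbind(.SD, lapply(.SD, sum)), by = ID]

# .I[.N] is the row number (within res) of each group's last row,
# i.e. the sum row just appended; rename its ID to "<ID>.1"
last_rows <- res[, .I[.N], by = ID]$V1
res[last_rows, ID := paste0(ID, ".1")]
print(res)
```

Renaming after the grouped step is needed because the grouping column cannot be modified inside the same by = ID call.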

Update

Based on the new dataset, try

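  sum_cols <- c("Cyt", "Nuc")   # summed per gene (positions 6:7 here, but not in the full 94-column data)
  # add one NA row per gene, then fill that row's Cyt/Nuc with the group sums
  DT1 <- setDT(df1)[, .SD[1:(.N+1)], GeneNames][, (sum_cols) := lapply(.SD,
       function(x) replace(x, is.na(x), sum(x, na.rm=TRUE))),
             GeneNames, .SDcols=sum_cols]
  # fill the other columns of the new row with the gene's first-row values
  copy_cols <- setdiff(names(DT1), c("GeneNames", sum_cols))
  DT1[, (copy_cols) := lapply(.SD, function(x) replace(x, is.na(x),
             x[1L])), GeneNames, .SDcols=copy_cols][]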
  #   GeneNames      Original         ID2 Com. Ratio    Cyt    Nuc
  #1:     YWHAB   CL84Contig6 1433B_HUMAN -0.2   0.6 1063.3  671.3
  #2:     YWHAB   CL84Contig4 1433B_HUMAN -0.3   0.5   59.0   30.5
  #3:     YWHAB   CL84Contig6 1433B_HUMAN -0.2   0.6 1122.3  701.8
  #4:     YWHAE CL1665Contig1 1433E_HUMAN -0.3   0.5 2784.6 1490.1
  #5:     YWHAE CL1665Contig4 1433E_HUMAN  0.1   1.2    2.1    4.8
  #6:     YWHAE CL1665Contig1 1433E_HUMAN -0.3   0.5 2786.7 1494.9
  #7:     YWHAH    dsrrswapns 1433F_HUMAN  0.0   0.0    0.0    0.0
  #8:     YWHAH    dsrrswapns 1433F_HUMAN  0.0   0.0    0.0    0.0
  #9:     YWHAG CL2762Contig2 1433G_HUMAN -0.3   0.4   39.5   17.7
  #10:     YWHAG CL2762Contig3 1433G_HUMAN  0.0   0.0    0.0    0.0
  #11:     YWHAG CL2762Contig2 1433G_HUMAN -0.3   0.4   39.5   17.7

Or use the rbind approach:

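 df2 <- copy(setDT(df1))   # unmodified copy of the original rows to join against
 DT1 <- setDT(df1)[, rbind(.SD, lapply(.SD, sum)), GeneNames, .SDcols=c("Cyt", "Nuc")]
 setkey(df2, GeneNames, Cyt, Nuc)[DT1]   # sum rows get NA in the other columns unless their sums equal an existing Cyt/Nuc pair of that gene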

Then, as before, replace the NAs in columns 2:5 with each gene's first-row values.
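
The update likewise keeps the original GeneNames on the summary rows instead of "GeneName.1". A possible sketch that builds each summary row in one grouped step (first-row values in every column, group sums in Cyt and Nuc) and then renames it. It assumes a small hand-typed subset of the question's rows and columns; `dt`, `sum_cols`, `last_rows` and `out` are illustrative names:

```r
library(data.table)

# A few rows and columns from the question's data
dt <- data.table(
  GeneNames = c("YWHAB", "YWHAB", "YWHAE", "YWHAE", "YWHAH"),
  Original  = c("CL84Contig6", "CL84Contig4", "CL1665Contig1", "CL1665Contig4", "dsrrswapns"),
  ID2       = c("1433B_HUMAN", "1433B_HUMAN", "1433E_HUMAN", "1433E_HUMAN", "1433F_HUMAN"),
  Cyt       = c(1063.3, 59.0, 2784.6, 2.1, 0.0),
  Nuc       = c(671.3, 30.5, 1490.1, 4.8, 0.0)
)

sum_cols <- c("Cyt", "Nuc")   # the only columns that get summed

out <- dt[, {
  s <- lapply(.SD, function(x) x[1L])                            # the gene's first-row values
  s[sum_cols] <- lapply(sum_cols, function(cn) sum(.SD[[cn]]))  # replace with group sums
  rbind(.SD, s)                                                  # original rows, then the summary row
}, by = GeneNames]

# The last row of each gene is its summary row: rename it "<GeneNames>.1"
last_rows <- out[, .I[.N], by = GeneNames]$V1
out[last_rows, GeneNames := paste0(GeneNames, ".1")]
print(out)
```

With the full data only `sum_cols` should need to change, since every other column is copied from the gene's first row. Because the summary row is built from the first row rather than padded with NA, NAs already in the data are left alone.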

Data used for the first example:

 df1 <- structure(list(ID = c("AAA", "AAA", "BBB", "BBB", "CCC"), 
 Exp1 = c(5L, 4L, 3L, 6L, 2L), Exp2 = c(6L, 8L, 5L, 7L, 5L), Value1 = 
 c(7L, 8L, 9L, 4L, 6L)), .Names = c("ID", "Exp1", "Exp2", "Value1"), 
 class = "data.frame", row.names = c(NA, -5L))
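
For comparison, the same result, including the "AAA.1" labels, can be sketched in base R without data.table, using split(), lapply() and do.call(rbind, ...). This swaps in a different technique from the answer's; `num_cols`, `pieces`, `with_sums` and `result` are illustrative names:

```r
df1 <- data.frame(ID = c("AAA", "AAA", "BBB", "BBB", "CCC"),
                  Exp1 = c(5L, 4L, 3L, 6L, 2L),
                  Exp2 = c(6L, 8L, 5L, 7L, 5L),
                  Value1 = c(7L, 8L, 9L, 4L, 6L),
                  stringsAsFactors = FALSE)

num_cols <- c("Exp1", "Exp2", "Value1")

# split() on a character vector orders groups alphabetically;
# factor(..., levels = unique(...)) keeps the order of first appearance instead
pieces <- split(df1, factor(df1$ID, levels = unique(df1$ID)))

with_sums <- lapply(pieces, function(d) {
  total <- d[1, ]                               # one-row template with the right column types
  total$ID <- paste0(d$ID[1], ".1")             # summary label, e.g. "AAA.1"
  total[num_cols] <- lapply(d[num_cols], sum)   # column sums for this ID
  rbind(d, total)
})

result <- do.call(rbind, with_sums)
rownames(result) <- NULL                        # drop the "AAA.1"-style row names from do.call
print(result)
```

On 13044 rows this loop over groups is fine, but the data.table versions scale better as the number of groups grows.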
