R - How to sum objects in a column between an interval defined by conditions on another column

Question

This is a follow-up to this question: Sum object in a column between an interval defined by another column

What I would like to know is how to adjust the answer if I want to sum the values in B for rows where (A[i+1]-A[i]==0) or (A[i+1]-A[i]==1) or (A[i]-A[i-1]==0) or (A[i]-A[i-1]==1), where i is the row index. In other words, sum the B values of rows whose A values are equal or differ by 1, without summing the same row twice.

I tried building a loop function, but I got stuck when using row indices with data frames. Example: given the following data frame

df     
      A B
[1,]  1 4
[2,]  1 3
[3,]  3 5
[4,]  3 7
[5,]  4 3
[6,]  5 2

What I want to obtain is the following data frame:

df
      A B
[1,]  1 7
[2,]  3 15
[3,]  5 2

Moreover, if I have a large data frame like this:

df
chr     start           stop            m       n       s
chr1    71533361        71533362        23      1       -
chr1    71533361        71533362        24      26      -
chr1    71533361        71533362        25      1       -

I want my result to look like this (I chose the row for which the value in column m is max):

df
chr1    71533361        71533362        24      28      -

Recommended answer

Try the following, assuming your original dataframe is df:

df2 <- df # working copy; rows are dropped once they have been summed
z <- data.frame(A = numeric(0), B = numeric(0)) # output data frame, grown row by row
j <- 1 # output row index
u <- unique(df$A) # unique values of A not yet processed
i <- min(u) # start from the smallest value of A
s <- TRUE # loop flag for while()
while (s) {
    w <- c(i - 1, i, i + 1) # values of A within +/- 1 of i
    z[j, ] <- c(i, sum(df2$B[df2$A %in% w])) # sum B over the rows still in df2
    df2 <- df2[!df2$A %in% w, ] # drop those rows so no row is summed twice
    j <- j + 1 # advance the output index
    u <- u[!u %in% w] # remove the processed values from u
    if (length(u) == 0) { # nothing left: exit the loop
        s <- FALSE
    } else {
        i <- min(u) # next value to sum around
    }
}

I know that's kind of messy code, but it's a sort of tough problem given all of the different indices.
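The same greedy idea can be wrapped in a function that leaves the input untouched. This is only a minimal sketch, assuming the columns are named A and B and that A holds whole numbers; the function name sum_by_window is hypothetical and not part of the original answer.

```r
# Greedy grouping: take the smallest unprocessed value of A as an anchor,
# sum B over the unprocessed values within +/- 1 of it, then move on.
sum_by_window <- function(df) {
  remaining <- sort(unique(df$A))  # values of A not yet assigned to a group
  anchors <- numeric(0)
  totals <- numeric(0)
  while (length(remaining) > 0) {
    i <- remaining[1]
    # only values that are still unprocessed can join this group
    in_window <- remaining[remaining %in% c(i - 1, i, i + 1)]
    anchors <- c(anchors, i)
    totals <- c(totals, sum(df$B[df$A %in% in_window]))
    remaining <- setdiff(remaining, in_window)
  }
  data.frame(A = anchors, B = totals)
}

# Example data from the question
df <- data.frame(A = c(1, 1, 3, 3, 4, 5), B = c(4, 3, 5, 7, 3, 2))
result <- sum_by_window(df)
print(result)
```

On the example data this should give the three rows the question asks for (A = 1, 3, 5 with B = 7, 15, 2). Restricting each window to the values still in remaining is what keeps a row from being counted in two groups.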

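The second example in the question (the chr/start/stop table) is not covered by the loop above. The following is a rough sketch under a stated assumption: the expected output keeps the row with m = 24, which is the row with the largest n, so the sketch keeps the row where n is largest and sums n over rows whose m is within 1 of that row's m. It also assumes the data frame has already been subset to a single chr/start/stop group. The function name collapse_peak is hypothetical.

```r
# Keep the row with the largest n and replace its n by the sum of n
# over all rows whose m is within +/- 1 of that row's m.
# Assumption: df contains a single chr/start/stop group.
collapse_peak <- function(df) {
  peak <- which.max(df$n)               # index of the row with the largest n
  near <- abs(df$m - df$m[peak]) <= 1   # rows whose m is within 1 of the peak's m
  out <- df[peak, ]
  out$n <- sum(df$n[near])
  out
}

# Example data from the question
df <- data.frame(chr = "chr1", start = 71533361, stop = 71533362,
                 m = c(23, 24, 25), n = c(1, 26, 1), s = "-")
result <- collapse_peak(df)
print(result)
```

If the intent really is to keep the row with the largest m instead, which.max(df$m) would be the line to change.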