Loop to perform calculations across rows on specific columns matching a pattern (in data frame)?


Problem description

I have a data frame with some boolean values (1/0) as follows (sorry, I couldn't work out how to make this into a neat table):

       Flag1.Sam Flag2.Sam Flag3.Sam Flag1.Ted Flag2.Ted Flag3.Ted
probe1         0         1         0         1         0         0
probe2         0         0         0         0         0         0
probe3         1         0         0         0         0         0
probe4         0         0         0         0         0         0
probe5         1         1         0         1         0         0

I have 64 samples (Sam, Ted, etc.), which are in a vector called files, i.e.:

files <- c("Sam", "Ted", "Ann", ....) 

And I would like to create a column for each sample that sums its flag values, to produce the following:

               Sam Ted 
probe1.flagsum   1   1
probe2.flagsum   0   0 
probe3.flagsum   1   0 
probe4.flagsum   0   0
probe5.flagsum   2   1

I am fairly new to R, trying to learn on a need-to-know basis, but I have tried the following:

for(i in files) {
    FLAGS$i <- cbind(sapply(i, function(y) { 
        #greping columns to filter for one sample
        filter1 <- grep(names(filters), pattern=y)
        #print out the summed values for those columns  
        FLAGS$y <-rowSums(filters[,(filter1)])
    }
}

The above code does not work and I am a bit lost as to how to move forward.

Can anyone help me untangle this problem or point me in the right direction of the commands/tools to use?

Thank you.

Solution

This is easily doable with base R's reshape(), though using the reshape or reshape2 packages might be more intuitive.

Here's a solution in base R:

# Here's your data in its current form
dat = read.table(header=TRUE, text="Flag1.Sam Flag2.Sam   Flag3.Sam   Flag1.Ted   Flag2.Ted   Flag3.Ted
probe1 0   1   0   1   0   0
probe2 0   0   0   0   0   0
probe3 1   0   0   0   0   0
probe4 0   0   0   0   0   0
probe5 1   1   0   1   0   0")
# Generate an ID column from the row names
dat$id = row.names(dat)
# Reshape wide to long
r.dat = reshape(dat, direction="long", 
                timevar="probe", 
                varying=1:6, sep=".")
# Calculate row sums
r.dat$sum = rowSums(r.dat[3:5])
# Reshape back to wide format, dropping what you're not interested in
reshape(r.dat, direction="wide", 
        idvar="id", timevar="probe", 
        drop=3:5)
##                id sum.Sam sum.Ted
## probe1.Sam probe1       1       1
## probe2.Sam probe2       0       0
## probe3.Sam probe3       1       0
## probe4.Sam probe4       0       0
## probe5.Sam probe5       2       1
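The result above still carries an id column, row names such as probe1.Sam, and a sum. prefix on each column. As a minimal sketch, assuming you want the exact layout shown in the question (row names ending in .flagsum, columns named after the samples), the reshaped result could be tidied as below. The name wide is only an illustrative choice.

```r
# Same data and reshape steps as above, repeated so the block runs on its own
dat <- read.table(header = TRUE, text = "Flag1.Sam Flag2.Sam Flag3.Sam Flag1.Ted Flag2.Ted Flag3.Ted
probe1 0 1 0 1 0 0
probe2 0 0 0 0 0 0
probe3 1 0 0 0 0 0
probe4 0 0 0 0 0 0
probe5 1 1 0 1 0 0")
dat$id <- row.names(dat)
r.dat <- reshape(dat, direction = "long",
                 timevar = "probe",
                 varying = 1:6, sep = ".")
r.dat$sum <- rowSums(r.dat[3:5])
wide <- reshape(r.dat, direction = "wide",
                idvar = "id", timevar = "probe",
                drop = 3:5)

# Tidy up (illustrative): probeN.flagsum row names, no id column,
# and column names without the "sum." prefix
rownames(wide) <- paste0(wide$id, ".flagsum")
wide$id <- NULL
names(wide) <- sub("^sum\\.", "", names(wide))
wide
```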

More than one way to skin a cat

You can also whip up a function like this one:

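myFun = function(data, varnames) {
  # One list element per sample name
  temp = vector("list", length(varnames))
  names(temp) = varnames
  for (i in seq_along(varnames)) {
    # Sum, row by row, the columns whose names match this sample
    # (use the `data` argument here, not the global `dat`)
    temp[[i]] = rowSums(data[grep(varnames[i], names(data))])
  }
  data.frame(temp)
}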

Then, making use of the vector of names that you already have:

files = c("Sam", "Ted")
myFun(dat, files)
##        Sam Ted
## probe1   1   1
## probe2   0   0
## probe3   1   0
## probe4   0   0
## probe5   2   1
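The same per-sample sum can also be written without an explicit loop. The sketch below is one possible variant, not part of the original answer: flag_sums is a hypothetical helper name, and it assumes every column is named Flag<n>.<sample> and that the sample names contain no regular-expression metacharacters. It anchors the pattern to the end of the column name, which may matter with 64 samples if one name is a substring of another.

```r
dat <- read.table(header = TRUE, text = "Flag1.Sam Flag2.Sam Flag3.Sam Flag1.Ted Flag2.Ted Flag3.Ted
probe1 0 1 0 1 0 0
probe2 0 0 0 0 0 0
probe3 1 0 0 0 0 0
probe4 0 0 0 0 0 0
probe5 1 1 0 1 0 0")
files <- c("Sam", "Ted")

# flag_sums is a hypothetical helper, not from the original answer
flag_sums <- function(data, samples) {
  out <- sapply(samples, function(s) {
    # Keep only the columns whose name ends in ".<sample>"
    cols <- grep(paste0("\\.", s, "$"), names(data))
    # drop = FALSE keeps a data frame even if only one column matches
    rowSums(data[, cols, drop = FALSE])
  })
  # sapply returns a matrix with probes as rows and samples as columns
  as.data.frame(out)
}

res <- flag_sums(dat, files)
res
```

Because sapply names its result after the elements of files, the columns come out named Sam and Ted without any extra bookkeeping.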

Enjoy!
