Loop to perform calculations across rows on specific columns matching a pattern (in data frame)?

Views: 151 | Published: 2018/1/28 13:27:16 | Tags: r, for-loop, pattern-matching, aggregate, reshape


Problem description

I have a data frame with some boolean values (1/0) as follows (sorry, I couldn't work out how to make this into a smart table):

           Flag1.Sam Flag2.Sam Flag3.Sam Flag1.Ted Flag2.Ted Flag3.Ted
    probe1         0         1         0         1         0         0
    probe2         0         0         0         0         0         0
    probe3         1         0         0         0         0         0
    probe4         0         0         0         0         0         0
    probe5         1         1         0         1         0         0

I have 64 samples (Sam, Ted, etc.) whose names are stored in a vector called files, i.e.:

    files <- c("Sam", "Ted", "Ann", ....)

I would like to create a column summing the flag values for each sample, to produce the following:

                   Sam Ted
    probe1.flagsum   1   1
    probe2.flagsum   0   0
    probe3.flagsum   1   0
    probe4.flagsum   0   0
    probe5.flagsum   2   1

I am fairly new to R and trying to learn on a need-to-know basis, but I have tried the following:

    for(i in files) {
      FLAGS$i <- cbind(sapply(i, function(y) {
        #greping columns to filter for one sample
        filter1 <- grep(names(filters), pattern=y)
        #print out the summed values for those columns
        FLAGS$y <-rowSums(filters[,(filter1)])
      }
    }

The above code does not work and I am a bit lost as to how to move forward.

Can anyone help me untangle this problem or point me in the right direction of the commands/tools to use?

Thank you.

Solution

This is easily doable with reshape() in base R, though using the reshape or reshape2 packages might be more intuitive.

Here's a solution in base R:

    # Here's your data in its current form
    dat = read.table(header=TRUE, text="Flag1.Sam Flag2.Sam Flag3.Sam Flag1.Ted Flag2.Ted Flag3.Ted
    probe1 0 1 0 1 0 0
    probe2 0 0 0 0 0 0
    probe3 1 0 0 0 0 0
    probe4 0 0 0 0 0 0
    probe5 1 1 0 1 0 0")

    # Generate an ID row
    dat$id = row.names(dat)

    # Reshape wide to long
    r.dat = reshape(dat, direction="long",
                    timevar="probe", varying=1:6, sep=".")

    # Calculate row sums
    r.dat$sum = rowSums(r.dat[3:5])

    # Reshape back to wide format, dropping what you're not interested in
    reshape(r.dat, direction="wide",
            idvar="id", timevar="probe", drop=3:5)
    ##                id sum.Sam sum.Ted
    ## probe1.Sam probe1       1       1
    ## probe2.Sam probe2       0       0
    ## probe3.Sam probe3       1       0
    ## probe4.Sam probe4       0       0
    ## probe5.Sam probe5       2       1
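The reshape call above depends on the part of each column name after the dot identifying the sample. As a rough sketch of the same idea without reshaping, the columns can be grouped by that suffix and summed per group. The names `sample_of` and `flagsum` are made up for this illustration, and the sketch assumes every column follows the `FlagN.Sample` naming pattern shown in the question.

```r
# Same example data as in the question (no id column added here)
dat <- read.table(header = TRUE, text = "
       Flag1.Sam Flag2.Sam Flag3.Sam Flag1.Ted Flag2.Ted Flag3.Ted
probe1         0         1         0         1         0         0
probe2         0         0         0         0         0         0
probe3         1         0         0         0         0         0
probe4         0         0         0         0         0         0
probe5         1         1         0         1         0         0
")

# Strip everything up to and including the first dot: "Flag1.Sam" -> "Sam"
sample_of <- sub("^[^.]*\\.", "", names(dat))

# split.default() splits the data frame by columns (not rows),
# giving one small data frame per sample; rowSums() then sums each one
flagsum <- sapply(split.default(dat, sample_of), rowSums)

print(flagsum)
```

The result is a matrix with one row per probe and one column per sample. This variant does not need the `files` vector at all, which may or may not be what you want if only some samples should be summed.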

More than one way to skin a cat

You can also whip up a function like this one:

    myFun = function(data, varnames) {
      temp = vector("list", length(varnames))
      for (i in seq_along(varnames)) {
        # Sum, row by row, the columns whose names match this sample name
        temp[[i]] = rowSums(data[grep(varnames[i], names(data))])
        names(temp)[i] = varnames[i]
      }
      data.frame(temp)
    }

Then, making use of the vector that you have of names:

    files = c("Sam", "Ted")
    myFun(dat, files)
    ##        Sam Ted
    ## probe1   1   1
    ## probe2   0   0
    ## probe3   1   0
    ## probe4   0   0
    ## probe5   2   1

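For comparison with the loop in the question, here is a minimal sketch of the same per-sample sum written with `sapply`, followed by a `for` loop that fills a data frame. Two things differ from the original attempt: `FLAGS[[s]]` is used instead of `FLAGS$i`, because `$` takes the name literally and would create a column called "i", and the pattern is anchored as `\.Sam$` so that one sample name cannot accidentally match inside another. The anchoring is my assumption about what is wanted, and it presumes the sample names contain no regex special characters. `flagsum_mat` is a name invented for this example.

```r
# Same example data as in the question
dat <- read.table(header = TRUE, text = "
       Flag1.Sam Flag2.Sam Flag3.Sam Flag1.Ted Flag2.Ted Flag3.Ted
probe1         0         1         0         1         0         0
probe2         0         0         0         0         0         0
probe3         1         0         0         0         0         0
probe4         0         0         0         0         0         0
probe5         1         1         0         1         0         0
")
files <- c("Sam", "Ted")

# sapply version: one column of row sums per sample name
flagsum_mat <- sapply(files, function(s) {
  cols <- grep(paste0("\\.", s, "$"), names(dat))  # e.g. "\\.Sam$"
  rowSums(dat[, cols, drop = FALSE])               # drop = FALSE keeps a data frame
})

# for-loop version: start with an empty data frame that has the probe row names
FLAGS <- data.frame(row.names = row.names(dat))
for (s in files) {
  cols <- grep(paste0("\\.", s, "$"), names(dat))
  FLAGS[[s]] <- rowSums(dat[, cols, drop = FALSE])  # [[s]] uses the value of s
}

print(FLAGS)
```

Both versions should give the same numbers as `myFun(dat, files)`. The `sapply` form returns a matrix, while the loop builds a data frame.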
Enjoy!
