Loop to perform calculations across rows on specific columns matching a pattern (in data frame)?
Problem description
I have a data frame with some boolean (1/0) values, as follows (sorry, I couldn't work out how to make this into a proper table):
Flag1.Sam Flag2.Sam Flag3.Sam Flag1.Ted Flag2.Ted Flag3.Ted
probe1 0 1 0 1 0 0
probe2 0 0 0 0 0 0
probe3 1 0 0 0 0 0
probe4 0 0 0 0 0 0
probe5 1 1 0 1 0 0
I have 64 samples (Sam, Ted, etc.) whose names are stored in a vector called files, i.e.:
files <- c("Sam", "Ted", "Ann", ....)
I would like to create one column per sample that sums that sample's flag values, giving the following:
Sam Ted
probe1.flagsum 1 1
probe2.flagsum 0 0
probe3.flagsum 1 0
probe4.flagsum 0 0
probe5.flagsum 2 1
I am fairly new to R and am learning on a need-to-know basis, but I have tried the following:
for(i in files) {
FLAGS$i <- cbind(sapply(i, function(y) {
#greping columns to filter for one sample
filter1 <- grep(names(filters), pattern=y)
#print out the summed values for those columns
FLAGS$y <-rowSums(filters[,(filter1)])
}
}
The above code does not work, and I am a bit lost as to how to move forward.
Can anyone help me untangle this problem, or point me towards the right commands/tools to use?
Thank you.
This is easily doable with base R's reshape() function, though using the reshape or reshape2 packages might be more intuitive.
Here's a solution in base R:
# Here's your data in its current form
dat = read.table(header=TRUE, text="Flag1.Sam Flag2.Sam Flag3.Sam Flag1.Ted Flag2.Ted Flag3.Ted
probe1 0 1 0 1 0 0
probe2 0 0 0 0 0 0
probe3 1 0 0 0 0 0
probe4 0 0 0 0 0 0
probe5 1 1 0 1 0 0")
# Add an ID column built from the row names
dat$id = row.names(dat)
# Reshape wide to long
r.dat = reshape(dat, direction="long",
timevar="probe",
varying=1:6, sep=".")
# Calculate row sums over the three flag columns (columns 3:5 of r.dat)
r.dat$sum = rowSums(r.dat[3:5])
# Reshape back to wide format, dropping what you're not interested in
reshape(r.dat, direction="wide",
idvar="id", timevar="probe",
drop=3:5)
## id sum.Sam sum.Ted
## probe1.Sam probe1 1 1
## probe2.Sam probe2 0 0
## probe3.Sam probe3 1 0
## probe4.Sam probe4 0 0
## probe5.Sam probe5 2 1
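The positions 3:5 used in rowSums() and drop= work because, after the long reshape, the columns come out in the order id, probe, Flag1, Flag2, Flag3. Below is a minimal sketch on the same data that checks this and then selects the flag columns by name instead of by position. That is a bit less fragile if your real data frame has extra columns. The "^Flag" pattern and the names flag.cols and res are my own assumptions about your column naming, not part of the solution above.

```r
# Recreate the example data (the first field of each data line becomes the row name)
dat = read.table(header = TRUE, text = "Flag1.Sam Flag2.Sam Flag3.Sam Flag1.Ted Flag2.Ted Flag3.Ted
probe1 0 1 0 1 0 0
probe2 0 0 0 0 0 0
probe3 1 0 0 0 0 0
probe4 0 0 0 0 0 0
probe5 1 1 0 1 0 0")
dat$id = row.names(dat)

# Wide to long: "Flag1.Sam" is split at "." into variable "Flag1" and time "Sam"
r.dat = reshape(dat, direction = "long", timevar = "probe",
                varying = 1:6, sep = ".")
print(names(r.dat))  # should show "id" "probe" "Flag1" "Flag2" "Flag3"

# Pick the flag columns by name (assumes they all start with "Flag")
flag.cols = grep("^Flag", names(r.dat), value = TRUE)
r.dat$sum = rowSums(r.dat[flag.cols])

# Back to wide format, dropping the individual flag columns by name
res = reshape(r.dat, direction = "wide", idvar = "id",
              timevar = "probe", drop = flag.cols)
print(res)
```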
There's more than one way to skin a cat.
You can also whip up a function like this one:
myFun = function(data, varnames) {
  temp = vector("list", length(varnames))
  names(temp) = varnames
  for (i in seq_along(varnames)) {
    # Sum, for each row, the columns whose names contain this sample name
    temp[[i]] = rowSums(data[grep(varnames[i], names(data))])
  }
  data.frame(temp)
}
Then, using the vector of sample names that you already have:
files = c("Sam", "Ted")
myFun(dat, files)
## Sam Ted
## probe1 1 1
## probe2 0 0
## probe3 1 0
## probe4 0 0
## probe5 2 1
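For comparison, the loop from the question can also be made to work directly. As far as I can tell, the main problems are these:

- FLAGS$i creates a column literally named "i". FLAGS[[i]] uses the value of i instead.
- FLAGS is never created before the loop.
- Assigning to FLAGS inside the sapply() function only changes a local copy.
- The cbind()/sapply() wrapper is missing its closing parentheses and isn't needed anyway.

Below is a minimal sketch, assuming the flag data frame is called filters as in the question. It makes two further assumptions. First, it derives the sample names from the column names, assuming every column follows the FlagN.Sample pattern. Second, it anchors the grep pattern so that "Sam" would not also match columns for a hypothetical sample such as "Samantha". Sample names containing regex special characters would need escaping.

```r
# Example flag data; in the question this data frame is called `filters`
filters = read.table(header = TRUE, text = "Flag1.Sam Flag2.Sam Flag3.Sam Flag1.Ted Flag2.Ted Flag3.Ted
probe1 0 1 0 1 0 0
probe2 0 0 0 0 0 0
probe3 1 0 0 0 0 0
probe4 0 0 0 0 0 0
probe5 1 1 0 1 0 0")

# Sample names taken from the column names (assumes "FlagN.Sample" naming)
files = unique(sub("^Flag[0-9]+\\.", "", names(filters)))

# Start with an empty data frame that has one row per probe
FLAGS = data.frame(row.names = row.names(filters))

for (i in files) {
  # Columns whose names end in ".<sample>", e.g. the regex "\.Sam$"
  filter1 = grep(paste0("\\.", i, "$"), names(filters))
  # [[i]] creates a column named after the value of i, not one called "i"
  FLAGS[[i]] = rowSums(filters[, filter1, drop = FALSE])
}
print(FLAGS)
```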
Enjoy!