How to set multiple conditions for multiple values in a row in R?
I have a genetic data set in which each row describes a gene. Each gene has a beta column holding several beta values that I compressed into a single cell (at the variant level, multiple variants in one gene gave multiple betas). The beta is the effect size the gene can have on a condition, so large negative values matter as much as large positive ones. I am trying to write code that selects either the largest negative or the largest positive beta value for each gene, with cutoffs at -0.5 and 0.5.
The rules I am trying to code are these:
If a gene/row has a value less than -0.5 and no values higher than 0.5, keep only the largest negative value (the most negative one).
If it has a value higher than 0.5 and no values less than -0.5, keep only the largest positive value.
If it has no values less than -0.5 or more than 0.5, keep the largest value.
If it has values both less than -0.5 and more than 0.5, keep the largest value.
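Read together, the last three rules all keep the row maximum, and only the first rule keeps the row minimum, so the four rules reduce to a single test. Below is a minimal base R sketch for one gene's betas, assuming they are already a numeric vector. The function name `pick_beta` and its `cutoff` argument are hypothetical and do not come from the original post.

```r
# Hypothetical helper: apply the four rules to one gene's beta values.
# Assumes x is a numeric vector; cutoff is the symmetric threshold (0.5 here).
pick_beta <- function(x, cutoff = 0.5) {
  x <- x[!is.na(x)]                    # ignore missing betas
  if (length(x) == 0) return(NA_real_) # nothing left to choose from
  lo <- min(x)
  hi <- max(x)
  # Rule 1: something below -cutoff and nothing above +cutoff -> keep the minimum.
  # Rules 2-4: every other case keeps the maximum.
  if (lo < -cutoff && hi <= cutoff) lo else hi
}

pick_beta(c(0.01, -0.6, 0.4))      # ACE
pick_beta(c(0.7, -0.2, 0.2))       # BRCA
pick_beta(c(0.001, 0.02, -0.003))  # ZAP70
pick_beta(c(0.8, -0.6, 0.001))     # P53
```

For the four example genes this should return -0.6, 0.7, 0.02 and 0.8, matching the expected output.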
For example my data looks like this:
Gene Beta(s)
ACE 0.01, -0.6, 0.4
BRCA 0.7, -0.2, 0.2
ZAP70 0.001, 0.02, -0.003
P53 0.8, -0.6, 0.001
Expected output (selecting largest negative or positive values depending on set conditions):
Gene Beta(s)
ACE -0.6
BRCA 0.7
ZAP70 0.02
P53 0.8
I am from a biology background and new to R, so I am not sure how to code this. At the moment I am using functions that select either the maximum or the minimum beta value for a gene, but I don't know how to extend them with further conditions:
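library(stringr)  # str_extract_all()
library(dplyr)    # %>% and mutate_at()
# maximum of x, or NA when every value is missing
max2 = function(x) if (all(is.na(x))) NA else max(x, na.rm = TRUE)
# extract every number from each string, then take the per-string maximum
getmax = function(col) str_extract_all(col, "[0-9\\.-]+") %>%
  lapply(., function(x) max2(as.numeric(x))) %>%
  unlist()
# minimum of x, or NA when every value is missing
min2 = function(x) if (all(is.na(x))) NA else min(x, na.rm = TRUE)
# same extraction as getmax, but keep the per-string minimum
getmin = function(col) str_extract_all(col, "[0-9\\.-]+") %>%
  lapply(., function(x) min2(as.numeric(x))) %>%
  unlist()
# replace the second column (the betas) with each row's maximum
test <- df %>%
  mutate_at(names(df)[2], getmax)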
Any help in the right direction of how to set multiple conditional statements would be appreciated.
Example data:
dput(df)
structure(list(Gene = c("ACE", "BRCA", "ZAP70", "P53"), `Beta(s)` = c("0.01, -0.6, 0.4",
"0.7, -0.2, 0.2", "0.001, 0.02, -0.003", "0.8, -0.6, 0.001")), row.names = c(NA,
-4L), class = c("data.table", "data.frame"))
Here is a data.table solution that should run fast and works independently of the number of betas provided.
library( data.table )
library( matrixStats )
# set df as a data.table
setDT( df )
# split Beta(s) into columns (dynamically)
df[, paste0( "Beta",
             1:length( tstrsplit( df$`Beta(s)`, "," ) ) ) :=
     lapply( tstrsplit( `Beta(s)`, "," ), as.numeric ) ][]
# Gene Beta(s) Beta1 Beta2 Beta3
# 1: ACE 0.01, -0.6, 0.4 0.010 -0.60 0.400
# 2: BRCA 0.7, -0.2, 0.2 0.700 -0.20 0.200
# 3: ZAP70 0.001, 0.02, -0.003 0.001 0.02 -0.003
# 4: P53 0.8, -0.6, 0.001 0.800 -0.60 0.001
# now, using rowMins and rowMaxs from the matrixStats package (= FAST!!),
# get the filtering (and updating) done by reference.
# If a gene/row has a value less than -0.5 and no values higher than 0.5 then keep only the largest negative value.
df[ df[, rowMins( as.matrix(.SD), na.rm = TRUE ), .SDcols = patterns("^Beta[0-9]") ] < -0.5 &
      df[, rowMaxs( as.matrix(.SD), na.rm = TRUE ), .SDcols = patterns("^Beta[0-9]") ] <= 0.5,
    Beta.final := rowMins( as.matrix(.SD), na.rm = TRUE ), .SDcols = patterns("^Beta[0-9]") ]
# If it has a value higher than 0.5 and no values less than -0.5 keep only the largest positive value.
df[ df[, rowMaxs( as.matrix(.SD), na.rm = TRUE ), .SDcols = patterns("^Beta[0-9]") ] > 0.5 &
      df[, rowMins( as.matrix(.SD), na.rm = TRUE ), .SDcols = patterns("^Beta[0-9]") ] >= -0.5,
    Beta.final := rowMaxs( as.matrix(.SD), na.rm = TRUE ), .SDcols = patterns("^Beta[0-9]") ]
# If it has no values less than -0.5 or more than 0.5 keep the largest value.
df[ df[, rowMins( as.matrix(.SD), na.rm = TRUE ), .SDcols = patterns("^Beta[0-9]") ] >= -0.5 &
      df[, rowMaxs( as.matrix(.SD), na.rm = TRUE ), .SDcols = patterns("^Beta[0-9]") ] <= 0.5,
    Beta.final := rowMaxs( as.matrix(.SD), na.rm = TRUE ), .SDcols = patterns("^Beta[0-9]") ]
# If it has both values less than -0.5 and more than 0.5 keep the largest value.
df[ df[, rowMins( as.matrix(.SD), na.rm = TRUE ), .SDcols = patterns("^Beta[0-9]") ] < -0.5 &
      df[, rowMaxs( as.matrix(.SD), na.rm = TRUE ), .SDcols = patterns("^Beta[0-9]") ] > 0.5,
    Beta.final := rowMaxs( as.matrix(.SD), na.rm = TRUE ), .SDcols = patterns("^Beta[0-9]") ]
Output:
#final output
df[, .(Gene, `Beta(s)` = Beta.final )][]
# Gene Beta(s)
# 1: ACE -0.60
# 2: BRCA 0.70
# 3: ZAP70 0.02
# 4: P53 0.80
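As a cross-check on the result above, the same selection can be sketched in base R without splitting the betas into separate columns. This is not the answer's data.table method but an alternative under stated assumptions: the betas are comma-separated strings as in the example data, and the intermediate names `betas`, `lo` and `hi` are illustrative only.

```r
# Rebuild the example data as a plain data.frame (values copied from the question).
df <- data.frame(
  Gene = c("ACE", "BRCA", "ZAP70", "P53"),
  `Beta(s)` = c("0.01, -0.6, 0.4", "0.7, -0.2, 0.2",
                "0.001, 0.02, -0.003", "0.8, -0.6, 0.001"),
  check.names = FALSE,       # keep the non-syntactic column name `Beta(s)`
  stringsAsFactors = FALSE
)

# Split each cell on commas; as.numeric() tolerates the leading spaces.
betas <- lapply(strsplit(df[["Beta(s)"]], ","), as.numeric)

# Per-gene minimum and maximum, ignoring missing values.
lo <- vapply(betas, min, numeric(1), na.rm = TRUE)
hi <- vapply(betas, max, numeric(1), na.rm = TRUE)

# Keep the minimum only when it is below -0.5 and nothing exceeds 0.5;
# every other case keeps the maximum.
df$Beta.final <- ifelse(lo < -0.5 & hi <= 0.5, lo, hi)

df[, c("Gene", "Beta.final")]
```

Because each cell is parsed into its own vector, this sketch does not require every gene to have the same number of betas.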