How to set multiple conditions for multiple values in a row in R?

Views: 51 | Posted: 2021/4/15 19:47:18 | Tags: r dplyr conditional-statements bioinformatics


Problem description

I have a genetic data set where each row describes a gene. It has a beta column with multiple beta values that I've compressed into one cell per gene (they come from the variant level, where multiple variants in one gene gave multiple betas). The beta is the effect size that the gene can have on a condition, so large negative values are as important as large positive values. I am trying to write code that selects either the largest negative or the largest positive beta value for a gene, with cutoffs at -0.5 and 0.5.

The rules I am trying to code are these:

If a gene/row has a value less than -0.5 and no values higher than 0.5 then keep only the largest negative value.

If it has a value higher than 0.5 and no values less than -0.5 keep only the largest positive value.

If it has no values less than -0.5 or more than 0.5 keep the largest value.

If it has both values less than -0.5 and more than 0.5 keep the largest value.

For example my data looks like this:

Gene    Beta(s)
ACE     0.01, -0.6, 0.4
BRCA    0.7, -0.2, 0.2 
ZAP70   0.001, 0.02, -0.003
P53     0.8, -0.6, 0.001

Expected output (selecting largest negative or positive values depending on set conditions):

Gene    Beta(s)
ACE     -0.6  
BRCA     0.7
ZAP70    0.02
P53      0.8
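
Read literally, the four rules reduce to one decision per gene: keep the minimum only when some value is below -0.5 and none is above 0.5, otherwise keep the maximum. A minimal base R sketch of that reading follows; `pick_beta` and its `cutoff` argument are hypothetical names used for illustration, not part of the original post.

```r
# Hypothetical helper (not from the original post): apply the four rules
# to the numeric beta values of a single gene.
pick_beta <- function(x, cutoff = 0.5) {
  x <- x[!is.na(x)]                       # ignore missing values
  if (length(x) == 0) return(NA_real_)    # nothing left to choose from
  has_low  <- any(x < -cutoff)            # at least one value below -0.5
  has_high <- any(x >  cutoff)            # at least one value above 0.5
  # Rule 1 keeps the most negative value; rules 2, 3 and 4 all keep the maximum.
  if (has_low && !has_high) min(x) else max(x)
}

# Values taken from the example rows (ACE, BRCA, ZAP70, P53)
pick_beta(c(0.01, -0.6, 0.4))
pick_beta(c(0.7, -0.2, 0.2))
pick_beta(c(0.001, 0.02, -0.003))
pick_beta(c(0.8, -0.6, 0.001))
```

The sketch treats "largest negative value" as the most negative one, which is what the expected output for ACE (-0.6) suggests.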

I am from a biology background and new to R, so not sure how to code this. At the moment I am working with functions to select either the maximum or minimum beta values for a gene, but I don't know how to amend this with further conditions:

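library(dplyr)    # provides %>% and mutate_at()
library(stringr)  # provides str_extract_all()

# max/min that return NA when every value is NA
max2 = function(x) if (all(is.na(x))) NA else max(x, na.rm = TRUE)
# extract the numbers from each string and return the maximum per row
getmax = function(col) str_extract_all(col, "[0-9\\.-]+") %>%
  lapply(., function(x) max2(as.numeric(x))) %>%
  unlist()

min2 = function(x) if (all(is.na(x))) NA else min(x, na.rm = TRUE)
# same as getmax, but returns the minimum per row
getmin = function(col) str_extract_all(col, "[0-9\\.-]+") %>%
  lapply(., function(x) min2(as.numeric(x))) %>%
  unlist()

# replace the second column (Beta(s)) with its per-row maximum
test <- df %>%
  mutate_at(names(df)[2], getmax)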

Any help in the right direction of how to set multiple conditional statements would be appreciated.

Example data:

# output of dput(df), assigned to df so the example can be run as is
df <- structure(list(Gene = c("ACE", "BRCA", "ZAP70", "P53"), `Beta(s)` = c("0.01, -0.6, 0.4", 
"0.7, -0.2, 0.2", "0.001, 0.02, -0.003", "0.8, -0.6, 0.001")), row.names = c(NA, 
-4L), class = c("data.table", "data.frame"))
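
One possible way to amend the getmin/getmax idea with the conditions is to compute both the per-gene minimum and maximum first and then choose between them with a vectorised `ifelse()`. The sketch below uses base R only and works on a plain character vector copied from the example data; the names `beta_strings`, `beta_min`, `beta_max` and `beta_final` are illustrative assumptions, not taken from the post.

```r
# Beta strings copied from the example data in the question.
beta_strings <- c("0.01, -0.6, 0.4", "0.7, -0.2, 0.2",
                  "0.001, 0.02, -0.003", "0.8, -0.6, 0.001")

# Split each string on commas and convert the pieces to numbers.
beta_list <- lapply(strsplit(beta_strings, ","),
                    function(p) as.numeric(trimws(p)))

# Per-gene minimum and maximum (the roles played by getmin and getmax).
beta_min <- vapply(beta_list, min, numeric(1), na.rm = TRUE)
beta_max <- vapply(beta_list, max, numeric(1), na.rm = TRUE)

# Keep the minimum only when it is below -0.5 and the maximum is not above 0.5;
# in every other case the rules ask for the maximum.
beta_final <- ifelse(beta_min < -0.5 & beta_max <= 0.5, beta_min, beta_max)
beta_final
```

Working from the minimum and maximum is enough here because every rule only asks whether any value falls below -0.5 or above 0.5.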

Solution

Here is a data.table solution that should run fast and works independently of the number of betas provided.

library( data.table )
library( matrixStats )
#convert df to a data.table by reference
setDT( df )
#split Beta(s) into one numeric column per value (dynamically)
df[, paste0( "Beta",
             seq_along( tstrsplit( df$`Beta(s)`, "," ) ) ) :=
     lapply( tstrsplit( `Beta(s)`, "," ), as.numeric ) ][]
#     Gene             Beta(s) Beta1 Beta2  Beta3
# 1:   ACE     0.01, -0.6, 0.4 0.010 -0.60  0.400
# 2:  BRCA      0.7, -0.2, 0.2 0.700 -0.20  0.200
# 3: ZAP70 0.001, 0.02, -0.003 0.001  0.02 -0.003
# 4:   P53    0.8, -0.6, 0.001 0.800 -0.60  0.001


#now, using rowMins and rowMaxs from the matrixStats package (fast),
# do the filtering (and updating) by reference.

#If a gene/row has a value less than -0.5 and no values higher than 0.5 then keep only the largest negative value.
df[ df[, rowMins( as.matrix(.SD), na.rm = TRUE ), .SDcols = patterns("^Beta[0-9]") ] < -0.5 &
      df[, rowMaxs( as.matrix(.SD), na.rm = TRUE ), .SDcols = patterns("^Beta[0-9]") ] <= 0.5,
    Beta.final := rowMins( as.matrix(.SD), na.rm = TRUE ), .SDcols = patterns("^Beta[0-9]") ]
#If it has a value higher than 0.5 and no values less than -0.5 keep only the largest positive value.
df[ df[, rowMaxs( as.matrix(.SD), na.rm = TRUE ), .SDcols = patterns("^Beta[0-9]") ] > 0.5 &
      df[, rowMins( as.matrix(.SD), na.rm = TRUE ), .SDcols = patterns("^Beta[0-9]") ] >= -0.5,
    Beta.final := rowMaxs( as.matrix(.SD), na.rm = TRUE ), .SDcols = patterns("^Beta[0-9]") ]
#If it has no values less than -0.5 or more than 0.5 keep the largest value.
df[ df[, rowMins( as.matrix(.SD), na.rm = TRUE ), .SDcols = patterns("^Beta[0-9]") ] >= -0.5 &
      df[, rowMaxs( as.matrix(.SD), na.rm = TRUE ), .SDcols = patterns("^Beta[0-9]") ] <= 0.5,
    Beta.final := rowMaxs( as.matrix(.SD), na.rm = TRUE ), .SDcols = patterns("^Beta[0-9]") ]
#If it has both values less than -0.5 and more than 0.5 keep the largest value.
df[ df[, rowMins( as.matrix(.SD), na.rm = TRUE ), .SDcols = patterns("^Beta[0-9]") ] < -0.5 &
      df[, rowMaxs( as.matrix(.SD), na.rm = TRUE ), .SDcols = patterns("^Beta[0-9]") ] > 0.5,
    Beta.final := rowMaxs( as.matrix(.SD), na.rm = TRUE ), .SDcols = patterns("^Beta[0-9]") ]

Output

#final output
df[, .(Gene, `Beta(s)` = Beta.final )][]
#     Gene Beta(s)
# 1:   ACE   -0.60
# 2:  BRCA    0.70
# 3: ZAP70    0.02
# 4:   P53    0.80
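
The four updates in the solution only fill every row exactly once if their conditions do not overlap and leave no gaps. A small base R check of that property is sketched below, under the assumption that each condition depends only on a row's minimum (`lo`) and maximum (`hi`); the list `conds` and the test grid are illustrative and not part of the original answer.

```r
# The four (minimum, maximum) conditions used in the solution, as functions.
conds <- list(
  low_only  = function(lo, hi) lo <  -0.5 & hi <= 0.5,
  high_only = function(lo, hi) hi >   0.5 & lo >= -0.5,
  neither   = function(lo, hi) lo >= -0.5 & hi <= 0.5,
  both      = function(lo, hi) lo <  -0.5 & hi >  0.5
)

# Grid of candidate (min, max) pairs, including the cutoffs themselves.
grid <- expand.grid(lo = seq(-1, 1, by = 0.25), hi = seq(-1, 1, by = 0.25))
grid <- grid[grid$lo <= grid$hi, ]   # a minimum can never exceed the maximum

# One logical column per condition, one row per (min, max) pair.
hits <- sapply(conds, function(f) f(grid$lo, grid$hi))

# TRUE when every pair satisfies exactly one of the four conditions.
exactly_one <- all(rowSums(hits) == 1)
exactly_one
```

Because exactly one condition holds for each pair on the grid, the order of the four updates should not matter for such rows, and only the first condition ends up selecting the row minimum.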
