Nested if else statements over a number of columns


Problem description

I have a large data.frame where the first three columns contain information about a marker. The remaining columns are of numeric type for that marker in each individual. Each individual has three columns. The dataset looks as follows:

                      marker alleleA alleleB   X818 X818.1 X818.2   X345 X345.1 X345.2   X346 X346.1 X346.2
1   kgp5209280_chr3_21902067       T       A 0.0000 1.0000 0.0000 1.0000 0.0000 0.0000 0.0000 1.0000 0.0000
2 chr3_21902130_21902131_A_T       A       T 0.8626 0.1356 0.0018 0.7676 0.2170 0.0154 0.8626 0.1356 0.0018
3 chr3_21902134_21902135_T_C       T       C 0.6982 0.2854 0.0164 0.5617 0.3749 0.0634 0.6982 0.2854 0.0164

That is, for each marker (row), each individual has three values, one in each column.

I want to create a new data.frame which has all the same rows as in the original, but only one column per individual. In the one column for each individual I want the value out of the three for each individual which is greater than 0.8. If no value is greater than 0.8 then I want to print NA. For instance, in the data set I have given for the first row I would want the second value for 818 (1.0000), and the first value for 345 (1.0000). In the second row, I want the first value for 818 (0.8626), and for 345 none of the values are above 0.8 so I want NA to be printed and so on. The new data set would therefore look like this:

                     marker alleleA alleleB   X818 X345
1   kgp5209280_chr3_21902067       T       A 1.0000    1
2 chr3_21902130_21902131_A_T       A       T 0.8626   NA

I have been trying to use if/else statements, along the lines of if [, 4] > 0.8 then [, 4], else... however it doesn't seem to give me what I want, and I would also have to loop this command so it doesn't just do it for one individual in the first three columns but for all columns.

Any help would be appreciated! Thanks in advance.

Recommended answer

Edit: Updated solution using the fast melt/dcast methods implemented in data.table versions >= 1.9.0.

require(data.table)
require(reshape2)
dt <- as.data.table(df)

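# melt to long format: one row per marker/column pair
dt.m <- melt(dt, id.vars=c("marker", "alleleA", "alleleB"), 
                 variable.name="id", value.name="val")
dt.m[, id := gsub("\\.[0-9]+$", "", id)] # strip the `.1`/`.2` suffix so the three columns of an individual share one id
# aggregation: max per marker and individual, then NA where that max is not above 0.8
dt.m <- dt.m[, list(alleleA = alleleA[1], 
         alleleB = alleleB[1], val = max(val)), 
        keyby=list(marker, id)][val <= 0.8, val := NA]
# casting back to wide format, one column per individual
dt.c <- dcast.data.table(dt.m, marker + alleleA + alleleB ~ id, value.var="val")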
#                        marker alleleA alleleB X345   X346   X818
# 1: chr3_21902130_21902131_A_T       A       T   NA 0.8626 0.8626
# 2: chr3_21902134_21902135_T_C       T       C   NA     NA     NA
# 3:   kgp5209280_chr3_21902067       T       A    1 1.0000 1.0000






Solution 1: Probably not the best way, but this is what I could think of at the moment:

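# for each row, take the max within each block of three columns (one block per individual)
mm <- t(apply(df[-(1:3)], 1, function(x) tapply(x, gl(3,3), max)))
mode(mm) <- "numeric"
mm[mm <= 0.8] <- NA # keep only values strictly greater than 0.8
# you can set the column names of mm here if necessary
out <- cbind(df[, 1:3], mm)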

#                       marker alleleA alleleB      1  2      3
# 1   kgp5209280_chr3_21902067       T       A 1.0000  1 1.0000
# 2 chr3_21902130_21902131_A_T       A       T 0.8626 NA 0.8626
# 3 chr3_21902134_21902135_T_C       T       C     NA NA     NA

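gl(3,3) gives a factor with values 1,1,1,2,2,2,3,3,3 and levels 1,2,3. tapply therefore takes the values of x three at a time and returns the max of each group (first three, next three and last three), while apply passes the rows to it one by one. With a different number of individuals, the first argument of gl has to change accordingly.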
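
To see the grouping on its own, here is a small sketch on a single row, using the values of the second marker from the question. The variable names (x, groups, best) are only illustrative and not part of the original answer:

```r
# Second row of the example data: three individuals, three values each
x <- c(0.8626, 0.1356, 0.0018,   # X818, X818.1, X818.2
       0.7676, 0.2170, 0.0154,   # X345, X345.1, X345.2
       0.8626, 0.1356, 0.0018)   # X346, X346.1, X346.2

groups <- gl(3, 3)               # factor 1 1 1 2 2 2 3 3 3
best <- tapply(x, groups, max)   # maximum within each block of three
best[best <= 0.8] <- NA          # keep only maxima strictly above 0.8
print(best)
```

The first and third individuals keep 0.8626, while the second becomes NA because its largest value, 0.7676, is not above 0.8.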

Solution 2: A data.table solution with melt and cast within data.table without using reshape or reshape2:

require(data.table)
dt <- data.table(df)
# melt your data.table to long format
dt.melt <- dt[, list(id = names(.SD), val = unlist(.SD)), 
                  by=list(marker, alleleA, alleleB)]
# replace `.[0-9]` with nothing
dt.melt[, id := gsub("\\.[0-9]+$", "", id)]
# get max value grouping by marker and id
dt.melt <- dt.melt[, list(alleleA = alleleA[1], 
                      alleleB = alleleB[1], 
                      val = max(val)), 
        keyby=list(marker, id)][val <= 0.8, val := NA]
# edit by mnel: use setattr(, 'names') to avoid the copy made by `names<-` within `setNames`
dt.cast <- dt.melt[, as.list(setattr(val,'names', id)), 
                   by=list(marker, alleleA, alleleB)]

#                        marker alleleA alleleB X345   X346   X818
# 1: chr3_21902130_21902131_A_T       A       T   NA 0.8626 0.8626
# 2: chr3_21902134_21902135_T_C       T       C   NA     NA     NA
# 3:   kgp5209280_chr3_21902067       T       A    1 1.0000 1.0000
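
None of the snippets above shows how df is built, so the following self-contained sketch rebuilds the three example rows and then applies the same idea in base R only. It is not from the original answer: it swaps in split.default() and pmax() as an alternative technique, and it groups columns by name (after stripping the `.1`/`.2` suffix, as in the data.table solutions) rather than by position. The variable names vals, ids and best are illustrative.

```r
# Rebuild the three example rows from the question
df <- data.frame(
  marker  = c("kgp5209280_chr3_21902067",
              "chr3_21902130_21902131_A_T",
              "chr3_21902134_21902135_T_C"),
  alleleA = c("T", "A", "T"),
  alleleB = c("A", "T", "C"),
  X818   = c(0, 0.8626, 0.6982),
  X818.1 = c(1, 0.1356, 0.2854),
  X818.2 = c(0, 0.0018, 0.0164),
  X345   = c(1, 0.7676, 0.5617),
  X345.1 = c(0, 0.2170, 0.3749),
  X345.2 = c(0, 0.0154, 0.0634),
  X346   = c(0, 0.8626, 0.6982),
  X346.1 = c(1, 0.1356, 0.2854),
  X346.2 = c(0, 0.0018, 0.0164),
  stringsAsFactors = FALSE
)

vals <- df[-(1:3)]                              # numeric columns only
ids  <- sub("\\.[0-9]+$", "", names(vals))      # "X818" "X818" "X818" "X345" ...
ids  <- factor(ids, levels = unique(ids))       # keep the individuals in their original order

# split.default() splits the data frame column-wise by individual;
# pmax() then gives the row-wise maximum of each individual's columns
best <- as.data.frame(lapply(split.default(vals, ids),
                             function(cols) do.call(pmax, unname(as.list(cols)))))
best[best <= 0.8] <- NA                         # keep only maxima strictly above 0.8

out <- cbind(df[1:3], best)
print(out)
```

Because the grouping comes from the column names, this variant does not assume exactly three individuals, and the result keeps the row and column order of the input.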
