Remove rows from a data frame that contain only zeros or at least one zero

Question

I am trying to create a function in R that filters my data set based on whether a row contains a zero in any single column. Furthermore, sometimes I only want to remove rows that are zero in all columns.

Also, and this is where it gets fun: not all columns contain numbers, and the number of columns can vary.

I have pasted some of my data here, along with the results I want to obtain.

unfiltered:
    ID  GeneName    DU145small  DU145total  PC3small    PC3total
    1   MIR22HG     33221.5     1224.55     2156.43     573.315
    2   MIRLET7E    87566.1     7737.99     25039.3     16415.6
    3   MIR612      0           0           530.068     0
    4   MIR218-1    0           0           1166.88     701.253
    5   MIR181B2    70723.2     3958.01     6209.85     1399.34
    6   MIR218-2    0           0           0           0
    7   MIR10B      787.516     330.556     0           20336.4
    8   MIR3176     0           0           0           0

any row containing a zero removed:
    ID  GeneName    DU145small  DU145total  PC3small    PC3total
    1   MIR22HG     33221.5     1224.55     2156.43     573.315
    2   MIRLET7E    87566.1     7737.99     25039.3     16415.6
    5   MIR181B2    70723.2     3958.01     6209.85     1399.34

only rows that are all zero removed:
    ID  GeneName    DU145small  DU145total  PC3small    PC3total
    1   MIR22HG     33221.5     1224.55     2156.43     573.315
    2   MIRLET7E    87566.1     7737.99     25039.3     16415.6
    3   MIR612      0           0           530.068     0
    4   MIR218-1    0           0           1166.88     701.253
    5   MIR181B2    70723.2     3958.01     6209.85     1399.34
    7   MIR10B      787.516     330.556     0           20336.4

I did find a way of removing any row that has at least one zero in it, but it was "cheating": it replaced all zeros with NA and then used complete.cases to filter.

Also, doing that removes every row where the GeneName has a zero in it (as for MIR10B).
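One way to keep the NA-and-complete.cases idea without touching GeneName is to apply the replacement only to a copy of the numeric measurement columns. The following is a minimal sketch, not the asker's actual code: it uses four rows and two columns taken from the data above, and it assumes that "ID" is the only numeric column that should be ignored.

```r
# Subset of the question's data (four rows, two measurement columns).
df <- data.frame(
  ID = c(1L, 3L, 7L, 8L),
  GeneName = c("MIR22HG", "MIR612", "MIR10B", "MIR3176"),
  DU145small = c(33221.5, 0, 787.516, 0),
  DU145total = c(1224.55, 0, 330.556, 0),
  stringsAsFactors = FALSE
)

# Numeric columns other than the identifier (assumed to be named "ID").
value_cols <- setdiff(names(df)[vapply(df, is.numeric, logical(1))], "ID")

# Replace zeros with NA in a copy of the measurement columns only,
# so text columns such as GeneName are never inspected.
values <- df[value_cols]
values[values == 0] <- NA

# Keep the rows of the original data frame whose measurements have no NA.
no_zero <- df[complete.cases(values), ]
print(no_zero)
```

Here MIR10B is kept, because the zero in its name is never compared against anything.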

I can solve it with for loops, but I have been told that loops in R are very inefficient, so I would like to avoid that solution.
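For comparison, a loop-free version in base R can count the zeros per row with rowSums. This is only a sketch under stated assumptions: the helper name drop_zero_rows and its arguments are hypothetical, "ID" is assumed to be an identifier rather than a measurement, and NA values are treated as non-zero.

```r
# The example data from the question.
df <- data.frame(
  ID = 1:8,
  GeneName = c("MIR22HG", "MIRLET7E", "MIR612", "MIR218-1",
               "MIR181B2", "MIR218-2", "MIR10B", "MIR3176"),
  DU145small = c(33221.5, 87566.1, 0, 0, 70723.2, 0, 787.516, 0),
  DU145total = c(1224.55, 7737.99, 0, 0, 3958.01, 0, 330.556, 0),
  PC3small   = c(2156.43, 25039.3, 530.068, 1166.88, 6209.85, 0, 0, 0),
  PC3total   = c(573.315, 16415.6, 0, 701.253, 1399.34, 0, 20336.4, 0),
  stringsAsFactors = FALSE
)

# Hypothetical helper: drop rows based on zeros in the numeric columns only.
# `id_cols` names numeric columns that are identifiers, not measurements.
drop_zero_rows <- function(df, mode = c("any", "all"), id_cols = "ID") {
  mode <- match.arg(mode)
  # Select numeric columns automatically, so the number of columns can vary.
  is_value <- vapply(df, is.numeric, logical(1)) & !(names(df) %in% id_cols)
  # Count zeros per row in one vectorized call (NA counts as non-zero).
  n_zero <- rowSums(df[is_value] == 0, na.rm = TRUE)
  drop <- if (mode == "any") n_zero > 0 else n_zero == sum(is_value)
  df[!drop, , drop = FALSE]
}

any_removed <- drop_zero_rows(df, "any")  # rows with at least one zero removed
all_removed <- drop_zero_rows(df, "all")  # only all-zero rows removed
print(any_removed)
print(all_removed)
```

Because GeneName is not numeric, it is never compared with zero, so names such as MIR10B do not affect the result.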

While Xin Yin's solution works perfectly well and keeps the data in a data frame, the answer by David Arenburg is supposedly more efficient and should be preferred.

Recommended answer

Using data.table (assuming df is your data set):

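library(data.table)
# Within each GeneName group, .SD holds the remaining columns; -1 drops ID before testing for zeros
setDT(df)[, .SD[!all(.SD[, -1, with = FALSE] == 0)], by = GeneName]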

#    GeneName ID DU145small DU145total  PC3small  PC3total
# 1:  MIR22HG  1  33221.500   1224.550  2156.430   573.315
# 2: MIRLET7E  2  87566.100   7737.990 25039.300 16415.600
# 3:   MIR612  3      0.000      0.000   530.068     0.000
# 4: MIR218-1  4      0.000      0.000  1166.880   701.253
# 5: MIR181B2  5  70723.200   3958.010  6209.850  1399.340
# 6:   MIR10B  7    787.516    330.556     0.000 20336.400

Or, if you want to remove every row that contains at least one zero:

setDT(df)[, .SD[!any(.SD[, -1, with = FALSE] == 0)], by = GeneName]

#    GeneName ID DU145small DU145total PC3small  PC3total
# 1:  MIR22HG  1    33221.5    1224.55  2156.43   573.315
# 2: MIRLET7E  2    87566.1    7737.99 25039.30 16415.600
# 3: MIR181B2  5    70723.2    3958.01  6209.85  1399.340
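Grouping by GeneName tests each row separately only when every gene name is unique. A variant that does not depend on that is to count the zeros per row across the measurement columns and filter on the count. The sketch below is one possible alternative, not part of the original answer; it assumes that "ID" is the only numeric column to ignore, and the variable names are illustrative.

```r
library(data.table)

# The example data from the question, built directly as a data.table.
dt <- data.table(
  ID = 1:8,
  GeneName = c("MIR22HG", "MIRLET7E", "MIR612", "MIR218-1",
               "MIR181B2", "MIR218-2", "MIR10B", "MIR3176"),
  DU145small = c(33221.5, 87566.1, 0, 0, 70723.2, 0, 787.516, 0),
  DU145total = c(1224.55, 7737.99, 0, 0, 3958.01, 0, 330.556, 0),
  PC3small   = c(2156.43, 25039.3, 530.068, 1166.88, 6209.85, 0, 0, 0),
  PC3total   = c(573.315, 16415.6, 0, 701.253, 1399.34, 0, 20336.4, 0)
)

# Measurement columns: every numeric column except the identifier
# (assumed to be named "ID").
value_cols <- setdiff(names(dt)[vapply(dt, is.numeric, logical(1))], "ID")

# Number of zeros in each row, accumulated column by column.
n_zero <- dt[, Reduce(`+`, lapply(.SD, function(x) x == 0)), .SDcols = value_cols]

not_all_zero <- dt[n_zero < length(value_cols)]  # drop rows that are all zero
no_zero      <- dt[n_zero == 0]                  # drop rows with any zero
print(not_all_zero)
print(no_zero)
```

Unlike the grouped version, this keeps the original column order, with ID first.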
