Remove rows from a data frame that contain only zeros or just a single zero
Problem description
I am trying to create a function in R that will allow me to filter my data set based on whether a row contains a single column with a zero in it. Furthermore, sometimes I only want to remove rows that are zero in all columns.
Also, and this is where it gets fun: not all columns contain numbers, and the number of columns can vary.
I have pasted some of my data below, together with the results I want to obtain.
Unfiltered:
ID GeneName DU145small DU145total PC3small PC3total
1 MIR22HG 33221.5 1224.55 2156.43 573.315
2 MIRLET7E 87566.1 7737.99 25039.3 16415.6
3 MIR612 0 0 530.068 0
4 MIR218-1 0 0 1166.88 701.253
5 MIR181B2 70723.2 3958.01 6209.85 1399.34
6 MIR218-2 0 0 0 0
7 MIR10B 787.516 330.556 0 20336.4
8 MIR3176 0 0 0 0
Rows containing any zero removed:
ID GeneName DU145small DU145total PC3small PC3total
1 MIR22HG 33221.5 1224.55 2156.43 573.315
2 MIRLET7E 87566.1 7737.99 25039.3 16415.6
5 MIR181B2 70723.2 3958.01 6209.85 1399.34
Only rows that are all zero removed:
ID GeneName DU145small DU145total PC3small PC3total
1 MIR22HG 33221.5 1224.55 2156.43 573.315
2 MIRLET7E 87566.1 7737.99 25039.3 16415.6
3 MIR612 0 0 530.068 0
4 MIR218-1 0 0 1166.88 701.253
5 MIR181B2 70723.2 3958.01 6209.85 1399.34
7 MIR10B 787.516 330.556 0 20336.4
I did find a way of removing any row that had at least one zero in it, but it was "cheating": I exchanged all zeros with NA and then used complete.cases to filter.
Also, doing that removed rows whose GeneName contained a zero (as for MIR10B).
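The NA swap does not have to touch GeneName if it is applied only to the numeric measurement columns. A minimal base-R sketch of that idea follows; the question does not show the original code, so this is a reconstruction of the approach rather than the code that was used, and treating ID as the only non-measurement numeric column is an assumption about this table.

```r
# The unfiltered table from the question.
df <- read.table(header = TRUE, stringsAsFactors = FALSE, text = "
ID GeneName DU145small DU145total PC3small PC3total
1 MIR22HG 33221.5 1224.55 2156.43 573.315
2 MIRLET7E 87566.1 7737.99 25039.3 16415.6
3 MIR612 0 0 530.068 0
4 MIR218-1 0 0 1166.88 701.253
5 MIR181B2 70723.2 3958.01 6209.85 1399.34
6 MIR218-2 0 0 0 0
7 MIR10B 787.516 330.556 0 20336.4
8 MIR3176 0 0 0 0
")

# Measurement columns: numeric columns other than ID (assumption for this table).
num_cols <- setdiff(names(df)[vapply(df, is.numeric, logical(1))], "ID")

# Swap zeros for NA in a copy of those columns only, so character
# columns such as GeneName ("MIR10B") are never modified.
masked <- df[num_cols]
masked[masked == 0] <- NA

# complete.cases() is FALSE for every row that now contains an NA,
# i.e. every row that had at least one zero; the original values are kept.
no_zero_rows <- df[complete.cases(masked), ]
no_zero_rows
```

This only covers the "any zero" case; complete.cases() has no direct way to express "all zero".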
I can solve it with for loops, but I have been told that loops in R are very inefficient, so I would like to avoid that solution.
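A loop is not needed in base R either: comparing the numeric columns to 0 gives a logical matrix, and rowSums() counts the zeros in every row in one vectorised call. The sketch below wraps that in a function; `filter_zero_rows`, its `mode` argument and the default `exclude = "ID"` are hypothetical names chosen for illustration, and it assumes the columns to test are the numeric ones other than ID.

```r
# Hypothetical helper (not from the question or answer): drop rows that have
# a zero in "any" or in "all" of the numeric measurement columns.
# Non-numeric columns and the columns named in `exclude` are ignored.
# Assumes a plain data.frame, not a data.table.
filter_zero_rows <- function(df, mode = c("any", "all"), exclude = "ID") {
  mode <- match.arg(mode)
  is_num <- vapply(df, is.numeric, logical(1))
  cols <- setdiff(names(df)[is_num], exclude)
  # df[cols] == 0 is a logical matrix; rowSums() counts the zeros per row.
  n_zero <- rowSums(df[cols] == 0)
  keep <- if (mode == "any") n_zero == 0 else n_zero < length(cols)
  df[keep, , drop = FALSE]
}

# The unfiltered table from the question.
df <- read.table(header = TRUE, stringsAsFactors = FALSE, text = "
ID GeneName DU145small DU145total PC3small PC3total
1 MIR22HG 33221.5 1224.55 2156.43 573.315
2 MIRLET7E 87566.1 7737.99 25039.3 16415.6
3 MIR612 0 0 530.068 0
4 MIR218-1 0 0 1166.88 701.253
5 MIR181B2 70723.2 3958.01 6209.85 1399.34
6 MIR218-2 0 0 0 0
7 MIR10B 787.516 330.556 0 20336.4
8 MIR3176 0 0 0 0
")

filter_zero_rows(df, "any")  # keeps IDs 1, 2, 5
filter_zero_rows(df, "all")  # keeps IDs 1, 2, 3, 4, 5, 7
```

Because the columns are chosen with is.numeric, the function keeps working when columns are added or removed, and character columns such as GeneName are never compared to 0.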
While Xin Yin's solution works perfectly well and keeps the data in a data frame, David Arenburg's answer is supposedly more efficient and should be used.
Recommended answer
Using data.table (assuming df is your data set):
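library(data.table)
# setDT() converts df to a data.table in place. Grouping by GeneName assumes
# gene names are unique, so each group is a single row. Inside each group,
# .SD[, -1, with = FALSE] drops the first .SD column (ID), and the row is
# kept unless every remaining value is 0.
setDT(df)[, .SD[!all(.SD[, -1, with = FALSE] == 0)], by = GeneName]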
# GeneName ID DU145small DU145total PC3small PC3total
# 1: MIR22HG 1 33221.500 1224.550 2156.430 573.315
# 2: MIRLET7E 2 87566.100 7737.990 25039.300 16415.600
# 3: MIR612 3 0.000 0.000 530.068 0.000
# 4: MIR218-1 4 0.000 0.000 1166.880 701.253
# 5: MIR181B2 5 70723.200 3958.010 6209.850 1399.340
# 6: MIR10B 7 787.516 330.556 0.000 20336.400
Or, if you want to remove rows that contain any zero:
# Same grouping, but the row is dropped if any value after ID is 0.
setDT(df)[, .SD[!any(.SD[, -1, with = FALSE] == 0)], by = GeneName]
# GeneName ID DU145small DU145total PC3small PC3total
# 1: MIR22HG 1 33221.5 1224.55 2156.43 573.315
# 2: MIRLET7E 2 87566.1 7737.99 25039.30 16415.600
# 3: MIR181B2 5 70723.2 3958.01 6209.85 1399.340