How to delete rows where all the columns are zero


Question

I have the following data frame:

dat <- data.frame(a = c(0,0,2,3), b= c(1,0,0,0), c=c(0,0,1,3))

Printing it gives:

> dat 
  a b c
1 0 1 0
2 0 0 0
3 2 0 1
4 3 0 3

I want to remove the rows in which every column is zero, so that the result is:

  a b c
1 0 1 0 
3 2 0 1
4 3 0 3

How can I achieve that?

I tried this, but it failed:

> row_sub = apply(dat, 1, function(row) all(row !=0 ))
> dat[row_sub,]
[1] a b c
<0 rows> (or 0-length row.names)
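The attempt returns no rows because `all(row != 0)` is TRUE only for rows in which every column is nonzero, and each of the four rows contains at least one zero. A minimal sketch of the same `apply()` idea with the test inverted, using `any()` so that a row is kept as soon as one of its columns is nonzero (`row_sub` is the name from the attempt; the rest is illustrative):

```r
dat <- data.frame(a = c(0, 0, 2, 3), b = c(1, 0, 0, 0), c = c(0, 0, 1, 3))

# apply() converts the data frame to a matrix and calls the function once per row;
# any(row != 0) is TRUE when at least one column of that row is nonzero
row_sub <- apply(dat, 1, function(row) any(row != 0))

# keep only the rows that are not entirely zero (rows 1, 3 and 4)
dat[row_sub, ]
```

This is the same logic as the `Marco()` variant in the benchmarks, `!apply(dat, 1, function(x) all(x == 0))`; because it loops over rows in R, it can be expected to be slower on large data than the vectorized `rowSums()` versions.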

Recommended answer

You can use (1):

dat[as.logical(rowSums(dat != 0)), ]

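This works for both positive and negative values, because it counts the nonzero entries in each row instead of adding up the values, so a positive and a negative entry cannot cancel out.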
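A sketch of what approach (1) computes on the example data frame, plus a small made-up frame `neg` (hypothetical, not from the question) in which a plain `rowSums(neg) != 0` would wrongly drop a row whose values cancel out. The intermediate variable names are illustrative:

```r
dat <- data.frame(a = c(0, 0, 2, 3), b = c(1, 0, 0, 0), c = c(0, 0, 1, 3))

nonzero <- dat != 0            # logical matrix: TRUE where a value is nonzero
counts  <- rowSums(nonzero)    # number of nonzero columns per row: 1 0 2 2
keep    <- as.logical(counts)  # 0 becomes FALSE, any positive count becomes TRUE
dat[keep, ]

# Hypothetical data with a negative value: row 1 sums to zero but is not all zeros
neg <- data.frame(x = c(-1, 0), y = c(1, 0))
rowSums(neg) != 0                     # FALSE FALSE: -1 + 1 cancels out
neg[as.logical(rowSums(neg != 0)), ]  # keeps row 1, drops row 2
```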

Another possibility, even faster for large datasets, is (2):

dat[rowSums(!as.matrix(dat)) < ncol(dat), ]
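Approach (2) counts zeros rather than nonzeros. For a numeric matrix, `!` returns TRUE exactly where a value is 0, so `rowSums(!as.matrix(dat))` is the number of zero columns in each row, and a row consists only of zeros when that number reaches `ncol(dat)`. A sketch with the intermediate values spelled out (variable names are illustrative):

```r
dat <- data.frame(a = c(0, 0, 2, 3), b = c(1, 0, 0, 0), c = c(0, 0, 1, 3))

m     <- as.matrix(dat)     # numeric matrix; `!m` is then a logical matrix
zeros <- rowSums(!m)        # number of zero columns per row: 2 3 1 1
keep  <- zeros < ncol(dat)  # FALSE only for rows where every column is zero
dat[keep, ]
```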

A faster approach for short and long data frames is to use matrix multiplication (3):

dat[as.logical(abs(as.matrix(dat)) %*% rep(1L, ncol(dat))), ]
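Approach (3) relies on the fact that multiplying a matrix by a column vector of ones adds up each row, so `abs(as.matrix(dat)) %*% rep(1L, ncol(dat))` gives the same numbers as `rowSums(abs(as.matrix(dat)))`, returned as a one-column matrix. Taking `abs()` first keeps positive and negative values from cancelling, and `as.logical()` flattens the result and maps 0 to FALSE. A sketch that checks this equivalence on the example data (variable names are illustrative):

```r
dat <- data.frame(a = c(0, 0, 2, 3), b = c(1, 0, 0, 0), c = c(0, 0, 1, 3))

m <- abs(as.matrix(dat))
row_abs_sums <- m %*% rep(1L, ncol(m))  # 4 x 1 matrix holding 1, 0, 3, 6

# same values as rowSums(), computed as a single matrix-vector product
all.equal(as.vector(row_abs_sums), unname(rowSums(m)))  # TRUE

keep <- as.logical(row_abs_sums)  # drops the dim attribute; 0 -> FALSE
dat[keep, ]
```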


Some benchmarks:

# the original dataset
dat <- data.frame(a = c(0,0,2,3), b= c(1,0,0,0), c=c(0,0,1,3))

Codoremifa <- function() dat[rowSums(abs(dat)) != 0,]
Marco <- function() dat[!apply(dat, 1, function(x) all(x == 0)), ]
Sven <- function() dat[as.logical(rowSums(dat != 0)), ]
Sven_2 <- function() dat[rowSums(!as.matrix(dat)) < ncol(dat), ]
Sven_3 <- function() dat[as.logical(abs(as.matrix(dat)) %*% rep(1L,ncol(dat))), ]

library(microbenchmark)
microbenchmark(Codoremifa(), Marco(), Sven(), Sven_2(), Sven_3())
# Unit: microseconds
#          expr     min       lq   median       uq     max neval
#  Codoremifa() 267.772 273.2145 277.1015 284.0995 1190.197   100
#       Marco() 192.509 198.4190 201.2175 208.9925  265.594   100
#        Sven() 143.372 147.7260 150.0585 153.9455  227.031   100
#      Sven_2() 152.080 155.1900 156.9000 161.5650  214.591   100
#      Sven_3() 146.793 151.1460 153.3235 157.9885  187.845   100


# a data frame with 10,000 rows
set.seed(1)
dat <- dat[sample(nrow(dat), 10000, TRUE), ]
microbenchmark(Codoremifa(), Marco(), Sven(), Sven_2(), Sven_3())
# Unit: milliseconds
#          expr       min        lq    median        uq        max neval
#  Codoremifa()  2.426419  2.471204  3.488017  3.750189  84.268432   100
#       Marco() 36.268766 37.840246 39.406751 40.791321 119.233175   100
#        Sven()  2.145587  2.184150  2.205299  2.270764  83.055534   100
#      Sven_2()  2.007814  2.048711  2.077167  2.207942  84.944856   100
#      Sven_3()  1.814994  1.844229  1.861022  1.917779   4.452892   100
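Timings are only comparable if the candidates return the same rows. A sketch that rebuilds the 10,000-row sample from the benchmark and checks with `identical()` that all five functions produce the same data frame (the names `results` and `all_same` are illustrative; the functions read `dat` from the global environment, as in the benchmark):

```r
# The five candidates from the benchmark above
Codoremifa <- function() dat[rowSums(abs(dat)) != 0, ]
Marco      <- function() dat[!apply(dat, 1, function(x) all(x == 0)), ]
Sven       <- function() dat[as.logical(rowSums(dat != 0)), ]
Sven_2     <- function() dat[rowSums(!as.matrix(dat)) < ncol(dat), ]
Sven_3     <- function() dat[as.logical(abs(as.matrix(dat)) %*% rep(1L, ncol(dat))), ]

# Same 10,000-row sample as in the benchmark
dat <- data.frame(a = c(0, 0, 2, 3), b = c(1, 0, 0, 0), c = c(0, 0, 1, 3))
set.seed(1)
dat <- dat[sample(nrow(dat), 10000, TRUE), ]

results <- list(Codoremifa(), Marco(), Sven(), Sven_2(), Sven_3())

# TRUE if every candidate returns exactly the same data frame as the first one
all_same <- all(vapply(results[-1], identical, logical(1), results[[1]]))
all_same
```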
