Determine if data frame is empty

Question

I have a data frame and I would like to test very quickly whether it is empty. I know that it either has no rows or contains only integers (no missing values). So far, I have tested five different options (see below). Does anyone have an even faster solution?

df <- data.frame(a = integer(0), b = integer(0), c = integer(0))

fa <- function(){
  nrow(df) > 0
}

fb <- function(){
  any(dim(df)[1L])
}

fc <- function(){
  (dim(df)[1L]) != 0
}

fd <- function() {
  any(.subset2(df, 1)[1])
}

fe <- function() {
  any(.subset2(df, 1))
}

library(microbenchmark)
microbenchmark(fa(), fb(), fc(), fd(), fe(), times = 1000)

Results:

> microbenchmark(fa(), fb(), fc(), fd(), fe(), times = 1000)
Unit: nanoseconds
 expr  min   lq     mean median    uq   max neval  cld
 fa() 5664 6725 8672.462   6725 11680 47777  1000   cd
 fb() 6017 7078 8979.645   7079 12034 58041  1000    d
 fc() 6017 6372 8492.680   6725 11679 25127  1000   c 
 fd() 1062 1770 2214.170   1771  2832 14511  1000  b  
 fe()  354 1062 1359.498   1063  1770 12741  1000 a   


Recommended answer

Since most of the objects you test aren't likely to be empty, you should be more concerned about how your functions perform on a non-empty data.frame. You should also compile them to get a sense of how they would perform in a package.

library(microbenchmark)
library(compiler)

fa <- cmpfun({function(){
  nrow(df) > 0L
}})

fb <- cmpfun({function(){
  any(dim(df)[1L])
}})

fc <- cmpfun({function(){
  dim(df)[1L] != 0L
}})

fd <- cmpfun({function() {
  any(.subset2(df, 1L)[1L])
}})

fe <- cmpfun({function() {
  any(.subset2(df, 1L))
}})

ff <- cmpfun({function() {
  length(.subset2(df, 1L)) > 0L
}})

fg <- cmpfun({function() {
  as.logical(length(.subset2(df, 1L)))
}})

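The test on an empty data.frame shows all methods are roughly the same: each takes a few microseconds at most (about 1 µs for the `.subset2`-based versions, about 6 µs for the `nrow`/`dim`-based ones).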

df <- data.frame(a = integer(0), b = integer(0), c = integer(0))
microbenchmark(fa(), fb(), fc(), fd(), fe(), ff(), fg(), times = 1000)

# Unit: nanoseconds
#  expr  min     lq median     uq   max neval
#  fa() 5685 5969.0 6165.0 6608.5 20515  1000
#  fb() 6147 6443.0 6651.0 7214.0 18117  1000
#  fc() 5726 5984.0 6152.0 6457.5 38404  1000
#  fd() 1210 1411.0 1573.0 1764.5  4933  1000
#  fe()  635  871.0 1003.0 1105.5 10225  1000
#  ff()  513  727.5  861.5  941.0  5691  1000
#  fg()  681  868.5  981.5 1080.0  2982  1000
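
Timing aside, it may be worth checking what each candidate actually returns on the empty frame. The following is a minimal sketch, not part of the original benchmark: the `check_*` functions are illustrative, uncompiled restatements of `fa` through `fg` that take the data frame as an argument instead of reading the global `df`.

```r
df_empty <- data.frame(a = integer(0), b = integer(0), c = integer(0))

# Illustrative copies of the candidates (names are assumptions for this sketch).
check_fa <- function(d) nrow(d) > 0L
check_fb <- function(d) any(dim(d)[1L])
check_fc <- function(d) dim(d)[1L] != 0L
check_fd <- function(d) any(.subset2(d, 1L)[1L])
check_fe <- function(d) any(.subset2(d, 1L))
check_ff <- function(d) length(.subset2(d, 1L)) > 0L
check_fg <- function(d) as.logical(length(.subset2(d, 1L)))

# Collect the return value of every candidate on the empty frame.
results_empty <- c(
  fa = check_fa(df_empty),
  fb = check_fb(df_empty),
  fc = check_fc(df_empty),
  fd = check_fd(df_empty),
  fe = check_fe(df_empty),
  ff = check_ff(df_empty),
  fg = check_fg(df_empty)
)
print(results_empty)
```

All candidates return FALSE here except the `fd` variant, which returns NA: indexing a zero-length integer vector with `[1L]` yields NA, and `any(NA)` is NA. Code that uses that test inside `if()` would need to handle the NA case.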

The test on a non-empty data.frame shows that one of the functions (`fe`) performs very badly, at several milliseconds per call, while the rest are roughly the same.

df <- data.frame(a = integer(1e6), b = integer(1e6), c = integer(1e6))
microbenchmark(fa(), fb(), fc(), fd(), fe(), ff(), fg(), times = 1000)

# Unit: nanoseconds
#  expr     min      lq    median        uq      max neval
#  fa()    6569    7142    8782.0   12364.5    46749  1000
#  fb()    7034    7682    9334.5   18334.0    53172  1000
#  fc()    6539    7110    8453.5   20585.5    49912  1000
#  fd()    1171    1585    2507.5    5021.5    17641  1000
#  fe() 4340209 4413042 4460973.5 5468688.5 26045766  1000
#  ff()     637     984    1489.0    3646.5    14212  1000
#  fg()     767    1161    2401.0    4078.5   236958  1000
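
A likely explanation for the `fe()` timing is that `integer(1e6)` creates a column of zeros, so `any()` never meets a TRUE value and has to scan the entire vector. For the same reason, tests based on the column's values rather than its length report FALSE on this frame even though it has a million rows. The sketch below illustrates this under those assumptions; `by_value` and `by_length` are hypothetical names that restate the tests used in `fe` and `ff`.

```r
# A frame of zeros (as in the benchmark) and a frame of ones for comparison.
df_zero <- data.frame(a = integer(1e6), b = integer(1e6), c = integer(1e6))
df_ones <- data.frame(a = rep(1L, 1e6), b = rep(1L, 1e6), c = rep(1L, 1e6))

# Value-based test (same expression as fe): depends on the column contents.
by_value <- function(d) any(.subset2(d, 1L))

# Length-based test (same expression as ff): depends only on the row count.
by_length <- function(d) length(.subset2(d, 1L)) > 0L

by_value(df_zero)   # FALSE: every value is 0, so the frame looks "empty"
by_length(df_zero)  # TRUE
by_value(df_ones)   # TRUE: the first element is already non-zero
by_length(df_ones)  # TRUE
```

If the data can contain zeros, the length-based form looks like the safer choice, and it is also among the fastest in both benchmarks above.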
