Most efficient way of subsetting dataframes

Views: 54 | Published: 2020/10/17 0:20:02 | Tags: performance, r, dataframe, subset


Question

Can anyone suggest a more efficient way of subsetting a data frame without using SQL, indexing, or data.table options?

I looked for similar questions, and this one suggests the indexing option.

Here are three ways to subset, with timings.

#Dummy data
dat <- data.frame(x = runif(1000000, 1, 1000), y=runif(1000000, 1, 1000))

#Subset and time
system.time(x <- dat[dat$x > 500, ])
#   user  system elapsed 
#  0.092   0.000   0.090 
system.time(x <- dat[which(dat$x > 500), ])
#   user  system elapsed 
#  0.040   0.032   0.070 
system.time(x <- subset(dat, x > 500))
#   user  system elapsed 
#  0.108   0.004   0.109

Edit:
As Roland suggested, I used microbenchmark. It seems which() performs the best.

library("ggplot2")
library("microbenchmark")

#Dummy data
dat <- data.frame(x = runif(1000000, 1, 1000), y=runif(1000000, 1, 1000))

#Benchmark
res <- microbenchmark( dat[dat$x > 500, ],
                       dat[which(dat$x > 500), ],
                       subset(dat, x > 500))
#Plot (with ggplot2 loaded, autoplot() dispatches to the microbenchmark method)
autoplot(res)

Recommended answer

As Roland suggested, I used microbenchmark. It seems which() performs the best.

library("ggplot2")
library("microbenchmark")

#Dummy data
dat <- data.frame(x = runif(1000000, 1, 1000), y=runif(1000000, 1, 1000))

#Benchmark
res <- microbenchmark( dat[dat$x > 500, ],
                       dat[which(dat$x > 500), ],
                       subset(dat, x > 500))
#Plot (with ggplot2 loaded, autoplot() dispatches to the microbenchmark method)
autoplot(res)
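A timing comparison is only meaningful if the benchmarked expressions return the same result. The following base R sketch checks that on NA-free data of the same shape; the seed, the smaller row count, and the name `dat_small` are assumptions made for illustration and are not part of the original answer:

```r
set.seed(1)  # arbitrary seed, only so the sketch is reproducible
n <- 1e5     # smaller than the 1e6 rows used above, to keep the check quick
dat_small <- data.frame(x = runif(n, 1, 1000), y = runif(n, 1, 1000))

by_logical <- dat_small[dat_small$x > 500, ]
by_which   <- dat_small[which(dat_small$x > 500), ]
by_subset  <- subset(dat_small, x > 500)

# all.equal() compares values, column names and row names
same_which  <- isTRUE(all.equal(by_logical, by_which))
same_subset <- isTRUE(all.equal(by_logical, by_subset))
print(c(which = same_which, subset = same_subset))
```

If both flags are TRUE, the three forms differ only in speed for this data.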
