dplyr package: How can I query large data frame using like '%xyz%' SQL syntax?

Views: 74 | Published: 2017/7/13 21:14:22 | Tags: r, dplyr


Problem description

dplyr is the only package that can handle my 843k-row data.frame and query it quickly. I can filter fine using numeric and equality criteria, but I also need to search for a concept, that is, match a substring anywhere in a column.

I need something like this sqldf query:
library(sqldf)
head(iris)
sqldf("select * from iris where lower(Species) like '%nica%'")
In the dplyr help I was not able to find how to do it. Something like:
filter(iris,Species like '%something%')
The leading and trailing % are very important. Also, note that the data frame has 800k+ rows, so traditional R functions may run slowly. It has to be a dplyr-based solution.
Solution

What about this:
library(dplyr)
data(iris)
filter(iris, grepl("nica",Species))
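To mirror the SQL `lower(Species) like '%nica%'` more literally, the pattern can be matched as a plain, case-insensitive substring. This is a minimal sketch, assuming only that dplyr is installed. `fixed = TRUE` makes `grepl()` treat the pattern as a literal string rather than a regular expression, and `tolower()` plays the role of SQL's `lower()`. The name `hits` is only illustrative.

```r
library(dplyr)

# Literal, case-insensitive substring match, similar to
# SQL: where lower(Species) like '%nica%'
hits <- filter(iris, grepl("nica", tolower(Species), fixed = TRUE))

# iris has 50 rows per species; only "virginica" contains "nica"
nrow(hits)
unique(as.character(hits$Species))
```

Using a literal match also avoids surprises when the search term contains regex metacharacters such as `.` or `(`.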
EDIT: Another option is the %like% function from the data.table package:
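library(dplyr)
library(microbenchmark)
library(data.table)

data(iris)

# Repeat each row of iris 5000 times to build a 750,000-row test data frame
Iris <- iris[
  rep(seq_len(nrow(iris)), each = 5000),
  ]
dim(Iris)
# [1] 750000      5

# data.table copy of the same data, keyed on Species
Dt <- data.table(Iris)
setkeyv(Dt, cols = "Species")

# dplyr filter with base R grepl()
foo <- function(){
  subI <- filter(Iris, grepl("nica", Species))
}

# data.table subset with %like%
foo2 <- function(){
  subI <- Dt[Species %like% "nica"]
}

# dplyr filter with data.table's %like%
foo3 <- function(){
  subI <- filter(Iris, Species %like% "nica")
}

# Time each approach over 100 runs
Res <- microbenchmark(
  foo(), foo2(), foo3(),
  times = 100L)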
> Res
Unit: milliseconds
   expr       min        lq    median        uq      max neval
  foo() 114.31080 122.12303 131.15523 136.33254 214.0405   100
 foo2()  23.00508  30.33685  39.77843  41.49121 129.9125   100
 foo3()  18.84933  22.47958  29.39228  35.96649 114.4389   100
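One plausible reason for the gap in the timings above is that Species is a factor with only three levels, so the pattern needs to be tested against three strings rather than 750,000. I believe data.table's %like% handles factors roughly this way, but treat that as an assumption rather than a description of its internals. The base R sketch below shows the idea; the names `species`, `slow`, `fast`, and `matching_levels` are illustrative.

```r
# Build a large factor column, as in the benchmark above
species <- iris$Species[rep(seq_len(nrow(iris)), each = 5000)]

# Approach 1: grepl() on the full vector. The factor is coerced to
# character, so the pattern is tested against every element.
slow <- grepl("nica", species)

# Approach 2: test the pattern against the levels only, then map the
# result back to the rows through the factor's integer codes.
matching_levels <- grep("nica", levels(species))
fast <- as.integer(species) %in% matching_levels

# Both approaches select the same rows
identical(slow, fast)
```

The second approach only pays off when the column is a factor (or otherwise has few distinct values); on a plain character column with mostly unique strings there are no levels to exploit.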


                        