R中的sqldf包,查询数据帧 [英] sqldf package in R, querying a data frame

查看:113
本文介绍了R中的sqldf包,查询数据帧的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我正在尝试使用R中的sqldf库重写一些代码,这应该允许我在数据帧上运行SQL查询,但是我遇到的一个问题是,每当我尝试运行查询时,R看起来像它尝试查询我使用的实际真实MySQL db con,并通过我要搜索的数据框名称查找表.

I'm trying to rewrite some code using the sqldf library in R, which should allow me to run SQL queries on data frames, but I am having an issue in that whenever I try to run a query, R seems like it tries to query the actual real MySQL db con that I use and look for a table by the name of a the data frame that I am trying to search by.

运行此命令时:

    sqldf("SELECT COUNT(*) from work.class_scores")

我得到:

mysqlNewConnection(drv,...)中的错误: RS-DBI驱动程序:(无法连接到数据库:错误:无法通过套接字"/tmp/mysql.sock"连接到本地MySQL服务器(2) )

Error in mysqlNewConnection(drv, ...) : RS-DBI driver: (Failed to connect to database: Error: Can't connect to local MySQL server through socket '/tmp/mysql.sock' (2) )

当我尝试使用两种不同的方式指定位置时(第一种形式是googlecode页面,第二种形式应该基于文档正确)

When I try to specify the location using two different ways (the first form the googlecode page, the second which should be right based on the docs)

>     sqldf("SELECT COUNT(*) from work.class_scores", sqldf.driver = "SQLite")
Error in sqldf("SELECT COUNT(*) from work.class_scores", sqldf.driver = "SQLite") : 
  unused argument(s) (sqldf.driver = "SQLite")
>     sqldf("SELECT COUNT(*) from work.class_scores", drv = "SQLite")
Loading required package: tcltk
Loading Tcl/Tk interface ... Error : .onLoad failed in loadNamespace() for 'tcltk', details:
  call: dyn.load(file, DLLpath = DLLpath, ...)
  error: unable to load shared library '/Library/Frameworks/R.framework/Resources/library/tcltk/libs/x86_64/tcltk.so':
  dlopen(/Library/Frameworks/R.framework/Resources/library/tcltk/libs/x86_64/tcltk.so, 10): Library not loaded: /usr/local/lib/libtcl8.5.dylib
  Referenced from: /Library/Frameworks/R.framework/Resources/library/tcltk/libs/x86_64/tcltk.so
  Reason: image not found
Error: require(tcltk) is not TRUE

所以,我认为这个tcltk软件包可能是个问题,我从未听说过,所以我尝试着照顾一下并发现一些问题:

So, I'm thinking it might be a a problem with this package tcltk, which I have never heard of, so I try and take care of that and find some issues:

 > install.packages("tcltk")
Warning in install.packages :
  argument 'lib' is missing: using '/Users/michaeldiscenza/Library/R/2.11/library'
Warning in install.packages :
  package ‘tcltk’ is not available
> install.packages("tcltk2", lib="/Applications/RStudio.app/Contents/Resources/R/library")
trying URL 'http://lib.stat.cmu.edu/R/CRAN/bin/macosx/leopard/contrib/2.11/tcltk2_1.1-5.tgz'
Content type 'application/x-gzip' length 940835 bytes (918 Kb)
opened URL
==================================================
downloaded 918 Kb


The downloaded packages are in
    /var/folders/Y1/Y1gdz9tKFiSnWsGP9+BDcU+++TI/-Tmp-//RtmpL07KTL/downloaded_packages
> library("tcltk")
Loading Tcl/Tk interface ... Error : .onLoad failed in loadNamespace() for 'tcltk', details:
  call: dyn.load(file, DLLpath = DLLpath, ...)
  error: unable to load shared library '/Library/Frameworks/R.framework/Resources/library/tcltk/libs/x86_64/tcltk.so':
  dlopen(/Library/Frameworks/R.framework/Resources/library/tcltk/libs/x86_64/tcltk.so, 10): Library not loaded: /usr/local/lib/libtcl8.5.dylib
  Referenced from: /Library/Frameworks/R.framework/Resources/library/tcltk/libs/x86_64/tcltk.so
  Reason: image not found
Error: package/namespace load failed for 'tcltk'

!dbPreExists错误:参数类型无效

Error in !dbPreExists : invalid argument type

在这里,我真的真的不知道问题出在哪里,我需要四处走走吗?

Here, I just really don't know what the issue is, do I need to move something around?

我尝试过的另一种方法是在数据框对象上运行查询之前,先设置数据库连接,以便R可以在那看,而不是尝试连接到实际的本地MySQL数据库.但这是行不通的.回到套接字的问题(即使我可以毫无问题地查询本地数据库本身.

Another approach that I tried was before running the query on the data frame object, setting my database connection so R would look there rather than trying to connect to the actual local MySQL database. But that didn't work. Back to the problem with the socket (even though I can query the local DB itself without any issues.

>     con <- sqldf()
Error in mysqlNewConnection(drv, ...) : 
  RS-DBI driver: (Failed to connect to database: Error: Can't connect to local MySQL server through socket '/tmp/mysql.sock' (2)
)

最终,我想查询以获取例如C的值大于2的记录数,对此我感到很自在.唯一的问题是我不知道是否还有另一种方法可以指定我要查询的是数据帧而不是实际的数据库.我在这里错过了真正愚蠢又容易的事情吗?

Eventually, I want to query to get the count of records where the value for C is larger than 2 for example, and I feel comfortable doing that. The only problem is I don't know if there is another way to specify that what I am querying is a data frame and not an actual db. Am I missing something really silly and easy here?

谢谢!

推荐答案

此答案已从我之前的评论中转移过来.

This answer has been transferred from my earlier comments.

帖子和评论表明:

  1. 即使加载了RMySQL,也希望将SQLite与sqldf一起使用 和

  1. it is desired to use SQLite with sqldf even though RMySQL is loaded and

有一则有关tcltk丢失的消息

there was a message about tcltk being missing

存在以下问题:sqldf("select count(*) from work.class_scores") 其中work.class_scores是数据帧.

there was a problem regarding: sqldf("select count(*) from work.class_scores") where work.class_scores is a data frame.

sqldf主页上面的FAQ#7地址(1)和FAQ#5地址(2). (3)是由于点是SQL运算符,因此需要对此类数据框名称加引号,否则应更改其名称以删除点.

On the sqldf home page FAQ #7 addresses (1) above and FAQ #5 addresses (2). (3) is due to the fact that dot is an SQL operator so such data frame names need to be quoted or else their name changed to remove the dot.

下面,我们提供可重现的示例,该示例可实现上述三种解决方案.

Below we provide reproducible example that implements the above three solutions.

即使加载了RMySQL,sqldf.driver选项也用于强制使用SQLite.

The sqldf.driver option is used to force SQLite to be used even though RMySQL is loaded.

关于tcltk,有三种方法:(i)gsubfn.engine选项导致使用R代码代替tcltk,从而不需要tcltk软件包.请参见下面的示例代码. (ii)交替安装tcltk. (iii)当sqldf 0.4-4是当前版本时提出了这个问题,但是现在sqldf 0.4-5不在了,请注意,已经添加了附加的tcltk程序包检测,这使得它更有可能自动处理所有这些而无需用户设置任何选项,而不必安装tcltk.因此,最简单的解决方案可能是仅升级到sqldf 0.4-5或更高版本.

Regarding tcltk there are three approaches: (i) The gsubfn.engine option causes R code to be used in place of tcltk so that the tcltk package won't be needed. See example code below. (ii) Alternately install tcltk. (iii) This question was asked when sqldf 0.4-4 was the current version but now that sqldf 0.4-5 is out note that additional tcltk package detection has been added which makes it more likely that it will automatically handle all this without the user having to set any options and without having to install tcltk. Thus the easiest solution may be to just upgrade to sqldf 0.4-5 or later.

我们用引号引起来的数据框名称,或者用不包含点的名称替换数据框名称:

We quote the data frame name having a dot in it or replace the data frame name with a name not containing a dot:

options(sqldf.driver = "SQLite") # as per FAQ #7 force SQLite
options(gsubfn.engine = "R") # as per FAQ #5 use R code rather than tcltk

library(RMySQL)
library(sqldf)

work.class_scores <- BOD # BOD is built in
sqldf("select count(*) from 'work.class_scores'")

# or
work_class_scores <- work.class_scores
sqldf("select count(*) from work_class_scores")

添加了有关sqldf 0.4-5的信息.

Added info about sqldf 0.4-5.

这篇关于R中的sqldf包,查询数据帧的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆