R:使用RODBC& SQL Server [英] R: Painfully slow read performance using RODBC & SQL Server

查看:220
本文介绍了R:使用RODBC& SQL Server的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我是R的新手,但我有兴趣使用Shiny使用存储在SQL Server数据库中的数据创建动态图表。为了实现交互性,我想从数据库中导入原始数据,并在 R 中执行计算,而不是让数据库汇总数据。



我可以使用 RODBC 连接到数据库,执行查询,并在 data.frame 。但是, R 中的读取时间比在SQL Server Management Studio(SSMS)中执行的相同查询的时间长大约12倍。 SSMS花费大约600ms,而 R 花费大约7.6秒。我的问题是,我做错了,或 R 只是真的很慢与数据库访问?如果是,是否有更快的选择(例如将数据库输出写入文件并读取文件)?



有关查询的一些信息可能有帮助:约250K行,4列。第一列是日期,其他三个是数值。运行 R 和SSMS的机器是具有32GB内存的高端Win 7工作站。我正在运行的 R 命令是:

  system.time df < -  sqlQuery(cn,query))

返回:

 用户系统已过
7.17 0.01 7.58


有趣的是,似乎从SQL到我的机器的数据传输速度很快,但是 R 在内部忙了几秒钟,然后返回 data.frame 。我看到这是因为网络利用率峰值在第一秒,几乎立即返回到接近0.然后几秒钟后, R data.frame 返回。

解决方案

我会尝试RJDBC
http://cran.r-project.org/web/packages/RJDBC/RJDBC.pdf

使用这些驱动程序 https:/ /msdn.microsoft.com/en-us/sqlserver/aa937724.aspx

 图书馆(RJDBC)
drv< - JDBC(com.microsoft.sqlserver.jdbc.SQLServerDriver,/ sqljdbc4.jar)
con< - dbConnect(drv,jdbc:sqlserver://server.location ,username,password)
dbGetQuery(con,select table_name from table)


I am new to R but am interested in using Shiny to create dynamic charts using data stored in a SQL Server database. To enable interactivity, I want to bring in the raw data from the database and perform calculations within R rather than have the database summarize the data.

I am able to connect to the database using RODBC, execute a query, and receive results in a data.frame. However, the read time in R is about 12x longer than than the same query executed in SQL Server Management Studio (SSMS). SSMS takes ~600 ms, whereas R takes about 7.6 seconds. My question is whether I am doing something wrong, or is R just really slow with database access? And if so, are there faster alternatives (e.g. writing the database output to a file and reading the file)?

Some information about the query that may help: The query retrieves about 250K rows with 4 columns. The first column is a date and the other three are numeric values. The machine running R and SSMS is a high-end Win 7 workstation with 32GB of memory. The R command that I am running is:

system.time(df <- sqlQuery(cn, query))

which returns:

user  system elapsed
7.17   0.01   7.58

Interestingly, it appears that the data transfer from SQL to my machine is fast, but that R is busy doing things internally for several seconds before returning the data.frame. I see this because network utilization spikes in the first second and almost immediately returns to near 0. Then several seconds later, the R data.frame returns.

解决方案

I would try RJDBC http://cran.r-project.org/web/packages/RJDBC/RJDBC.pdf

with these drivers https://msdn.microsoft.com/en-us/sqlserver/aa937724.aspx

library(RJDBC)
drv <- JDBC("com.microsoft.sqlserver.jdbc.SQLServerDriver","/sqljdbc4.jar") 
con <- dbConnect(drv, "jdbc:sqlserver://server.location", "username", "password")
dbGetQuery(con, "select column_name from table")

这篇关于R:使用RODBC&amp; SQL Server的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆