R:使用RODBC& SQL Server [英] R: Painfully slow read performance using RODBC & SQL Server
问题描述
我是R的新手,但我有兴趣使用Shiny使用存储在SQL Server数据库中的数据创建动态图表。为了实现交互性,我想从数据库中导入原始数据,并在 R
中执行计算,而不是让数据库汇总数据。
我可以使用 RODBC
连接到数据库,执行查询,并在 data.frame
。但是, R
中的读取时间比在SQL Server Management Studio(SSMS)中执行的相同查询的时间长大约12倍。 SSMS花费大约600ms,而 R
花费大约7.6秒。我的问题是,我做错了,或 R
只是真的很慢与数据库访问?如果是,是否有更快的选择(例如将数据库输出写入文件并读取文件)?
有关查询的一些信息可能有帮助:约250K行,4列。第一列是日期,其他三个是数值。运行 R
和SSMS的机器是具有32GB内存的高端Win 7工作站。我正在运行的 R
命令是:
system.time df < - sqlQuery(cn,query))
返回:
用户系统已过
7.17 0.01 7.58
有趣的是,似乎从SQL到我的机器的数据传输速度很快,但是 R
在内部忙了几秒钟,然后返回 data.frame
。我看到这是因为网络利用率峰值在第一秒,几乎立即返回到接近0.然后几秒钟后, R
data.frame
返回。
我会尝试RJDBC
http://cran.r-project.org/web/packages/RJDBC/RJDBC.pdf
使用这些驱动程序 https:/ /msdn.microsoft.com/en-us/sqlserver/aa937724.aspx
图书馆(RJDBC)
drv< - JDBC(com.microsoft.sqlserver.jdbc.SQLServerDriver,/ sqljdbc4.jar)
con< - dbConnect(drv,jdbc:sqlserver://server.location ,username,password)
dbGetQuery(con,select table_name from table)
I am new to R but am interested in using Shiny to create dynamic charts using data stored in a SQL Server database. To enable interactivity, I want to bring in the raw data from the database and perform calculations within R
rather than have the database summarize the data.
I am able to connect to the database using RODBC
, execute a query, and receive results in a data.frame
. However, the read time in R
is about 12x longer than than the same query executed in SQL Server Management Studio (SSMS). SSMS takes ~600 ms, whereas R
takes about 7.6 seconds. My question is whether I am doing something wrong, or is R
just really slow with database access? And if so, are there faster alternatives (e.g. writing the database output to a file and reading the file)?
Some information about the query that may help: The query retrieves about 250K rows with 4 columns. The first column is a date and the other three are numeric values. The machine running R
and SSMS is a high-end Win 7 workstation with 32GB of memory. The R
command that I am running is:
system.time(df <- sqlQuery(cn, query))
which returns:
user system elapsed
7.17 0.01 7.58
Interestingly, it appears that the data transfer from SQL to my machine is fast, but that R
is busy doing things internally for several seconds before returning the data.frame
. I see this because network utilization spikes in the first second and almost immediately returns to near 0. Then several seconds later, the R
data.frame
returns.
I would try RJDBC http://cran.r-project.org/web/packages/RJDBC/RJDBC.pdf
with these drivers https://msdn.microsoft.com/en-us/sqlserver/aa937724.aspx
library(RJDBC)
drv <- JDBC("com.microsoft.sqlserver.jdbc.SQLServerDriver","/sqljdbc4.jar")
con <- dbConnect(drv, "jdbc:sqlserver://server.location", "username", "password")
dbGetQuery(con, "select column_name from table")
这篇关于R:使用RODBC& SQL Server的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!