How to send a data.frame from R to Q/KDB?
Question
I have a large data.frame (15 columns and 100,000 rows) in an existing R session that I want to send to a Q/KDB instance. From KDB's cookbook, the possible solutions are:

- RServer for Q: use KDB to create a new R instance that shares its memory space. This doesn't work because my data is in an existing instance of R.
- RServe: run an R server and use TCP/IP to communicate with a Q/KDB client. This does not work because, as per RServe's documentation, "every connection has a separate workspace and working directory", so I presume it does not see my existing data.
- R Math Library: access R's functionality via a math library without needing an instance of R. This does not work because my data is already in an instance of R.
So any other ideas on how to send data from R to Q/KDB?
Solution

Open a port in Q. I start Q with a batch file:

    @echo off
    c:\q\w32\q -p 5001

Load qserver.dll:

    tryCatch({
      dyn.load("c:/q/qserver.dll")
    }, error = function(f) {
      print("can't load qserver.dll")
    })
Then use these:

    # Open a connection to Q and keep the handle in the global environment.
    open_connection <- function(host = "localhost", port = 5001, user = NULL) {
      parameters <- list(host, as.integer(port), user)
      h <- .Call("kx_r_open_connection", parameters)
      assign(".k.h", h, envir = .GlobalEnv)
      return(h)
    }

    close_connection <- function(connection) {
      .Call("kx_r_close_connection", as.integer(connection))
    }

    # Send a query string to Q and return the result.
    execute <- function(connection, query) {
      .Call("kx_r_execute", as.integer(connection), query)
    }

    thePort <- 5001  # the port Q was started with (-p 5001)
    d <- open_connection(host = "localhost", port = thePort)

    # Concatenate all arguments into one query string and run it on connection d.
    ex2 <- function(...) {
      query <- list(...)
      theResult <- NULL
      for (i in query) theResult <- paste0(theResult, i)
      return(execute(d, paste0(theResult)))
    }

ex2 can then take multiple arguments, so you can build queries from R variables and strings.
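As a rough illustration of how ex2 assembles its query, the sketch below concatenates fragments the same way but only returns the string instead of sending it. The helper name build_query and the table name trade are hypothetical and not part of the original answer; unlike ex2, this helper flattens vector arguments.

```r
# Hypothetical helper: builds the query string ex2 would send, without a Q connection.
build_query <- function(...) {
  paste0(unlist(list(...)), collapse = "")
}

min_size <- 5
# "trade" is an assumed table name, used only for illustration.
q <- build_query("select from trade where size > ", min_size)
print(q)
```

With a live connection, ex2("select from trade where size > ", min_size) would send the same string to Q.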
Edit: that's for R from Q; here's R to Q.

2nd edit: improved algorithm:
    library(stringr)

    RToQTable <- function(Rtable, Qname, withColNames = TRUE, withRowNames = TRUE,
                          colSuffix = NULL) {
      # Use the matrix column names, or generate col1, col2, ... if there are none.
      theColnames <- if (!withColNames || length(colnames(Rtable)) == 0)
        paste0("col", as.character(1:length(Rtable[1, ])), colSuffix)
      else
        colnames(Rtable)
      if (!withRowNames || length(rownames(Rtable)) == 0) withRowNames <- FALSE

      # Append a marker row so the flattened string can be split back into columns.
      Rtable <- rbind(Rtable, "linesep")
      thestr <- paste(
        paste0(theColnames, ':("',
               str_split(paste(Rtable, collapse = '";"'), ';\"linesep\";\"')[[1]],
               ');'),
        collapse = "")
      # Drop the trailing fragment left over from the last marker.
      charnum <- as.integer(nchar(thestr) - 11)

      if (withRowNames)
        ex2(Qname, ":([]", Qname,
            str_replace_all(paste0("`", paste(rownames(Rtable), collapse = "`")), " ", "_"),
            ";", substr(thestr, 1L, charnum), "))")
      else
        ex2(Qname, ":([]", substr(thestr, 1L, charnum), "))")
    }

    > bigMat <- matrix(runif(1500000), nrow = 100000, ncol = 15)
    > microbenchmark(RToQTable(bigMat, "Qmat"), times = 3)
    Unit: seconds
                          expr      min     lq     mean   median       uq      max neval
     RToQTable(bigMat, "Qmat") 10.29171 10.315 10.32766 10.33829 10.34563 10.35298     3
This works for a matrix. For a data frame, save a vector containing the type of each column, convert the data frame to a matrix, import the matrix into Q, and then cast the columns back to their types.
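A minimal sketch of that data frame workflow is below. It records the column classes, flattens the data frame to a character matrix, and builds a q update statement that casts the string columns back. The mapping from R classes to q cast characters ("F" for float, "J" for long), the helper build_cast_query, and the table name Qdf are assumptions for illustration, not part of the original answer; nothing here contacts Q.

```r
# Example data frame with mixed column types.
df <- data.frame(price = c(1.5, 2.25), size = c(10L, 20L), sym = c("a", "b"),
                 stringsAsFactors = FALSE)

# Record each column's class before the types are lost.
col_types <- vapply(df, function(x) class(x)[1], character(1))

# Convert every column to character and bind into a matrix (keeps column names).
mat <- do.call(cbind, lapply(df, as.character))

# Assumed mapping from R classes to q cast characters; other classes stay as strings.
q_cast <- c(numeric = "F", integer = "J")

# Hypothetical helper: build a q statement that casts the known columns in place.
build_cast_query <- function(qname, types) {
  known <- types[types %in% names(q_cast)]
  casts <- paste0(names(known), ":\"", q_cast[known], "\"$", names(known))
  paste0(qname, ":update ", paste(casts, collapse = ","), " from ", qname)
}

cast_query <- build_cast_query("Qdf", col_types)
print(cast_query)
```

With a live connection, the intended sequence would be RToQTable(mat, "Qdf") followed by ex2(cast_query).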
Note that this algorithm is approximately O(rows * cols^1.1), so if you have more than about 20 columns you will need to split them into multiple matrices to get O(rows * cols). For your example, though, 100,000 rows and 15 columns takes about 10 seconds, so further optimization may not be necessary.
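One way to split the columns, sketched under the assumption of a chunk size of 20 (the helper name split_columns is hypothetical):

```r
# Hypothetical helper: group column indices into chunks of at most chunk_size.
split_columns <- function(ncols, chunk_size = 20L) {
  idx <- seq_len(ncols)
  unname(split(idx, ceiling(idx / chunk_size)))
}

chunks <- split_columns(45L)
print(lengths(chunks))
```

Each chunk could then be sent separately, for example RToQTable(mat[, chunks[[i]], drop = FALSE], paste0("Qpart", i)), and the parts joined column-wise on the Q side.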