How to send a data.frame from R to Q/KDB?


Problem description


I have a large data.frame (15 columns and 100,000 rows) in an existing R session that I want to send to a Q/KDB instance. From KDB's cookbook, the possible solutions are:

RServer for Q: uses KDB to create a new R instance that shares its memory space. This doesn't work because my data is in an existing instance of R.

RServe: runs an R server and uses TCP/IP to communicate with a Q/KDB client. This does not work because, as per RServe's documentation, "every connection has a separate workspace and working directory", so I presume it does not see my existing data.

R Math Library: access R's functionality via a math library without needing an instance of R. This does not work because my data is already in an instance of R.

So any other ideas on how to send data from R to Q/KDB?

Solution

Open a port in Q. I start Q with a batch file:

@echo off
c:\q\w32\q -p 5001

Load qserver.dll:

tryCatch({
  dyn.load("c:/q/qserver.dll")
}, error = function(f) {
  print("can't load qserver.dll")
})

Then use these

open_connection <- function(host="localhost", port=5001, user=NULL) {
  parameters <- list(host, as.integer(port), user)
  h <- .Call("kx_r_open_connection", parameters)
  assign(".k.h", h, envir = .GlobalEnv)
  return(h)
}

close_connection <- function(connection) {
  .Call("kx_r_close_connection", as.integer(connection))
}

execute <- function(connection, query) {
  .Call("kx_r_execute", as.integer(connection), query)
}

thePort <- 5001  # the port Q was started on (q -p 5001)
d <- open_connection(host="localhost", port=thePort)

ex2 <- function(...) 
{
  query <- list(...)
  theResult <- NULL
  for(i in query) theResult <- paste0(theResult,i)
  return(execute(d,paste0(theResult)))
}

Then ex2 can take multiple arguments, so you can build queries from R variables and strings.
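To see what ex2 would send, the concatenation step can be tried on its own without a Q connection. This is only a sketch: `build_query` is a hypothetical helper that mirrors the loop inside ex2, and the table name `trade` is made up for illustration.

```r
# Hypothetical helper: same concatenation as ex2, but it returns the
# query string instead of executing it on the Q connection.
build_query <- function(...) {
  pieces <- list(...)
  result <- NULL
  for (p in pieces) result <- paste0(result, p)
  result
}

n <- 5L          # an R variable spliced into the query
tbl <- "trade"   # hypothetical Q table name
q <- build_query("select[", n, "] from ", tbl)
print(q)
```

With a live connection, the same arguments passed to ex2 would be pasted together in the same way and then handed to execute.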

Edit: that's for R from Q; here's R to Q.

2nd Edit: improved algo:

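library(stringr)

# Build a Q table definition as one string from an R matrix and run it through ex2().
RToQTable <- function(Rtable,Qname,withColNames=TRUE,withRowNames=TRUE,colSuffix = NULL)
{
  # Fall back to generated names (col1, col2, ...) when column names are missing or not wanted
  theColnames <- if(!withColNames || length(colnames(Rtable))==0) paste0("col",as.character(1:length(Rtable[1,])),colSuffix) else colnames(Rtable)
  if(!withRowNames || length(rownames(Rtable))==0) withRowNames <- FALSE
  # Append a sentinel row so every column ends in "linesep" once the matrix is flattened column by column
  Rtable <- rbind(Rtable,"linesep")
  # Split the flattened string on the sentinel to get one quoted list per column;
  # the last 11 characters (;"linesep);) are the trailing sentinel and are cut off by substr below
  charnum <- as.integer(nchar(thestr <- paste(paste0(theColnames,':("',str_split(paste(Rtable,collapse='";"'),';\"linesep\";\"')[[1]],');'),collapse="")) - 11)
  if(withRowNames)
    ex2(Qname,":([]",Qname,str_replace_all(paste0("`",paste(rownames(Rtable),collapse="`"))," ","_"),";",.Internal(substr(thestr,1L,charnum)),"))") else
    ex2(Qname,":([]",.Internal(substr(thestr,1L,charnum)),"))")
}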

> bigMat <- matrix(runif(1500000),nrow=100000,ncol=15)
> microbenchmark(RToQTable(bigMat,"Qmat"),times=3)
Unit: seconds
                      expr      min     lq     mean   median       uq      max neval
 RToQTable(bigMat, "Qmat") 10.29171 10.315 10.32766 10.33829 10.34563 10.35298     3

This works for a matrix. For a data frame, save a vector containing the type of each column, convert the data frame to a matrix, import the matrix into Q, and then cast the columns back to their types.
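The data frame route could be sketched roughly as follows. This is not part of the original answer: `frame_to_char_matrix` and `q_cast_statement` are hypothetical helper names, and the mapping from R classes to Q cast characters ("F" for float, "J" for long, "S" for symbol) is an assumption about which Q types you want. Columns are converted with `as.character` one at a time because `as.matrix` on a mixed data frame pads numbers with spaces.

```r
# Hypothetical helpers, not part of the original answer.

# Record each column's R class, then convert every column to character
# so the data frame becomes a character matrix that RToQTable can take.
frame_to_char_matrix <- function(df) {
  types <- vapply(df, function(col) class(col)[1], character(1))
  mat <- do.call(cbind, lapply(df, as.character))
  list(types = types, matrix = mat)
}

# Build a Q update statement that parses the string columns back into
# typed columns. Assumed mapping; other classes are left as strings.
q_cast_statement <- function(types, Qname) {
  castChar <- c(numeric = "F", integer = "J", character = "S", factor = "S")
  known <- types[types %in% names(castChar)]
  if (length(known) == 0) return(NULL)
  casts <- paste0(names(known), ':"', castChar[known], '"$', names(known))
  paste0("update ", paste(casts, collapse = ","), " from `", Qname)
}

df <- data.frame(price = c(1.5, 2.25), size = c(10L, 20L),
                 sym = c("a", "b"), stringsAsFactors = FALSE)
conv <- frame_to_char_matrix(df)
stmt <- q_cast_statement(conv$types, "Qmat")
print(stmt)

# With a live connection (not run here):
# RToQTable(conv$matrix, "Qmat")
# ex2(stmt)
```

The cast is done as a second statement so that RToQTable itself stays unchanged.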

Note that this algorithm is approximately O(rows * cols^1.1), so if you have more than 20 columns you'll need to chop them up into multiple matrices to get O(rows * cols). For your example, though, 100,000 rows and 15 columns take about 10 seconds, so further optimization may not be necessary.
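For wider tables, the column chopping mentioned above might look like the sketch below. `column_chunks` is a hypothetical helper; the limit of 20 columns comes from the note above, and the loop that sends each chunk is shown only as a comment because it needs a live Q connection.

```r
# Hypothetical helper: split column indices into groups of at most
# `maxCols` columns so each group can be sent as its own smaller matrix.
column_chunks <- function(nCols, maxCols = 20L) {
  idx <- seq_len(nCols)
  unname(split(idx, ceiling(idx / maxCols)))
}

chunks <- column_chunks(45L)
print(lengths(chunks))

# Each chunk would then be sent separately (not run here). colSuffix keeps
# the generated column names distinct across chunks:
# for (k in seq_along(chunks))
#   RToQTable(wideMat[, chunks[[k]], drop = FALSE], paste0("Qmat", k),
#             colSuffix = paste0("_", k))
```

On the Q side the pieces would still have to be joined back into one table, for instance with join-each (`,'`), which relies on the column names being distinct.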
