R -> kdb: Pass R data to kdb+ as binary objects

Question

What's the most efficient way to insert R objects (more specifically, time series expressed as xts or data.table objects, i.e. time-based and numeric columns) into a kdb+ database?

I was only able to find solutions involving string serialization via q expressions, as described here and here.

Recommended answer

My solution was inspired by this version of qserver.c from GitHub.

Yang added two functions, convert_binary and convert_r, that (de)serialize data, which is basically what you asked for. However, the return value is a hexadecimal array. To use it with the existing execute function, we need to collapse it into a single string with paste(collapse=""), then build the q expression with sprintf. The following example sends the R object robj to the variable d in kdb+:

execute(h, sprintf("d:-9!0x%s",paste(convert_r(robj),collapse="")))
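For illustration, this is roughly what the sprintf/paste step builds, written as a C sketch. build_q_deserialize_expr is a hypothetical helper, not part of qserver.c or the k.h API; it assumes the serialized bytes are already in memory and emits two hex characters per byte, which is why the resulting string is about twice the size of the binary payload.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper (not part of qserver.c): builds the q expression
   "<var>:-9!0x<hex>" that the string-based approach passes to execute().
   Every byte becomes two hex characters, so the string is ~2x the payload.
   The caller frees the returned buffer. */
char *build_q_deserialize_expr(const char *var, const unsigned char *bytes, size_t n)
{
    static const char digits[] = "0123456789abcdef";
    assert(var != NULL && (bytes != NULL || n == 0));

    size_t prefix_len = strlen(var) + strlen(":-9!0x");
    char *out = malloc(prefix_len + 2 * n + 1);   /* +1 for the terminating NUL */
    if (out == NULL)
        return NULL;

    char *p = out + sprintf(out, "%s:-9!0x", var);
    for (size_t i = 0; i < n; i++) {
        *p++ = digits[bytes[i] >> 4];     /* high nibble */
        *p++ = digits[bytes[i] & 0x0f];   /* low nibble */
    }
    *p = '\0';
    return out;
}
```

With var set to "d" this yields an expression of the form d:-9!0x0100..., the same shape as the execute() call above; the hex-formatting and joining work grows linearly with the payload size.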

The problem is that paste(collapse="") takes quite some time if the array is large.

robj is the R object. For example, I tried it with a data.frame of 60,000 x 100: convert_r() took < 0.5 s to convert, paste(collapse="") took 13 s to collapse the result into a single string, and execute(h, ...) took < 1 s to transfer the data.
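A rough size estimate helps explain that gap. It assumes every column is a double (8 bytes per value) and ignores header overhead; the column types are not actually stated above:

$$
\begin{aligned}
60{,}000 \times 100 \times 8\ \text{bytes} &= 48{,}000{,}000\ \text{bytes} \approx 48\ \text{MB} \\
\text{hex string length} &\approx 2 \times 48{,}000{,}000 = 96{,}000{,}000\ \text{characters}
\end{aligned}
$$

If convert_r() returns one element per byte, paste() therefore has to format about 48 million elements and join them into a string of about 96 million characters, while a binary transfer would move the ~48 MB payload as-is.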

I have not found anyone who has written a function that sends R data to kdb+ as serialized binary data (I don't know why), so I wrote one myself. Here is the code:

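/* Add to qserver.c: uses its kx_connection global and from_any_robject()/from_any_kobject() */
SEXP kx_r_send_data(SEXP connection, SEXP robj, SEXP varname)
{
  K result, conversion, serialized;
  SEXP s;
  kx_connection = INTEGER_VALUE(connection);
  conversion = from_any_robject(robj);   /* R object -> K object */
  serialized = b9(2, conversion);        /* K object -> byte vector (mode 2: unenumerate) */
  r0(conversion);                        /* b9 does not take ownership of its argument */
  /* k() releases the K arguments it is given; the lambda deserializes d and sets the global named v */
  result = k(kx_connection, "{[d;v] v set -9!d;}", serialized,
             ks((S)CHARACTER_VALUE(varname)), (K)0);
  if (!result)
    error("kdb+ connection error");
  if (result->t == -128) {               /* q signalled an error, e.g. 'type */
    char msg[256];
    snprintf(msg, sizeof msg, "kdb+ error: %s", result->s);
    r0(result);
    error("%s", msg);
  }
  s = from_any_kobject(result);
  r0(result);
  return s;
}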
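For context, b9() produces, and -9! consumes, a byte vector in the kdb+ IPC message format. The sketch below hand-builds that layout for a simple vector, to show what actually travels over the wire. serialize_simple_vector is hypothetical; it assumes an uncompressed, little-endian message with the message-type byte left at 0, and it does not cover nested lists, tables, enumerations or compression, all of which b9 handles for you.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Write a 32-bit value in little-endian byte order. */
static void put_le32(unsigned char *dst, uint32_t v)
{
    dst[0] = (unsigned char)(v & 0xff);
    dst[1] = (unsigned char)((v >> 8) & 0xff);
    dst[2] = (unsigned char)((v >> 16) & 0xff);
    dst[3] = (unsigned char)((v >> 24) & 0xff);
}

/* Hypothetical sketch of the uncompressed, little-endian layout of a simple
   (non-nested) kdb+ vector:
     8-byte header: endianness (1 = little), message type, compression flag,
                    reserved byte, then total message length (int32)
     1-byte vector type, 1-byte attribute, int32 element count, raw elements.
   `elems` must already be in little-endian order. Returns a malloc'd buffer. */
unsigned char *serialize_simple_vector(int8_t type, const void *elems,
                                       uint32_t count, size_t elem_size,
                                       size_t *out_len)
{
    assert(out_len != NULL && (elems != NULL || count == 0));
    size_t payload = (size_t)count * elem_size;
    size_t total = 8 + 1 + 1 + 4 + payload;
    unsigned char *buf = calloc(total, 1);  /* zeroed: msg type, flags, attribute */
    if (buf == NULL)
        return NULL;

    buf[0] = 1;                             /* little-endian */
    put_le32(buf + 4, (uint32_t)total);     /* total length including header */
    buf[8] = (unsigned char)type;           /* e.g. 10 = char vector, 9 = float vector */
    buf[9] = 0;                             /* no attribute */
    put_le32(buf + 10, count);              /* element count */
    if (payload > 0)
        memcpy(buf + 14, elems, payload);
    *out_len = total;
    return buf;
}
```

For the five-character string "hello" (type 10, one byte per element) this gives a 19-byte message: an 8-byte header whose length field is 0x13, then 0x0a, 0x00, the count 5 and the characters themselves.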

I assume you know how to modify qserver.c and recompile qserver.o. Then add a function to qserver.R:

send_data <- function(connection, r_obj, varname) {
  .Call("kx_r_send_data", as.integer(connection), r_obj, varname)
}

This sends R data to kdb+ as serialized binary entirely at the C level, with no intermediate string.

Notes:

1) The conversion doesn't work with data.table, as it's not a standard R class. Calling the function with a data.table leads to a segmentation fault.

2) The serialization doesn't know how to convert date/datetime objects; after the transfer into kdb+, those values all become 0N (null).

Unless you want to implement the date/datetime/data.table conversion from R to K yourself, do NOT call convert_r() or send_data() on those types.

However, there is a quick workaround. For data.table, simply convert it to a data.frame with as.data.frame before calling the functions. For date/datetime columns, convert them to strings with as.character() before sending them to kdb+, then cast them to "D" or "P" inside kdb+.

3) Serializing a data.frame includes extra information such as rows, row names, class info, etc., so you need to reshape the data inside kdb+ after the transfer.

I would suggest writing an R wrapper function that handles those abnormal cases and then calls send_data() to pass the data to kdb+. Then use execute(h, ...) to reshape the data into a standard format inside kdb+.
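If you would rather implement the date/datetime conversion from note 2 than round-trip through strings, its core is an epoch shift: R stores Date as days and POSIXct as seconds since 1970-01-01 (UTC), while kdb+ stores date as days and timestamp as nanoseconds since 2000-01-01. A minimal sketch with hypothetical helper names, assuming POSIXct values are in UTC; wiring it into qserver.c's conversion code is left out.

```c
#include <assert.h>
#include <math.h>     /* isnan */
#include <stdint.h>

#define KDB_EPOCH_OFFSET_DAYS 10957          /* 1970-01-01 -> 2000-01-01 */
#define KDB_EPOCH_OFFSET_SECS 946684800.0    /* 10957 * 86400 */

/* Hypothetical helper: R Date (double days since 1970-01-01, NA is a NaN)
   -> kdb+ date (int days since 2000-01-01). */
int32_t r_date_to_kdb_date(double r_days)
{
    if (isnan(r_days))
        return INT32_MIN;                    /* kdb+ null date 0Nd */
    int32_t d = (int32_t)r_days;             /* truncates toward zero ... */
    if ((double)d > r_days)
        d -= 1;                              /* ... so step down to the floor for negative fractions */
    return d - KDB_EPOCH_OFFSET_DAYS;
}

/* Hypothetical helper: R POSIXct (double seconds since 1970-01-01 UTC)
   -> kdb+ timestamp (int64 nanoseconds since 2000-01-01). */
int64_t r_posixct_to_kdb_timestamp(double r_secs)
{
    if (isnan(r_secs))
        return INT64_MIN;                    /* kdb+ null timestamp 0Np */
    double ns = (r_secs - KDB_EPOCH_OFFSET_SECS) * 1e9;
    assert(ns > -9.2e18 && ns < 9.2e18);     /* must fit in a signed 64-bit integer */
    return (int64_t)(ns >= 0 ? ns + 0.5 : ns - 0.5);  /* round half away from zero */
}
```

Doing the shift in C avoids formatting and re-parsing every value as text, at the cost of maintaining your own copy of qserver.c's conversion code.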

The same data (60,000 x 100) now takes < 1 s end to end, from R to kdb+.

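PS: There may be typos in the code, because I didn't know how to paste nicely formatted code here, so I typed it in by hand. Let me know if you spot any serious typos.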
