R -> kdb: Pass R data to kdb+ as binary objects
Question
What is the most efficient way to insert R objects (more specifically, time series expressed as xts or data.table objects, i.e. time-based and numeric columns) into a kdb+ database?
I was only able to find solutions that involve string serialization via q expressions, as described here and here.
Recommended answer
My solution was inspired by this version of qserver.c from GitHub.
Yang added two functions, convert_binary and convert_r, that handle (de)serialization, which is basically what you asked for. However, the return value is an array of hexadecimal values. To use it with the existing execute function, you need to collapse it into a single string with paste(collapse="") and then build the query with sprintf. The following example sends robj in R to the variable d in kdb:
execute(h, sprintf("d:-9!0x%s",paste(convert_r(robj),collapse="")))
The problem is that paste(collapse="") takes a long time when the array is large.
Here robj is an R object. For example, with a data.frame of dimension 60,000 x 100, convert_r() took < 0.5 s to convert, paste(collapse="") took 13 s to build the single string, and execute(h, ...) then took < 1 s to transfer the data.
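The hex encoding itself is a single linear pass over the bytes, so the cost reported above comes from how the string is assembled rather than from the amount of data. As a minimal sketch, assuming you wanted to keep the hex-string route and do the join in C instead of with paste(), a helper could write every digit into one preallocated buffer. The name bytes_to_hex is hypothetical and is not part of qserver.c.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper (not part of qserver.c): hex-encode n bytes into a
   newly allocated, NUL-terminated string with a leading "0x".
   Returns NULL if allocation fails; the caller frees the result. */
char *bytes_to_hex(const unsigned char *bytes, size_t n)
{
    static const char digits[] = "0123456789abcdef";
    char *out;
    size_t i;

    assert(bytes != NULL || n == 0);

    /* 2 chars for "0x", 2 per byte, 1 for the terminator. */
    out = malloc(2 + 2 * n + 1);
    if (out == NULL)
        return NULL;

    out[0] = '0';
    out[1] = 'x';
    for (i = 0; i < n; i++) {
        out[2 + 2 * i]     = digits[bytes[i] >> 4];   /* high nibble */
        out[2 + 2 * i + 1] = digits[bytes[i] & 0x0f]; /* low nibble */
    }
    out[2 + 2 * n] = '\0';
    return out;
}
```

For the four bytes 01 00 ab ff this produces the string 0x0100abff, which is the form that a -9!0x... query expects. It still doubles the payload size, so sending the raw bytes avoids more work.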
I have not found anyone who has written a function that sends R data to kdb as serialized binary data (I don't know why), so I wrote one myself. Here is the code:
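SEXP kx_r_send_data(SEXP connection, SEXP robj, SEXP varname)
{
    K result, conversion, serialized;
    SEXP s;

    kx_connection = INTEGER_VALUE(connection);
    /* Convert the R object to a K object, then serialize it to a byte vector. */
    conversion = from_any_robject(robj);
    serialized = b9(2, conversion);
    /* Deserialize on the kdb side (-9!) and assign the result to varname. */
    result = k(kx_connection, "{[d;v] v set -9!d;}", r1(serialized), ks((S)CHARACTER_VALUE(varname)), (K)0);
    r0(conversion);
    r0(serialized);
    /* k() returns 0 when the connection has failed. */
    if (result == 0)
        error("Error: not connected to kdb+ server\n");
    s = from_any_kobject(result);
    r0(result);
    return s;
}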
I assume you know how to modify qserver.c and recompile qserver.o. Then add a function to qserver.R:
send_data <- function(connection, r_obj, varname) {
  .Call("kx_r_send_data", as.integer(connection), r_obj, varname)
}
This sends R data to kdb as serialized binary at the C level, with no intermediate hex string.
Notes:
1) The conversion doesn't work with data.table, as it's not a standard R class. Calling the function with a data.table leads to a segmentation fault.
2) Serialization doesn't know how to convert date/datetime objects. After transfer into kdb, the values all become 0N.
Unless you implement the date/datetime/data.table conversion from R to K yourself, do NOT call convert_r() or send_data() on those types.
There is a quick workaround, though. For a data.table, use as.data.frame to convert it to a data.frame before calling the functions. For date/datetime columns, use as.character() to convert them to strings before sending to kdb, then cast to "D" or "P" inside kdb.
3) Serializing a data.frame includes additional information such as rows, row names, class info, etc. You need to manipulate the data inside kdb after the transfer.
I would suggest writing an R wrapper function that handles those special cases and then calls send_data() to pass the data to kdb. Then use execute(h, ...) to reshape the data into a standard format inside kdb.
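If you would rather implement the date/datetime conversion from note 2 in C than go through strings, the core of it is an epoch shift: R stores a Date as days and a POSIXct as seconds since 1970-01-01, while kdb+ counts dates in days and timestamps ("P") in nanoseconds since 2000.01.01. The following is only a sketch of that arithmetic. The helper names are hypothetical, and wiring them into from_any_robject (detecting the R class and building the K vector) is not shown.

```c
#include <stdint.h>

/* Days from the Unix epoch (1970-01-01) to the kdb+ epoch (2000-01-01). */
#define KDB_EPOCH_OFFSET_DAYS 10957
#define SECONDS_PER_DAY 86400

/* Hypothetical helper: R Date (days since 1970-01-01, stored as double)
   to kdb+ date (32-bit days since 2000.01.01). NA maps to the kdb+ null. */
int32_t r_date_to_kdb_date(double r_days)
{
    int32_t days;

    if (r_days != r_days)            /* NaN check: R's NA_real_ is a NaN */
        return INT32_MIN;            /* 0Nd */
    days = (int32_t)r_days;          /* truncates toward zero */
    if ((double)days > r_days)       /* adjust to floor for negative fractions */
        days--;
    return days - KDB_EPOCH_OFFSET_DAYS;
}

/* Hypothetical helper: R POSIXct (seconds since 1970-01-01 UTC, double)
   to kdb+ timestamp (64-bit nanoseconds since 2000.01.01).
   A double cannot hold full nanosecond precision at this magnitude. */
int64_t r_posixct_to_kdb_timestamp(double r_seconds)
{
    double ns;

    if (r_seconds != r_seconds)
        return INT64_MIN;            /* 0Np */
    ns = (r_seconds - (double)KDB_EPOCH_OFFSET_DAYS * SECONDS_PER_DAY) * 1e9;
    return (int64_t)(ns < 0 ? ns - 0.5 : ns + 0.5);  /* round to nearest */
}
```

Time zones are left out on purpose: POSIXct is UTC-based, so any local-time adjustment would have to be decided separately. The as.character() workaround above avoids all of this at the cost of parsing strings inside kdb.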
The same data (60,000 x 100) now takes < 1 s end-to-end from R to kdb.
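PS: There may be typos in the code, because I didn't know how to paste nicely formatted code here and typed it in by hand. Let me know if you find any serious typos in the code.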