How to convert binary data back to a float array in Elasticsearch/painless
Problem description
I am trying to efficiently store and retrieve an array of floats in Elasticsearch 6.7. Numeric doc values are sorted, so the original order of the array elements is lost, which means I can't use them directly.
At first I was reading the field's value from _source, but the performance on large queries is not great.
I tried to encode the float array as binary and decode it inside my script. Unfortunately I'm stuck on converting a byte[4] array to a float in painless.
In Java this would look like this
Float.intBitsToFloat((vector_bytes[3] << 24) | ((vector_bytes[2] & 0xff) << 16) | ((vector_bytes[1] & 0xff) << 8) | (vector_bytes[0] & 0xff));
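As a sanity check outside Elasticsearch, the Java expression above can be wrapped in a method and compared with java.nio.ByteBuffer reading the same four bytes in little-endian order. This is a minimal sketch; the class and method names (LittleEndianFloat, fromBytes, viaByteBuffer) are hypothetical.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

public class LittleEndianFloat {
    // The expression from the question: b[i + 3] is the most significant byte.
    // The lower three bytes are masked with 0xff to undo sign extension.
    static float fromBytes(byte[] b, int i) {
        int bits = (b[i + 3] << 24)
                | ((b[i + 2] & 0xff) << 16)
                | ((b[i + 1] & 0xff) << 8)
                | (b[i] & 0xff);
        return Float.intBitsToFloat(bits);
    }

    // Reference decoding with the JDK: read one float in little-endian order.
    static float viaByteBuffer(byte[] b, int i) {
        return ByteBuffer.wrap(b, i, 4).order(ByteOrder.LITTLE_ENDIAN).getFloat();
    }

    public static void main(String[] args) {
        // 1.1f has the bit pattern 0x3F8CCCCD, stored little-endian as CD CC 8C 3F.
        byte[] b = {(byte) 0xCD, (byte) 0xCC, (byte) 0x8C, (byte) 0x3F};
        System.out.println(fromBytes(b, 0));     // prints 1.1
        System.out.println(viaByteBuffer(b, 0)); // prints 1.1
    }
}
```

Only the top byte is left unmasked: shifting it left by 24 pushes its sign-extended bits out of the int, so the float's sign bit still comes through correctly (for example, the bytes 00 00 20 C0 decode to -2.5).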
But masking off the sign extension with & 0xff throws an "Illegal tree structure." error in painless.
Any idea on how to do this?
# Minimal example binary array
# Create the index
PUT binary_array
{
  "mappings": {
    "_doc": {
      "properties": {
        "vector_bin": { "type": "binary", "doc_values": true },
        "vector": { "type": "float" }
      }
    }
  }
}
# Put two documents
PUT binary_array/_doc/1
{
  "vector": [1.0, 1.1, 1.2],
  "vector_bin": "AACAP83MjD+amZk/"
}
PUT binary_array/_doc/2
{
  "vector": [3.0, 2.1, 1.2],
  "vector_bin": "AABAQGZmBkCamZk/"
}
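The question doesn't show how vector_bin was produced, but decoding the two sample strings shows the layout: each float is stored as 4 little-endian IEEE 754 bytes, and the resulting 12 bytes are Base64-encoded (the form Elasticsearch accepts for a binary field in JSON). A minimal Java encoder sketch for that layout; the names VectorEncoder and encode are hypothetical.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.util.Base64;

public class VectorEncoder {
    // Packs each float as 4 little-endian IEEE 754 bytes, then Base64-encodes
    // the whole buffer so it can be sent as the value of a "binary" field.
    static String encode(float[] vector) {
        ByteBuffer buf = ByteBuffer.allocate(vector.length * Float.BYTES)
                                   .order(ByteOrder.LITTLE_ENDIAN);
        for (float f : vector) {
            buf.putFloat(f);
        }
        return Base64.getEncoder().encodeToString(buf.array());
    }

    public static void main(String[] args) {
        System.out.println(encode(new float[] {1.0f, 1.1f, 1.2f})); // prints AACAP83MjD+amZk/
        System.out.println(encode(new float[] {3.0f, 2.1f, 1.2f})); // prints AABAQGZmBkCamZk/
    }
}
```

Little-endian order is what makes vector_bytes[n+3] the most significant byte in the decoding expression; an encoder using ByteBuffer's default big-endian order would need the byte indices reversed in the script.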
# Example search that converts the binary array back to an array
GET binary_array/_search
{
  "script_fields": {
    "vector_parsed": {
      "script": {
        "source": """
          def vector_bytes = doc["vector_bin"].value.bytes;
          def vector = new float[vector_bytes.length/4];
          for (int i = 0; i < vector.length; ++i) {
            def n = i*4;
            // This would be the Java way, masking off the sign extension of bytes 0-2, but it raises an "Illegal tree structure." in painless
            //def intBits = (vector_bytes[n+3] << 24) | ((vector_bytes[n+2] & 0xff) << 16) | ((vector_bytes[n+1] & 0xff) << 8) | (vector_bytes[n] & 0xff);
            // This runs but gives incorrect results
            def intBits = (vector_bytes[n+3] << 24) | ((vector_bytes[n+2] ) << 16) | ((vector_bytes[n+1] ) << 8) | (vector_bytes[n] );
            vector[i] = Float.intBitsToFloat( intBits );
          }
          return vector;
        """
      }
    },
    "vector_src": {
      "script": """params._source["vector"]"""
    }
  }
}
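The unmasked line in the script "runs but gives incorrect results" because bytes are signed: a byte value of 0x80 or above is sign-extended when promoted to int, and its leading 1-bits then overwrite the bytes above it in the OR. The sketch below reproduces this in Java with the bytes of 1.0f. It assumes Painless promotes byte to int the same way Java does; the class and method names are hypothetical.

```java
public class SignExtensionDemo {
    // Same as the unmasked line in the script: negative bytes are
    // sign-extended to int before shifting, so their high 1-bits leak.
    static int unmasked(byte[] b, int n) {
        return (b[n + 3] << 24) | (b[n + 2] << 16) | (b[n + 1] << 8) | b[n];
    }

    // Masked version: & 0xff keeps only the low 8 bits of each lower byte.
    static int masked(byte[] b, int n) {
        return (b[n + 3] << 24) | ((b[n + 2] & 0xff) << 16)
                | ((b[n + 1] & 0xff) << 8) | (b[n] & 0xff);
    }

    public static void main(String[] args) {
        // 1.0f is 0x3F800000, stored little-endian as 00 00 80 3F; 0x80 is the byte -128.
        byte[] b = {0x00, 0x00, (byte) 0x80, 0x3F};
        System.out.println(Integer.toHexString(unmasked(b, 0)));  // prints ff800000
        System.out.println(Float.intBitsToFloat(unmasked(b, 0))); // prints -Infinity
        System.out.println(Float.intBitsToFloat(masked(b, 0)));   // prints 1.0
    }
}
```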
Recommended answer
After some more investigation I realized that the bitwise AND does work in painless, but the hex literal 0xff doesn't (at least in this expression). The decimal literal 255, which is the same value, works.
This solved my problem:
Float.intBitsToFloat( (vector_bytes[n+3] << 24) | ((vector_bytes[n+2] & 255) << 16) | ((vector_bytes[n+1] & 255) << 8) | (vector_bytes[n] & 255) )
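To check the byte layout offline, the same loop can be run in Java with the answer's fix applied. 255 and 0xff are the same int value, so the masking is unchanged. This is a sketch, not the script itself: the class name is hypothetical, and the offset/length parameters are an added assumption modeled on Lucene's BytesRef, whose valid bytes run from bytes[offset] to bytes[offset + length - 1]. The original script reads .bytes from index 0, which is only correct when the value's offset is 0 and its length equals the array length.

```java
import java.util.Arrays;
import java.util.Base64;

public class VectorDecoder {
    // Mirrors the fixed Painless loop, using 255 as the mask.
    // offset/length are a hypothetical generalization following BytesRef conventions.
    static float[] decode(byte[] bytes, int offset, int length) {
        if (length % 4 != 0) {
            throw new IllegalArgumentException("length must be a multiple of 4: " + length);
        }
        float[] vector = new float[length / 4];
        for (int i = 0; i < vector.length; ++i) {
            int n = offset + i * 4;
            int intBits = (bytes[n + 3] << 24)
                    | ((bytes[n + 2] & 255) << 16)
                    | ((bytes[n + 1] & 255) << 8)
                    | (bytes[n] & 255);
            vector[i] = Float.intBitsToFloat(intBits);
        }
        return vector;
    }

    public static void main(String[] args) {
        // The vector_bin values of the two example documents.
        byte[] doc1 = Base64.getDecoder().decode("AACAP83MjD+amZk/");
        byte[] doc2 = Base64.getDecoder().decode("AABAQGZmBkCamZk/");
        System.out.println(Arrays.toString(decode(doc1, 0, doc1.length))); // prints [1.0, 1.1, 1.2]
        System.out.println(Arrays.toString(decode(doc2, 0, doc2.length))); // prints [3.0, 2.1, 1.2]
    }
}
```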