How to convert binary data back to a float array in Elasticsearch/Painless

Question

I am trying to efficiently store and retrieve an array of floats in Elasticsearch 6.7. Numeric doc values are stored sorted, so the original element order is lost and I can't use them directly.

At first I was reading the field from _source, but the performance on large queries is not great.

I tried to encode the float array as binary and decode it inside my script. Unfortunately, I'm stuck at converting a byte[4] array to a float in Painless.

In Java it would look like this:

Float.intBitsToFloat((vector_bytes[3] << 24) | ((vector_bytes[2] & 0xff) << 16) |  ((vector_bytes[1] & 0xff) << 8) |  (vector_bytes[0] & 0xff));

But discarding the sign with & 0xff throws an "Illegal tree structure." error in Painless.

Any idea how to do this?

# Minimal example binary array
# Create the index
PUT binary_array 
{
  "mappings" : {
      "_doc" : {
          "properties" : {
              "vector_bin": { "type" : "binary", "doc_values": true },
              "vector": { "type" : "float" }
          }
      }
  }
}
# Put two documents
PUT binary_array/_doc/1
{
  "vector": [1.0, 1.1, 1.2],
  "vector_bin": "AACAP83MjD+amZk/"
}
PUT binary_array/_doc/2
{
  "vector": [3.0, 2.1, 1.2],
  "vector_bin": "AABAQGZmBkCamZk/"
}
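
For reference, the vector_bin values above are consistent with packing each float as 4 little-endian bytes and Base64-encoding the result. The following is a minimal client-side sketch under that assumption; the class and method names (FloatVectorEncoder, encode) are made up for illustration and do not come from the original post.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.util.Base64;

// Hypothetical helper, not part of the original question.
class FloatVectorEncoder {
    // Packs each float as 4 little-endian bytes, then Base64-encodes the buffer.
    // Little-endian order is an assumption inferred from the sample documents.
    static String encode(float[] vector) {
        ByteBuffer buffer = ByteBuffer.allocate(vector.length * Float.BYTES)
                .order(ByteOrder.LITTLE_ENDIAN);
        for (float value : vector) {
            buffer.putFloat(value);
        }
        return Base64.getEncoder().encodeToString(buffer.array());
    }

    public static void main(String[] args) {
        // Same vectors as documents 1 and 2 above.
        System.out.println(encode(new float[] {1.0f, 1.1f, 1.2f}));
        System.out.println(encode(new float[] {3.0f, 2.1f, 1.2f}));
    }
}
```

Running main should print the two vector_bin strings used in the documents above, which is why the decoding script has to read byte n as the least significant byte of each float.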

# Example search that converts the binary array back to a float array

GET binary_array/_search
{
  "script_fields": {
    "vector_parsed": {
      "script": {
        "source": """
        def vector_bytes = doc["vector_bin"].value.bytes;
        def vector = new float[vector_bytes.length/4];
        for (int i = 0; i < vector.length; ++i) {
          def n = i*4;
          // This would be the Java way, discarding the sign of bytes 0-2, but it raises an "Illegal tree structure." error in Painless
          //def intBits = (vector_bytes[n+3] << 24) | ((vector_bytes[n+2] & 0xff) << 16) |  ((vector_bytes[n+1] & 0xff) << 8) |  (vector_bytes[n] & 0xff);
          // This runs but gives incorrect results
          def intBits = (vector_bytes[n+3] << 24) | ((vector_bytes[n+2] ) << 16) |  ((vector_bytes[n+1] ) << 8) |  (vector_bytes[n] );
          vector[i] = Float.intBitsToFloat( intBits );
        }
        return vector;
        """
      }
    },
    "vector_src": {
      "script": """params._source["vector"]"""
    }
  }
}
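
The incorrect results from the unmasked variant are what sign extension would produce: in Java, a byte is signed, so a byte such as 0xCD is promoted to the int 0xFFFFFFCD before shifting, and its high bits overwrite the other bytes when OR-ed together. The sketch below shows this in plain Java, assuming Painless promotes bytes the same way; the class and method names are hypothetical.

```java
// Hypothetical demo class, not part of the original question.
class SignExtensionDemo {
    // Combines 4 little-endian bytes WITHOUT masking, like the variant that "runs but gives incorrect results".
    static int unmasked(byte[] b) {
        return (b[3] << 24) | (b[2] << 16) | (b[1] << 8) | b[0];
    }

    // Masks the three low bytes so each contributes only its own 8 bits.
    static int masked(byte[] b) {
        return (b[3] << 24) | ((b[2] & 255) << 16) | ((b[1] & 255) << 8) | (b[0] & 255);
    }

    public static void main(String[] args) {
        // Little-endian bytes of 1.1f (bit pattern 0x3F8CCCCD).
        byte[] onePointOne = {(byte) 0xCD, (byte) 0xCC, (byte) 0x8C, 0x3F};
        System.out.println(Integer.toHexString(unmasked(onePointOne))); // sign bits leak into the high bytes
        System.out.println(Float.intBitsToFloat(unmasked(onePointOne)));
        System.out.println(Float.intBitsToFloat(masked(onePointOne)));
    }
}
```

For these bytes the unmasked combination collapses to 0xFFFFFFCD, which decodes to NaN, while the masked combination recovers 1.1.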

Answer

After some more investigation I realized that the bitwise AND does work in Painless, but the hexadecimal literal 0xff doesn't.

This solved my problem:

Float.intBitsToFloat( (vector_bytes[n+3] << 24) | ((vector_bytes[n+2] & 255) << 16) |  ((vector_bytes[n+1] & 255) << 8) |  (vector_bytes[n] & 255) )
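
To check the expression outside Elasticsearch, the same decoding loop can be run as plain Java against the sample documents. This is only a sketch: the class and method names are hypothetical, and it assumes the 4-byte little-endian layout implied by the question's sample data.

```java
import java.util.Arrays;
import java.util.Base64;

// Hypothetical helper mirroring the Painless loop from the question.
class FloatVectorDecoder {
    // Decodes a Base64 string of little-endian 4-byte floats into a float array.
    static float[] decode(String base64) {
        byte[] bytes = Base64.getDecoder().decode(base64);
        float[] vector = new float[bytes.length / 4];
        for (int i = 0; i < vector.length; ++i) {
            int n = i * 4;
            // Decimal 255 is used instead of 0xff, as in the answer above.
            int intBits = (bytes[n + 3] << 24)
                    | ((bytes[n + 2] & 255) << 16)
                    | ((bytes[n + 1] & 255) << 8)
                    | (bytes[n] & 255);
            vector[i] = Float.intBitsToFloat(intBits);
        }
        return vector;
    }

    public static void main(String[] args) {
        // vector_bin values of documents 1 and 2 from the question.
        System.out.println(Arrays.toString(decode("AACAP83MjD+amZk/")));
        System.out.println(Arrays.toString(decode("AABAQGZmBkCamZk/")));
    }
}
```

The decoded arrays should match the vector fields of the two sample documents, [1.0, 1.1, 1.2] and [3.0, 2.1, 1.2].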
