not_analyzed field with doc_values still in fielddata cache

Views: 285 Posted: 2017/8/7 1:19:21 Tags: mapping, elasticsearch

Problem description

While experimenting with fielddata vs. doc_values, I ran into a strange case. My earlier mapping did not use doc values at all. In my new mapping, I added doc_values: true to every field except analyzed string fields and booleans (doc values for booleans are not supported until 2.0; see elastic/elasticsearch issue 7851).

In detail, here is how I proceeded:

Before reindexing all my data, I restarted my ES 1.7 cluster and ran a query with sorting, aggregations and script fields to "warm up" the fielddata cache. Then I queried the _cat/fielddata endpoint to get an idea of the fielddata cache usage. It looked something like this:

    curl -XGET 'localhost:9200/_cat/fielddata?v&fields=*'

    id      host   ip            node  total  items.desc.raw more_fields...
    rKX7... myhost 192.168.1.100 Doom  32.9mb 2.3mb          ...

As you can see, the field items.desc.raw used 2.3mb of heap space. The items field is of type nested and contains a string multi-field, desc, with a not_analyzed sub-field called raw. In short, the mapping of that nested field looks like this:

    "items": {
      "type": "nested",
      "properties": {
        "desc": {
          "type": "string",
          "fields": {
            "raw": {
              "type": "string",
              "index": "not_analyzed"
            }
          }
        }
      }
    }
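
The change described in the next step, enabling doc_values on the raw sub-field, amounts to one extra key in this mapping. Below is a minimal Python sketch of that edit, assuming the Elasticsearch 1.x mapping syntax shown above; the helper name with_doc_values is hypothetical and is not part of any Elasticsearch client.

```python
import copy
import json

# Mapping of the nested field as shown above (Elasticsearch 1.x syntax).
items_mapping = {
    "items": {
        "type": "nested",
        "properties": {
            "desc": {
                "type": "string",
                "fields": {
                    "raw": {"type": "string", "index": "not_analyzed"}
                },
            }
        },
    }
}


def with_doc_values(mapping):
    """Return a copy of the mapping with doc_values enabled on items.desc.raw.

    Hypothetical helper for illustration; the original mapping is left untouched.
    """
    updated = copy.deepcopy(mapping)
    raw = updated["items"]["properties"]["desc"]["fields"]["raw"]
    raw["doc_values"] = True
    return updated


new_mapping = with_doc_values(items_mapping)
print(json.dumps(new_mapping, indent=2))
```

Because doc values are written at index time, the new mapping only takes effect for documents indexed after the change, which is why the whole index had to be reindexed.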

After adding doc_values: true to items.desc.raw, reindexing the whole index and again running some aggregations, sorting and scripting to warm up the fielddata cache, I queried the _cat/fielddata endpoint again. Here is the result:

    curl -XGET 'localhost:9200/_cat/fielddata?v&fields=*'

    id      host   ip            node  total  items.desc.raw some_bools...
    tAB5... myhost 192.168.1.100 Yack  2.1mb  9.2kb          ...

So fielddata usage has indeed dropped drastically, which is good. The boolean fields (e.g. some_bools above) still show up, which was expected. To my surprise, though, my nested not_analyzed string field also still appears, albeit with much lower usage (9.2kb instead of 2.3mb).
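
To compare the two runs without eyeballing the tables, the _cat/fielddata output can be parsed into dictionaries. The following is a rough sketch, not an official client: the function names parse_size and parse_cat_fielddata are hypothetical, the sample rows are copied from the outputs above with the elided columns left out, and sizes are assumed to use 1024-based units.

```python
def parse_size(text):
    """Convert a _cat-style size such as '2.3mb' or '9.2kb' to bytes.

    Assumes 1024-based units.
    """
    units = {"kb": 1024, "mb": 1024 ** 2, "gb": 1024 ** 3, "b": 1}
    text = text.strip().lower()
    # Check the two-letter suffixes before the bare "b" suffix.
    for suffix in ("kb", "mb", "gb", "b"):
        if text.endswith(suffix):
            return int(round(float(text[: -len(suffix)]) * units[suffix]))
    raise ValueError("unrecognized size: %r" % text)


def parse_cat_fielddata(output):
    """Parse whitespace-separated `_cat/fielddata?v` output into one dict per node."""
    lines = [line for line in output.splitlines() if line.strip()]
    header = lines[0].split()
    return [dict(zip(header, line.split())) for line in lines[1:]]


# Sample rows taken from the two outputs above (elided columns omitted).
before = """id      host   ip            node  total  items.desc.raw
rKX7... myhost 192.168.1.100 Doom  32.9mb 2.3mb"""
after = """id      host   ip            node  total  items.desc.raw
tAB5... myhost 192.168.1.100 Yack  2.1mb  9.2kb"""

before_row = parse_cat_fielddata(before)[0]
after_row = parse_cat_fielddata(after)[0]
before_bytes = parse_size(before_row["items.desc.raw"])
after_bytes = parse_size(after_row["items.desc.raw"])

print("items.desc.raw before: %d bytes" % before_bytes)
print("items.desc.raw after:  %d bytes" % after_bytes)
print("remaining fraction:    %.4f" % (after_bytes / float(before_bytes)))
```

The field does not disappear from the cache after the change; a small residue remains, and that residue is what the question is about.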

What could be the cause of items.desc.raw still appearing in the fielddata cache?
Solution
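Somehow I forgot about global ordinals. They are the reason I still see fielddata usage for this field even with doc_values enabled. Doc values store term ordinals per segment, whereas global ordinals map those segment-level ordinals to a shard-wide numbering. That mapping cannot be stored in doc_values, so it is built in memory at search time and accounted for in the fielddata cache.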

See more details here
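
To illustrate why the ordinal mapping cannot live inside any single segment's doc values, here is a simplified Python model of global ordinals. It is a conceptual sketch only and not how Lucene or Elasticsearch implements them; the function name build_global_ordinals and the sample terms are hypothetical.

```python
def build_global_ordinals(segments):
    """Map each segment's local ordinals to shard-wide (global) ordinals.

    `segments` is a list of sorted, de-duplicated term lists, one per segment.
    A term's local ordinal is its index within its own segment.
    """
    # The global numbering is the sorted union of the terms of all segments.
    global_terms = sorted(set(term for seg in segments for term in seg))
    global_ord = {term: i for i, term in enumerate(global_terms)}
    # For every segment: local ordinal (list index) -> global ordinal.
    mapping = [[global_ord[term] for term in seg] for seg in segments]
    return global_terms, mapping


segments = [
    ["apple", "cherry"],            # hypothetical segment 0
    ["banana", "cherry", "date"],   # hypothetical segment 1
]
global_terms, ordinal_map = build_global_ordinals(segments)

print(global_terms)   # ['apple', 'banana', 'cherry', 'date']
print(ordinal_map)    # [[0, 2], [1, 2, 3]]
```

In this model "cherry" has local ordinal 1 in both segments but global ordinal 2, and the mapping depends on every segment at once. It therefore has to be rebuilt when the set of segments changes, which is consistent with it being kept in memory rather than written into per-segment doc values.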
