ElasticSearch Filtered Aliases Creation - Best Practice

Question

We are planning to use Filtered Aliases as mentioned here - https://www.elastic.co/guide/en/elasticsearch/reference/current/indices-aliases.html

Our input data is going to be a stream with each line of the stream corresponding to an object we would like to store in ES.

Each object contains an 'id', which we are using for routing and filtering.

QUESTION - How do we create the aliases and index the data in a performant way?

- Do we index all the data, keep track of all the unique 'id's, and at the very end create the filtered aliases? OR

- For each object, check whether an alias for that 'id' exists and, if it doesn't, create one?

I'm leaning towards the first approach. Is it advisable, and is it more performant than the second approach?

TIA.

Recommended Answer

Based on our discussion above, and after glancing over the blog article you posted, I'm fairly confident that in your case you don't need aliases at all: the routing key alone is enough. Again, this only holds because you have a single index; if you had many indices, it would no longer be true!

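You simply need to specify the routing key to use when indexing your documents. Until ES 2.0, you can use the path setting of the _routing field mapping for that purpose, so that the routing value is read from a field of each document. Even though this setting has been deprecated since ES 1.5, it serves your purpose here. From ES 2.0 on, the routing value has to be passed explicitly with each index request instead.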

In the mapping, path names the document field used as the routing key (here customer_id):

{
    "customer" : {
        "_routing" : {
            "required" : true,
            "path" : "customer_id"
        },
        "properties": { ... }
    }
}
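As a minimal Python sketch of the indexing side, the function below turns the input stream (one JSON object per line, as described in the question) into a _bulk request body in which every document carries its customer_id as an explicit routing value, which is how routing is supplied once _routing.path is no longer available. The index name `customers`, the field `customer_id` and the bulk metadata key `routing` (accepted by recent versions; older releases used `_routing`) are assumptions, and the helper names are hypothetical.

```python
import json
from typing import Iterable, Iterator

# Assumptions (not from the original answer): index "customers", routing field
# "customer_id", and a recent Elasticsearch that accepts "routing" in bulk metadata.
INDEX = "customers"
ROUTING_FIELD = "customer_id"


def bulk_lines(stream: Iterable[str]) -> Iterator[str]:
    """Turn a stream of JSON lines into _bulk NDJSON, routing each doc by customer id."""
    for raw in stream:
        raw = raw.strip()
        if not raw:
            continue  # skip blank lines in the input stream
        doc = json.loads(raw)
        # Action line: which index to write to and which routing value to use.
        action = {"index": {"_index": INDEX, "routing": str(doc[ROUTING_FIELD])}}
        yield json.dumps(action)
        # Source line: the document itself, unchanged.
        yield json.dumps(doc)


def bulk_body(stream: Iterable[str]) -> str:
    # The _bulk API expects the body to end with a newline.
    return "\n".join(bulk_lines(stream)) + "\n"
```

The resulting string can be POSTed to /_bulk with the header Content-Type: application/x-ndjson, so the whole stream is indexed in large batches rather than one request per document.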

Then, when searching, you simply need to add routing=<customer_id> to the query string of your search URL, in addition to your customer id filter (the filter is still needed because a given shard can host documents for different customers). Your search will go directly to the shard identified by the routing key, and the filter ensures that only documents of the specified customer are returned.
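A hypothetical Python helper that builds such a search request: the URL carries routing=<customer_id> and the body keeps the customer_id term filter. The host, index and field names are assumptions, and the bool/filter query syntax is that of ES 2.0 and later (on 1.x the equivalent is the filtered query).

```python
import json
from typing import Tuple
from urllib.parse import urlencode

# Assumed host and index name (hypothetical, for illustration only).
BASE_URL = "http://localhost:9200"
INDEX = "customers"


def customer_search(customer_id: str, size: int = 10) -> Tuple[str, str]:
    """Return (url, json_body) for a search restricted to one customer."""
    # routing=<customer_id> sends the search only to the shard for this key.
    url = f"{BASE_URL}/{INDEX}/_search?{urlencode({'routing': customer_id})}"
    # Still filter by customer_id: other customers can share the same shard.
    body = {
        "size": size,
        "query": {"bool": {"filter": {"term": {"customer_id": customer_id}}}},
    }
    return url, json.dumps(body)
```

The URL and body can be passed to any HTTP client; urlencode takes care of escaping ids that contain special characters.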

Using a filtered alias here brings no benefit: the filter and routing key you would include in the alias definition are the same ones you already apply at search time, so the retrieved documents are already "filtered" (in a sense) by the routing key. It is also much simpler than detecting, for each new document to index, whether an alias exists and creating it if it doesn't.
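To see why the routing key narrows a search to one shard but not to one customer, here is a simplified model of shard selection, shard = hash(routing) % number_of_primary_shards. Elasticsearch 2.0 and later use a Murmur3 hash, and newer versions refine the formula, so zlib.crc32 below is only a stand-in; the 5 primary shards and the ids cid1 to cid100 are assumptions chosen to match the roughly 100 customers mentioned in the discussion.

```python
import zlib
from collections import defaultdict
from typing import Dict, Iterable, List


def shard_for(routing: str, num_primary_shards: int) -> int:
    """Simplified routing: crc32 is a stand-in for Elasticsearch's real hash."""
    return zlib.crc32(routing.encode("utf-8")) % num_primary_shards


def customers_per_shard(customer_ids: Iterable[str],
                        num_primary_shards: int) -> Dict[int, List[str]]:
    """Group customer ids by the shard their documents would be routed to."""
    shards: Dict[int, List[str]] = defaultdict(list)
    for cid in customer_ids:
        shards[shard_for(cid, num_primary_shards)].append(cid)
    return dict(shards)


ids = [f"cid{i}" for i in range(1, 101)]  # ~100 customers (assumed ids)
layout = customers_per_shard(ids, 5)      # assuming 5 primary shards
# Each customer maps to exactly one shard, but with 100 customers and 5 shards
# some shard must hold several customers, so the customer_id filter is still needed.
print(max(len(v) for v in layout.values()) > 1)
```

The same routing value always yields the same shard, which is what lets the search skip all other shards; the filter then removes the other customers that happen to share that shard.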

Update:

Now, if you absolutely have to (or want to) create filtered aliases, the more performant approach is the first one you mentioned:

  1. First index your daily data
  2. Then run a terms aggregation on your customer_id field with size high enough (i.e. higher than the cardinality of the field, which was ~100 in your case) to make sure you capture all unique customer ids to create your aliases
  3. Loop over all the buckets to retrieve all unique customer ids
  4. Create all aliases in one shot using one action for each customer_id
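Steps 2 and 3 above can be sketched in Python. The helper names, the aggregation name `ids` and the default size of 1000 are assumptions (size only needs to exceed the number of unique ids). The guard on sum_other_doc_count, which recent Elasticsearch versions report for documents not covered by the returned buckets, is one way to check that size was large enough.

```python
from typing import Any, Dict, List

# Hypothetical helpers for steps 2 and 3; field name and default size are assumptions.


def unique_ids_agg(field: str = "customer_id", size: int = 1000) -> Dict[str, Any]:
    """Search body that returns no hits, only a terms aggregation on `field`."""
    return {
        "size": 0,
        "aggs": {"ids": {"terms": {"field": field, "size": size}}},
    }


def ids_from_response(response: Dict[str, Any]) -> List[Any]:
    """Collect the bucket keys (the unique ids) from the search response."""
    agg = response["aggregations"]["ids"]
    # A non-zero sum_other_doc_count means some ids did not fit in `size` buckets.
    if agg.get("sum_other_doc_count", 0) > 0:
        raise ValueError("terms aggregation size too small: some ids were left out")
    return [bucket["key"] for bucket in agg["buckets"]]
```

Failing loudly when buckets are missing is a deliberate choice: silently creating aliases for only part of the customers would be hard to notice later.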

curl -XPOST 'http://localhost:9200/_aliases' -H 'Content-Type: application/json' -d '{
    "actions" : [
        {
            "add" : {
                 "index" : "customers",
                 "alias" : "alias_cid1",
                 "routing" : "cid1",
                 "filter" : { "term" : { "customer_id" : "cid1" } }
            }
        },
        {
            "add" : {
                 "index" : "customers",
                 "alias" : "alias_cid2",
                 "routing" : "cid2",
                 "filter" : { "term" : { "customer_id" : "cid2" } }
            }
        },
        {
            "add" : {
                 "index" : "customers",
                 "alias" : "alias_cid3",
                 "routing" : "cid3",
                 "filter" : { "term" : { "customer_id" : "cid3" } }
            }
        },
        ...
    ]
}'
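Rather than writing the actions by hand, the body of the _aliases request can be generated from the list of unique ids. A minimal Python sketch, assuming the index name `customers` and the alias_<id> naming used in the example above; the function name is hypothetical.

```python
import json
from typing import Iterable


def alias_actions(customer_ids: Iterable, index: str = "customers",
                  prefix: str = "alias_") -> str:
    """Build the _aliases body: one filtered, routed 'add' action per id."""
    actions = [
        {
            "add": {
                "index": index,
                "alias": f"{prefix}{cid}",
                "routing": str(cid),  # routing values are strings
                "filter": {"term": {"customer_id": cid}},
            }
        }
        for cid in customer_ids
    ]
    return json.dumps({"actions": actions})
```

Because every action goes into a single request, all aliases are created in one round trip, and the _aliases API applies the actions atomically.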

Note that you don't have to worry about whether an alias already exists: the command won't fail, and existing aliases are silently ignored.

Once this command has run, all your aliases will be defined on your single index, each properly configured with a filter and a routing key.