Ingest pipeline for create/update document dates

Problem description

I'm trying to achieve MySQL-like behaviour: adding indexed_at/updated_at timestamps to each doc I index through an ES ingest pipeline.

My pipeline looks like this:

PUT _ingest/pipeline/timestamps
{
  "description": "Adds createdAt and updatedAt style timestamps",
  "processors": [
    {
      "set": {
        "field": "_source.indexed_at",
        "value": "{{_ingest.timestamp}}",
        "override": false
      }
    },
    {
      "set": {
        "field": "_source.updated_at",
        "value": "{{_ingest.timestamp}}",
        "override": true
      }
    }
  ]
}

I have no mapping yet; I only tried adding one doc:

POST test_pipelines/doc/1?pipeline=timestamps
{
  "foo": "bar"
}

The pipeline successfully creates indexed_at and updated_at:

{
  "_index": "test_pipelines",
  "_type": "doc",
  "_id": "1",
  "_score": 1,
  "_source": {
    "indexed_at": "2018-07-12T10:47:27.957Z",
    "updated_at": "2018-07-12T10:47:27.957Z",
    "foo": "bar"
  }
}

But if I try to update doc 1, the indexed_at field changes every time to the date the document is updated.

Example update request:

POST test_pipelines/doc/1?pipeline=timestamps
{
  "foo": "bor"
}

Is there any way to tell the processor not to update the indexed_at field?

Recommended answer

This happens because the set processor operates only on the document you're sending, not on the one already stored (if any). Hence, override has no effect here: the document you send contains neither indexed_at nor updated_at, so both fields are set on every call.
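To make the point concrete, here is a plain-Python model of the two set processors (a simplified sketch, not Elasticsearch code). The function name run_timestamps_pipeline and the second timestamp are made up for illustration. The only input the "pipeline" receives is the incoming document, so indexed_at is set again on every index request.

```python
from copy import deepcopy


def run_timestamps_pipeline(incoming: dict, ingest_timestamp: str) -> dict:
    """Hypothetical model of the 'timestamps' pipeline; it never sees the stored doc."""
    doc = deepcopy(incoming)
    # "override": false -> only set indexed_at if the *incoming* doc has no value
    if doc.get("indexed_at") is None:
        doc["indexed_at"] = ingest_timestamp
    # "override": true -> always set updated_at
    doc["updated_at"] = ingest_timestamp
    return doc


store = {}
# First index request for ID 1
store["1"] = run_timestamps_pipeline({"foo": "bar"}, "2018-07-12T10:47:27.957Z")
# Second index request for ID 1 (made-up timestamp): the stored doc is not
# consulted, so indexed_at is overwritten with the new ingest timestamp.
store["1"] = run_timestamps_pipeline({"foo": "bor"}, "2018-07-12T11:00:00.000Z")
```

The override flag only protects a value that is already present in the document being ingested, which is why it cannot help here.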

When you send your document a second time, you're not updating it; you're re-indexing it from scratch (i.e. you're overwriting the first version you sent). Ingest pipelines do not work with update operations: for instance, if you try the following update call, it will fail.

POST test_pipelines/doc/1/_update?pipeline=timestamps
{
  "doc": {
    "foo": "bor"
  }
}
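The difference between re-indexing and a partial update can be modeled with plain dictionaries (a simplified sketch that ignores the pipeline entirely; index_request and partial_update are hypothetical names). An index request replaces the whole stored _source, while the doc of an _update request is merged into it.

```python
def index_request(store: dict, doc_id: str, source: dict) -> None:
    # Index: the previous version of the document is discarded entirely
    store[doc_id] = dict(source)


def partial_update(store: dict, doc_id: str, partial: dict) -> None:
    # Update: the partial doc is merged into the stored _source
    # (top-level merge only; Elasticsearch also merges nested objects)
    merged = dict(store[doc_id])
    merged.update(partial)
    store[doc_id] = merged


reindexed = {}
index_request(reindexed, "1", {"foo": "bar", "indexed_at": "2018-07-12T10:47:27.957Z"})
index_request(reindexed, "1", {"foo": "bor"})  # indexed_at is gone

updated = {}
index_request(updated, "1", {"foo": "bar", "indexed_at": "2018-07-12T10:47:27.957Z"})
partial_update(updated, "1", {"foo": "bor"})  # indexed_at survives
```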

If you want to stick with your ingest pipeline, the only way to make it work is to GET the document first, change the field(s) you want, and index the whole document again, including the indexed_at value you got back. For instance:

# 1. index the document the first time
PUT test_pipelines/doc/1?pipeline=timestamps
{
  "foo": "bar"
}

# 2. GET the indexed document
GET test_pipelines/doc/1

# 3. update the foo field and index it again
PUT test_pipelines/doc/1?pipeline=timestamps
{
  "indexed_at": "2018-07-20T05:08:52.293Z",
  "updated_at": "2018-07-20T05:08:52.293Z",
  "foo": "bor"
}

# 4. When you GET the document the second time, you'll see your pipeline worked
GET test_pipelines/doc/1

This will return:

{
  "indexed_at": "2018-07-20T05:08:52.293Z",
  "updated_at": "2018-07-20T05:08:53.345Z",
  "foo": "bor"
}
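The GET-then-reindex workaround can be sketched in plain Python (a model, not client code; get_modify_reindex is a hypothetical helper and the dict stands in for the index). Because the re-sent document already carries indexed_at, the "override": false processor leaves it alone, while updated_at is refreshed; the timestamps are the ones from the example above.

```python
from copy import deepcopy


def run_timestamps_pipeline(incoming: dict, ingest_timestamp: str) -> dict:
    doc = deepcopy(incoming)
    if doc.get("indexed_at") is None:      # "override": false
        doc["indexed_at"] = ingest_timestamp
    doc["updated_at"] = ingest_timestamp   # "override": true
    return doc


def get_modify_reindex(store: dict, doc_id: str, changes: dict, ingest_timestamp: str) -> dict:
    # Step 2: GET the stored document
    current = deepcopy(store[doc_id])
    # Step 3: apply the changes and index the whole doc again through the pipeline
    current.update(changes)
    store[doc_id] = run_timestamps_pipeline(current, ingest_timestamp)
    return store[doc_id]


store = {}
# Step 1: first index request
store["1"] = run_timestamps_pipeline({"foo": "bar"}, "2018-07-20T05:08:52.293Z")
result = get_modify_reindex(store, "1", {"foo": "bor"}, "2018-07-20T05:08:53.345Z")
```

Note that this read-modify-write takes two round trips, so a concurrent write between the GET and the PUT could be lost.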

I definitely agree this is really troublesome, but the link I gave above enumerates all the reasons why pipelines are not supported on update operations.

Another way to make it work the way you like (without pipelines) is to use a scripted upsert, which works like steps 2 and 3 above (it GETs and re-indexes the document) but as a single atomic operation, and which also works with your bulk calls. First, you need to store a script that you will call for both your indexing and update operations:

POST _scripts/update-doc
{
  "script": {
    "lang": "painless",
    "source": "ctx._source.foo = params.foo; ctx._source.updated_at = new Date(); if (ctx._source.indexed_at == null) ctx._source.indexed_at = ctx._source.updated_at;"
  }
}
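The logic of the stored script, combined with scripted_upsert, can be mirrored in Python to show why indexed_at is written only once (an illustrative sketch; update_doc_script and scripted_upsert are hypothetical names, and the current time is passed in explicitly so the example is deterministic). With "scripted_upsert": true the script runs even when the document does not exist yet, starting from the upsert body.

```python
from datetime import datetime, timezone


def update_doc_script(source: dict, params: dict, now: datetime) -> None:
    """Python mirror of the Painless script stored as 'update-doc'."""
    source["foo"] = params["foo"]
    source["updated_at"] = now  # Painless: new Date()
    # Only the first call (no indexed_at yet) sets indexed_at
    if source.get("indexed_at") is None:
        source["indexed_at"] = source["updated_at"]


def scripted_upsert(store: dict, doc_id: str, params: dict, now: datetime, upsert: dict) -> dict:
    # Missing doc: start from the "upsert" body; existing doc: start from its _source
    source = dict(store[doc_id]) if doc_id in store else dict(upsert)
    update_doc_script(source, params, now)
    store[doc_id] = source
    return source


store = {}
t1 = datetime(2018, 7, 20, 5, 57, 40, 510000, tzinfo=timezone.utc)
t2 = datetime(2018, 7, 20, 5, 58, 42, 825000, tzinfo=timezone.utc)
scripted_upsert(store, "1", {"foo": "bar"}, t1, upsert={})  # creates the doc
scripted_upsert(store, "1", {"foo": "bor"}, t2, upsert={})  # same call, new foo
```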

Then, you can index your document the first time like this:

POST test_pipelines/doc/1/_update
{
  "script": {
    "id": "update-doc",
    "params": {
      "foo": "bar"
    }
  },
  "scripted_upsert": true,
  "upsert": {}
}

The indexed document will look like this:

{
    "updated_at": "2018-07-20T05:57:40.510Z",
    "indexed_at": "2018-07-20T05:57:40.510Z",
    "foo": "bar"
}

And you can use the exact same call when updating the document; only the foo parameter changes:

POST test_pipelines/doc/1/_update
{
  "script": {
    "id": "update-doc",
    "params": {
      "foo": "bor"
    }
  },
  "scripted_upsert": true,
  "upsert": {}
}

The updated document will look like this, exactly what you wanted:

{
    "updated_at": "2018-07-20T05:58:42.825Z",
    "indexed_at": "2018-07-20T05:57:40.510Z",
    "foo": "bor"
}
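Since the same scripted upsert also works in bulk requests, here is a sketch that builds the newline-delimited body for POST _bulk, with one update action per document. It assumes the 6.x-style typed index used throughout this answer (_type "doc"; newer versions drop _type), and bulk_upsert_body is a hypothetical helper name.

```python
import json


def bulk_upsert_body(index: str, doc_type: str, docs) -> str:
    """Build an NDJSON _bulk body; docs is an iterable of (doc_id, foo_value) pairs."""
    lines = []
    for doc_id, foo in docs:
        # Action line: an update of this document
        lines.append(json.dumps({"update": {"_index": index, "_type": doc_type, "_id": doc_id}}))
        # Payload line: the same scripted upsert as the single-document call
        lines.append(json.dumps({
            "script": {"id": "update-doc", "params": {"foo": foo}},
            "scripted_upsert": True,
            "upsert": {},
        }))
    return "\n".join(lines) + "\n"  # the bulk body must end with a newline


body = bulk_upsert_body("test_pipelines", "doc", [("1", "bar"), ("2", "baz")])
```

The resulting string would be sent as the body of POST _bulk with the application/x-ndjson content type.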
