Logstash -> Elasticsearch - update denormalized data

Question

We have a relational database with data about our day-to-day operations. The goal is to let users search the important data with a full-text search engine. The data is normalized and therefore not in the best form for full-text queries, so the idea is to denormalize a subset of it and copy it in real time to Elasticsearch, which lets us build a fast and accurate search application.

We already have a system in place that enables event sourcing of our database operations (inserts, updates, deletes). The events contain only the changed columns and the primary keys (on an update we don't get the whole row). Logstash is already notified of each event, so this part is handled.

Now to our problem. Since the plan is to denormalize the data, we have to make sure that updates to parent objects are propagated to the denormalized child objects in Elasticsearch. How can we configure Logstash to do this?

Let's say we maintain a list of Employees in Elasticsearch. Each Employee is assigned to a Company. Since the data is denormalized (for faster search), each Employee also carries the name and address of its Company. An update changes the name of a Company: how can we configure Logstash to update the company name in all Employees assigned to that Company?

@Darth_Vader: The problem we are facing is that we get an event saying a Company has changed, but we want to modify documents of type Employee in Elasticsearch, because they carry the company data themselves. Your answer expects that we will get an event for every Employee, which is not the case.

Maybe this will make it clearer. We have 3 employees in Elasticsearch:

{type:'employee',id:'1',name:'Person 1',company.cmp_id:'1',company.name:'Company A'}
{type:'employee',id:'2',name:'Person 2',company.cmp_id:'1',company.name:'Company A'}
{type:'employee',id:'3',name:'Person 3',company.cmp_id:'2',company.name:'Company B'}

Then an update happens in the source DB.

UPDATE company SET name = 'Company NEW' WHERE cmp_id = 1;

Logstash then receives an event that looks something like this:

{type:'company',cmp_id:'1',old.name:'Company A',new.name:'Company NEW'}

This should then be propagated to Elasticsearch, so that the resulting employees are:

{type:'employee',id:'1',name:'Person 1',company.cmp_id:'1',company.name:'Company NEW'}
{type:'employee',id:'2',name:'Person 2',company.cmp_id:'1',company.name:'Company NEW'}
{type:'employee',id:'3',name:'Person 3',company.cmp_id:'2',company.name:'Company B'}

Notice that the field company.name changed for employees 1 and 2 only.
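The expected behaviour can be modelled in memory to make the requirement concrete. This is only a sketch of the desired semantics, not of how Elasticsearch or Logstash work internally. It assumes the company data is stored as a nested object on each employee, and the function name apply_company_event is hypothetical.

```python
import copy


def apply_company_event(employees, event):
    """Return a copy of the employee list with the company rename applied."""
    updated = copy.deepcopy(employees)
    for emp in updated:
        # Only employees assigned to the changed company are touched.
        if emp["company"]["cmp_id"] == event["cmp_id"]:
            emp["company"]["name"] = event["new.name"]
    return updated


employees = [
    {"type": "employee", "id": "1", "name": "Person 1",
     "company": {"cmp_id": "1", "name": "Company A"}},
    {"type": "employee", "id": "2", "name": "Person 2",
     "company": {"cmp_id": "1", "name": "Company A"}},
    {"type": "employee", "id": "3", "name": "Person 3",
     "company": {"cmp_id": "2", "name": "Company B"}},
]

# The event carries only the primary key and the changed column.
event = {"type": "company", "cmp_id": "1",
         "old.name": "Company A", "new.name": "Company NEW"}

result = apply_company_event(employees, event)
for emp in result:
    print(emp["id"], emp["company"]["name"])
```

One company event fans out to every matching employee document, which is why a plain per-document index or update action is not enough here.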

Recommended answer

I suggest a solution similar to what I've posted elsewhere, i.e. use the http output plugin to issue an update-by-query call to the employees index. The request would need to look like this:

POST employees/_update_by_query
{
  "script": {
    "source": "ctx._source.company.name = params.name",
    "lang": "painless",
    "params": {
      "name": "Company NEW"
    }
  },
  "query": {
    "term": {
      "company.cmp_id": "1"
    }
  }
}
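To try the request by hand before wiring it into Logstash, it can be built with the Python standard library. This is a rough sketch: the helper name build_update_request is hypothetical, and the host and index name are taken from the example in this answer. The request is only constructed here, not sent.

```python
import json
import urllib.request


def build_update_request(base_url, index, cmp_id, new_name):
    """Build (but do not send) the _update_by_query request for one company."""
    body = {
        "script": {
            "source": "ctx._source.company.name = params.name",
            "lang": "painless",
            # The new name is passed as a parameter, not spliced into the script.
            "params": {"name": new_name},
        },
        # Select every employee document that belongs to this company.
        "query": {"term": {"company.cmp_id": cmp_id}},
    }
    return urllib.request.Request(
        url=f"{base_url}/{index}/_update_by_query",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


req = build_update_request("http://localhost:9200", "employees", "1", "Company NEW")
print(req.get_method(), req.full_url)
print(req.data.decode("utf-8"))
# To actually send it (needs a reachable cluster): urllib.request.urlopen(req)
```

Passing the name through params keeps the script source identical for every event, so only the parameter values differ between calls.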

So your Logstash config should look like this:

input {
  ...
}
filter {
  mutate {
    # Build the _update_by_query request body from the company event
    add_field => {
      "[script][lang]" => "painless"
      "[script][source]" => "ctx._source.company.name = params.name"
      "[script][params][name]" => "%{new.name}"
      "[query][term][company.cmp_id]" => "%{cmp_id}"
    }
    # Drop the remaining fields so that only script and query are sent
    remove_field => ["host", "@version", "@timestamp", "type", "cmp_id", "old.name", "new.name"]
  }
}
output {
  # POST the event as a JSON body to the employees index
  http {
    url => "http://localhost:9200/employees/_update_by_query"
    http_method => "post"
    format => "json"
  }
}
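The effect of the mutate filter on a single event can be traced with a small simulation. This is a sketch under assumptions, not Logstash itself: it assumes add_field is applied before remove_field (which the config above relies on for the %{...} references to resolve), and the helper names set_nested and simulate_mutate are hypothetical.

```python
def set_nested(event, path, value):
    """Mimic a Logstash field reference such as [script][params][name]."""
    node = event
    for key in path[:-1]:
        node = node.setdefault(key, {})
    node[path[-1]] = value


def simulate_mutate(event):
    out = dict(event)
    # add_field: values are interpolated from the incoming event.
    set_nested(out, ["script", "lang"], "painless")
    set_nested(out, ["script", "source"], "ctx._source.company.name = params.name")
    set_nested(out, ["script", "params", "name"], event["new.name"])
    set_nested(out, ["query", "term", "company.cmp_id"], event["cmp_id"])
    # remove_field: drop everything that is not part of the request body.
    for field in ["host", "@version", "@timestamp", "type",
                  "cmp_id", "old.name", "new.name"]:
        out.pop(field, None)
    return out


event = {
    "type": "company", "cmp_id": "1",
    "old.name": "Company A", "new.name": "Company NEW",
    "host": "db-events", "@version": "1", "@timestamp": "2020-01-01T00:00:00Z",
}
body = simulate_mutate(event)
print(body)
```

Because the http output with format => "json" sends the whole event as the request body, any field left on the event would end up in the request as well, which is presumably why the answer removes them.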
