Check Elasticsearch document similarity before indexing

Views: 161 | Published: 2020/6/13 19:06:01 | Tags: php, symfony, elasticsearch, elastica


Problem description

OK, after pulling my hair out all day trying to figure this one out, I decided to get some input from the community.

I should mention that I'm fairly new to Elasticsearch.

The idea is that I have an ES index containing some documents, and I need to index a new document only if no existing document with similar (but not necessarily equal) field content is already indexed.

I can perform a match query on multiple fields and get a global score for the query, but since that score is not a percentage of the maximum achievable score, I'm not sure how to set a threshold to decide whether I can insert the document or not.
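To make the threshold problem concrete, here is a minimal sketch of one way such a query body could be built so that the cut-off is expressed as a percentage of matching terms rather than as a raw score. It relies on the `minimum_should_match` option of the `match` query; the helper name `build_duplicate_query`, the field list and the `"75%"` value are assumptions for illustration, not something taken from my setup. The sketch only builds the request body and does not talk to a cluster.

```python
from typing import Any, Dict, List


def build_duplicate_query(
    doc: Dict[str, str],
    fields: List[str],
    min_match: str = "75%",  # assumed cut-off, tune for your data
) -> Dict[str, Any]:
    """Build a search body that only matches documents sharing at least
    `min_match` of the analyzed terms in every listed field."""
    clauses = [
        {
            "match": {
                field: {
                    "query": doc[field],
                    "minimum_should_match": min_match,
                }
            }
        }
        for field in fields
    ]
    # "must" requires every per-field clause to match;
    # one hit is enough to know that a similar document exists.
    return {"size": 1, "query": {"bool": {"must": clauses}}}


candidate = {
    "title": "My first blog entries",
    "text": "Just trying it out...",
    "date": "2014/01/01",
}
body = build_duplicate_query(candidate, ["title", "text"])
print(body)
```

The resulting body would be sent to the index's `_search` endpoint; under these assumptions, a non-empty hit list would mean "too similar, skip indexing". How well this works depends on the analyzer: without stemming, "entry" and "entries" are different terms.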

I am obviously a bit confused about the ES scoring system. Thanks in advance for all the help I can get on this.

As a basic example.

Already indexed:

{
  "title": "My first blog entry",
  "text":  "Just trying this out...",
  "date":  "2014/01/01"
}

This is new but should not be indexed, since the fields are not equal but are too similar:

{
  "title": "My first blog entries",
  "text":  "Just trying it out...",
  "date":  "2014/01/01"
}

This is new and should be indexed:

{
  "title": "My second entry for this blog",
  "text":  "I am just trying out a few things",
  "date":  "2014/01/01"
}
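The accept/reject decision I have in mind for these three documents can be sketched outside Elasticsearch as follows. This is only an illustration of the intended behaviour, not ES scoring: it uses Python's `difflib.SequenceMatcher` as a stand-in similarity measure, and the helper names, the compared fields and the `0.8` threshold are all assumptions.

```python
from difflib import SequenceMatcher
from typing import Dict, Sequence


def field_similarity(a: str, b: str) -> float:
    """Character-level similarity in [0, 1] (stand-in for a real ES score)."""
    return SequenceMatcher(None, a, b).ratio()


def is_near_duplicate(
    new_doc: Dict[str, str],
    existing_doc: Dict[str, str],
    fields: Sequence[str] = ("title", "text"),
    threshold: float = 0.8,  # assumed cut-off
) -> bool:
    """Average the per-field similarities and compare to the threshold."""
    scores = [field_similarity(new_doc[f], existing_doc[f]) for f in fields]
    return sum(scores) / len(scores) >= threshold


indexed = {"title": "My first blog entry", "text": "Just trying this out..."}
near_copy = {"title": "My first blog entries", "text": "Just trying it out..."}
different = {
    "title": "My second entry for this blog",
    "text": "I am just trying out a few things",
}

print(is_near_duplicate(near_copy, indexed))
print(is_near_duplicate(different, indexed))
```

With these assumed settings the near-copy lands above the threshold and the third document below it, which is the behaviour I would like to get from Elasticsearch itself before indexing.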

So what I am after is basically deduplication prior to indexing, based on field similarity :)
