How to combine completion, suggestion and match phrase across multiple text fields?


Problem description

I've been reading about Elasticsearch suggesters, match phrase prefix and highlighting, and I'm a bit confused as to which one suits my problem.

Requirement: I have a bunch of different text fields and need to be able to autocomplete and autosuggest across all of them, and also handle misspellings. Basically the way Google works.

In the following Google snapshot, when we start typing "Can", it lists words like Canadian, Canada, etc. This is autocomplete. However, it also lists additional words like tire, post, post tracking, coronavirus, etc. This is autosuggest. It searches for the most relevant words in all fields. If we type the misspelled "canxad", it should suggest the same results.

Could someone please give me some hints on how I can implement the above functionality across a bunch of text fields?

At first I tried this:

GET /myindex/_search
{
  "query": {
    "match_phrase_prefix": {
      "myFieldThatIsCombinedViaCopyTo": "revis"
    }
  },
  "highlight": {
    "fields": {
      "*": {}
    },
    "require_field_match" : false
  }
}

but it returns highlights like this:

"In the aforesaid revision filed by the members of the Committee, the present revisionist was also party",

So that's not a "prefix" anymore...

Also tried this:

GET /myindex/_search
{
  "query": {
    "multi_match": {
      "query": "revis",
      "fields": ["myFieldThatIsCombinedViaCopyTo"],
      "type": "phrase_prefix",
      "operator": "and"
    }
  },
  "highlight": {
    "fields": {
      "*": {}
    }
  }
}

But it still returns

"In the aforesaid revision filed by the members of the Committee, the present revisionist was also party",

Note: I have about 5 "text" fields that I need to search upon. One of those fields is quite long (1000s of words). If I break things up into keywords, I lose the phrase. So it's like I need match phrase prefix across a combined text field, with fuzziness?

EDIT Here's an example of a document (some fields taken out, content snipped):

{
  "id" : 1,
  "respondent" : "Union of India",
  "caseContent" : "<snip>..against the Union of India, through the ...<snip>"
}

As @Vlad suggested, I tried this:

POST /cases/_search
{
  "suggest": {
    "respondent-suggest": {
      "prefix": "uni",
      "completion": {
        "field": "respondent.suggest",
        "skip_duplicates": true
      }
    },
    "caseContent-suggest": {
      "prefix": "uni",
      "completion": {
        "field": "caseContent.suggest",
        "skip_duplicates": true
      }
    }
  }
}

Which returns this:

{
  "took" : 2,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 0,
      "relation" : "eq"
    },
    "max_score" : null,
    "hits" : [ ]
  },
  "suggest" : {
    "caseContent-suggest" : [
      {
        "text" : "uni",
        "offset" : 0,
        "length" : 3,
        "options" : [ ]
      }
    ],
    "respondent-suggest" : [
      {
        "text" : "uni",
        "offset" : 0,
        "length" : 3,
        "options" : [
          {
            "text" : "Union of India",
            "_index" : "cases",
            "_type" : "_doc",
            "_id" : "dI5hh3IBEqNFLVH6-aB9",
            "_score" : 1.0,
            "_ignored" : [
              "headNote.suggest"
            ],
            "_source" : {
              <snip>
            }
          }
        ]
      }
    ]
  }
}

So it looks like it matches on the respondent field, which is great! But it didn't match on the caseContent field, even though the text (see above) includes the phrase "against the Union of India". Shouldn't it match there? Or is it because of how the text is broken up?

Solution

Since you need autocomplete/suggest on each field, you need to run a suggest query on each field rather than on the copy_to field. That way you're guaranteed to have the proper prefixes.

copy_to fields are great for searching in multiple fields, but not so good for auto-suggest/-complete type of queries.

The idea is that for each of your fields, you should have a completion sub-field so that you can get auto-complete results for each of them.

PUT index
{
  "mappings": {
    "properties": {
      "text1": {
        "type": "text",
        "fields": {
          "suggest": {
            "type": "completion"
          }
        }
      },
      "text2": {
        "type": "text",
        "fields": {
          "suggest": {
            "type": "completion"
          }
        }
      },
      "text3": {
        "type": "text",
        "fields": {
          "suggest": {
            "type": "completion"
          }
        }
      }
    }
  }
}
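
With around five fields, writing the same sub-field block by hand gets repetitive. The following is a minimal Python sketch that generates a mapping body of the shape shown above. The function name `completion_mapping` and the field names passed to it are illustrative assumptions, not part of any Elasticsearch client; the resulting dictionary would still have to be sent with whatever client you use.

```python
import json


def completion_mapping(field_names, subfield="suggest"):
    """Build a mapping where every text field gets a completion sub-field."""
    properties = {
        name: {
            "type": "text",
            "fields": {subfield: {"type": "completion"}},
        }
        for name in field_names
    }
    return {"mappings": {"properties": properties}}


# Field names borrowed from the question's example; adjust to your index.
body = completion_mapping(["respondent", "caseContent", "headNote"])
print(json.dumps(body, indent=2))
```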

Your suggest queries would then run on all the sub-fields directly:

POST index/_search?pretty
{
    "suggest": {
        "text1-suggest" : {
            "prefix" : "revis", 
            "completion" : { 
                "field" : "text1.suggest" 
            }
        },
        "text2-suggest" : {
            "prefix" : "revis", 
            "completion" : { 
                "field" : "text2.suggest" 
            }
        },
        "text3-suggest" : {
            "prefix" : "revis", 
            "completion" : { 
                "field" : "text3.suggest" 
            }
        }
    }
}

That takes care of the auto-complete/-suggest part. For misspellings, the completion suggester also accepts a fuzzy parameter.
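
As a rough illustration of where the fuzzy option goes, here is a Python sketch that builds one suggest request covering several completion sub-fields. The helper name `suggest_body` and the naming scheme for the suggester keys are assumptions made for this example; the `fuzzy` object with a `fuzziness` setting sits inside each `completion` block.

```python
import json


def suggest_body(prefix, fields, fuzziness=None):
    """Build a suggest request with one completion suggester per field."""
    suggest = {}
    for field in fields:
        completion = {"field": field}
        if fuzziness is not None:
            # Enables typo-tolerant prefix matching for this suggester.
            completion["fuzzy"] = {"fuzziness": fuzziness}
        # Name each suggester after the parent field, e.g. "text1-suggest".
        name = field.split(".")[0] + "-suggest"
        suggest[name] = {"prefix": prefix, "completion": completion}
    return {"suggest": suggest}


body = suggest_body(
    "revis",
    ["text1.suggest", "text2.suggest", "text3.suggest"],
    fuzziness="AUTO",
)
print(json.dumps(body, indent=2))
```

Keeping one named suggester per field means the response tells you which field each suggestion came from.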

UPDATE

If you need to do prefix search on all sentences within a body of text, the approach needs to change a bit.

The new mapping below creates a new completion field next to the text one. The idea is to apply a small transformation (i.e. split sentences) to what you're going to store in the completion field. So first create the index mapping like this:

PUT test
{
  "mappings": {
    "properties": {
      "text1": {
        "type": "text"
      },
      "text1Suggest": {
        "type": "completion"
      }
    }
  }
}

Then create an ingest pipeline that will populate the text1Suggest field with sentences from the text1 field:

PUT _ingest/pipeline/sentence
{
  "processors": [
    {
      "split": {
        "field": "text1",
        "target_field": "text1Suggest.input",
        "separator": "\\.\\s+"
      }
    }
  ]
}
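
To preview what the separator pattern produces before indexing anything, the split can be approximated locally. This Python sketch uses `re.split` with the same regular expression (without the extra JSON escaping). It is only an approximation of the ingest processor, which runs on Java's regex engine and may treat edge cases such as trailing empty strings differently.

```python
import re

# Same pattern as the pipeline's "separator": a period followed by whitespace.
SEPARATOR = r"\.\s+"


def split_sentences(text):
    """Split a body of text into sentence-like completion inputs."""
    return re.split(SEPARATOR, text)


sentences = split_sentences(
    "The crazy fox. The cat drinks milk. John goes to the beach"
)
print(sentences)
```

Note that a final sentence ending in a period without trailing whitespace keeps that period, because the pattern requires whitespace after it.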

Then we can index a document such as this one (with only the text1 field, since the completion field is built by the pipeline):

PUT test/_doc/1?pipeline=sentence
{
  "text1": "The crazy fox. The cat drinks milk. John goes to the beach"
}

What gets indexed looks like this (your text1 field + another completion field optimized for sentence prefix completion):

{
  "text1": "The crazy fox. The cat drinks milk. John goes to the beach",
  "text1Suggest": {
    "input": [
      "The crazy fox",
      "The cat drinks milk",
      "John goes to the beach"
    ]
  }
}

Finally, you can search for the prefix of any sentence. Below we search for John, and you should get a suggestion:

POST test/_search?pretty
{
  "suggest": {
    "text1-suggest": {
      "prefix": "John",
      "completion": {
        "field": "text1Suggest"
      }
    }
  }
}
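
Since each field has its own suggester, the application usually has to merge the options into one list. Here is a minimal Python sketch that does so for a response shaped like the one in the question. The helper name `collect_suggestions` is made up for this example, and the sample response is trimmed to the keys the helper reads.

```python
def collect_suggestions(response):
    """Flatten the options of all suggesters into one de-duplicated list."""
    seen = set()
    texts = []
    for entries in response.get("suggest", {}).values():
        for entry in entries:
            for option in entry.get("options", []):
                text = option["text"]
                if text not in seen:
                    seen.add(text)
                    texts.append(text)
    return texts


# Trimmed-down version of the response shown in the question.
sample_response = {
    "suggest": {
        "caseContent-suggest": [
            {"text": "uni", "offset": 0, "length": 3, "options": []}
        ],
        "respondent-suggest": [
            {
                "text": "uni",
                "offset": 0,
                "length": 3,
                "options": [{"text": "Union of India", "_score": 1.0}],
            }
        ],
    }
}

print(collect_suggestions(sample_response))
```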
