Indexing a comma-separated value field in Elastic Search

Problem description

I'm using Nutch to crawl a site and index it into Elastic Search. My site has meta tags, some of which contain a comma-separated list of IDs (which I intend to use for search). For example:

contentTypeIds="2,5,15" (note: no square brackets).

When ES indexes this, I can't search for contentTypeIds:5 and find documents whose contentTypeIds contain 5; the query returns only the documents whose contentTypeIds is exactly "5". What I want is to find every document whose contentTypeIds contain 5.

In Solr, this is solved by setting the contentTypeIds field to multiValued="true" in schema.xml. I can't find how to do something similar in ES.

I'm new to ES, so I probably missed something. Thanks for your help!

Recommended answer

Create a custom analyzer that splits the indexed text into tokens at commas.

Then you can search. If you don't care about relevance, you can use a filter to search through your documents. My example shows how to search with a term filter.
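To see why splitting on commas makes the term filter match, here is a minimal Python sketch of the idea. It is only an illustration, not Elasticsearch code: `comma_tokenize`, `build_index`, and `term_lookup` are hypothetical helper names, the toy inverted index stands in for what ES does internally, and dropping empty tokens is an assumption about how the pattern tokenizer behaves.

```python
import re
from collections import defaultdict


def comma_tokenize(text):
    """Split text on commas, roughly like a pattern tokenizer with pattern ','.

    Dropping empty tokens is an assumption made for this sketch.
    """
    return [tok for tok in re.split(",", text) if tok]


def build_index(docs, tokenizer):
    """Build a toy inverted index: token -> set of document ids."""
    index = defaultdict(set)
    for doc_id, value in docs.items():
        for token in tokenizer(value):
            index[token].add(doc_id)
    return index


def term_lookup(index, term):
    """Exact-match lookup of one term, similar in spirit to a term filter."""
    return sorted(index.get(term, set()))


# Same sample documents as in the answer's example requests.
docs = {1: "1,2,3", 2: "3,4", 3: "1,6"}

# Index where each field value is kept as a single token.
whole_value_index = build_index(docs, lambda value: [value])
# Index where each field value is split at commas.
comma_index = build_index(docs, comma_tokenize)

print(term_lookup(whole_value_index, "6"))  # []
print(term_lookup(comma_index, "6"))        # [3]
print(term_lookup(comma_index, "3"))        # [1, 2]
```

When the whole value is stored as one token, the term "6" only matches a document whose value is exactly "6". Once the value is split at commas, each ID becomes its own term, so an exact term lookup finds every document that contains it.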

Below you can find how to do this with the Sense plugin.

DELETE testindex

PUT testindex
{
    "index" : {
        "analysis" : {
            "tokenizer" : {
                "comma" : {
                    "type" : "pattern",
                    "pattern" : ","
                }
            },
            "analyzer" : {
                "comma" : {
                    "type" : "custom",
                    "tokenizer" : "comma"
                }
            }
        }
    }
}

PUT /testindex/_mapping/yourtype
{
        "properties" : {
            "contentType" : {
                "type" : "string",
                "analyzer" : "comma"
            }
        }
}

PUT /testindex/yourtype/1
{
    "contentType" : "1,2,3"
}

PUT /testindex/yourtype/2
{
    "contentType" : "3,4"
}

PUT /testindex/yourtype/3
{
    "contentType" : "1,6"
}

GET /testindex/_search
{
    "query": {"match_all": {}}
}

GET /testindex/_search
{
    "filter": {
        "term": {
           "contentType": "6"
        }
    }
}
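The requests above use the syntax of an older Elasticsearch release (the "string" field type, a mapping type in the URL, and a top-level "filter" in the search body). As a hedged sketch, assuming Elasticsearch 7.x or later, the equivalent request bodies might look like the Python dictionaries below. The helper name `term_filter_query` is hypothetical, and the bodies are only printed as JSON here, not sent to a cluster.

```python
import json

# Assumption: Elasticsearch 7.x or later, where settings and mappings can be
# sent together when creating the index, mapping types are gone, and the
# analyzed string type is called "text".
create_index_body = {
    "settings": {
        "analysis": {
            "tokenizer": {
                "comma": {"type": "pattern", "pattern": ","}
            },
            "analyzer": {
                "comma": {"type": "custom", "tokenizer": "comma"}
            },
        }
    },
    "mappings": {
        "properties": {
            "contentType": {"type": "text", "analyzer": "comma"}
        }
    },
}


def term_filter_query(field, value):
    """Build a search body that filters on one exact term without scoring."""
    return {"query": {"bool": {"filter": [{"term": {field: value}}]}}}


search_body = term_filter_query("contentType", "6")

# Body for: PUT /testindex
print(json.dumps(create_index_body, indent=2))
# Body for: GET /testindex/_search
print(json.dumps(search_body, indent=2))
```

The analyzer definition itself is unchanged from the answer; only the surrounding request structure differs.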

Hope it helps.
