Indexing website/url in Elasticsearch


Problem description

I have a website field of a document indexed in Elasticsearch. Example value: http://example.com. The problem is that when I search for "example", the document is not included. How do I map the website/url field correctly?

I created the following index:

{
  "settings": {
    "index": {
      "analysis": {
        "analyzer": {
          "analyzer_html": {
            "type": "custom",
            "tokenizer": "standard",
            "filter": "standard",
            "char_filter": "html_strip"
          }
        }
      }
    }
  },
  "mapping": {
    "blogshops": {
      "properties": {
        "category": {
          "properties": {
            "name": {
              "type": "string"
            }
          }
        },
        "reviews": {
          "properties": {
            "user": {
              "properties": {
                "_id": {
                  "type": "string"
                }
              }
            }
          }
        }
      }
    }
  }
}
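
The question does not show the search itself, so the following is only an illustration of the kind of query that fails to match, assuming the index is named blogshops and the url is stored in a website field:

# Hypothetical query for illustration only (not part of the original question).
curl -s -XGET 'http://localhost:9200/blogshops/_search?pretty' -d '{
  "query": {
    "match": { "website": "example" }
  }
}'
# With the standard analyzer this returns no hits: the indexed token is
# "example.com", not "example".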


Recommended answer

I guess you are using the standard analyzer, which splits http://example.com into two tokens: http and example.com. You can check this at http://localhost:9200/_analyze?text=http://example.com&analyzer=standard.
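
From the command line, the same check can be done with curl (response abbreviated; the exact fields depend on your Elasticsearch version):

curl -s -XGET 'http://localhost:9200/_analyze?text=http://example.com&analyzer=standard&pretty'
# Returns something like:
#   "tokens" : [ { "token" : "http", ... }, { "token" : "example.com", ... } ]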

If you want the url to be split, you need to use a different analyzer or specify your own custom analyzer.

You can take a look at how the url would be indexed with the simple analyzer (http://www.elastic.co/guide/en/elasticsearch/reference/current/analysis-simple-analyzer.html): http://localhost:9200/_analyze?text=http://example.com&analyzer=simple. As you can see, the url is now indexed as three tokens: ['http', 'example', 'com']. If you don't want to index tokens like 'http', 'www', etc., you can specify your own analyzer with the lowercase tokenizer (the one used in the simple analyzer) and a stop filter. For example, something like this:

# Delete index
#
curl -s -XDELETE 'http://localhost:9200/url-test/' ; echo

# Create index with mapping and custom analyzer
#
curl -s -XPUT 'http://localhost:9200/url-test/' -d '{
  "mappings": {
    "document": {
      "properties": {
        "content": {
          "type": "string",
          "analyzer" : "lowercase_with_stopwords"
        }
      }
    }
  },
  "settings" : {
    "index" : {
      "number_of_shards" : 1,
      "number_of_replicas" : 0
    },
    "analysis": {
      "filter" : {
        "stopwords_filter" : {
          "type" : "stop",
          "stopwords" : ["http", "https", "ftp", "www"]
        }
      },
      "analyzer": {
        "lowercase_with_stopwords": {
          "type": "custom",
          "tokenizer": "lowercase",
          "filter": [ "stopwords_filter" ]
        }
      }
    }
  }
}' ; echo

# Check how the URL is analyzed with the custom analyzer
#
curl -s -XGET 'http://localhost:9200/url-test/_analyze?text=http://example.com&analyzer=lowercase_with_stopwords&pretty'
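# Expected result (approximate): only the tokens "example" and "com" remain,
# since "http" is removed by the stopwords filter.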

# Index document
#
curl -s -XPUT 'http://localhost:9200/url-test/document/1?pretty=true' -d '{
  "content" : "Small content with URL http://example.com."
}'

# Refresh index
#
curl -s -XPOST 'http://localhost:9200/url-test/_refresh'

# Try to search document
#
curl -s -XGET 'http://localhost:9200/url-test/_search?pretty' -d '{
  "query" : {
    "query_string" : {
        "query" : "content:example"
    }
  }
}'
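# The document should now be returned, because "example" is indexed as its own token.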

NOTE: If you don't want to use stopwords, here is an interesting article: stop stopping stop words: a look at common terms query.
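
For reference, a minimal sketch of the query type that article discusses; this assumes the stopwords_filter above is dropped so that terms like "http" stay in the index, and it is not part of the original answer:

# Common terms query: frequent terms (e.g. "http") are down-weighted at query
# time instead of being removed at index time. cutoff_frequency is an
# illustrative value.
curl -s -XGET 'http://localhost:9200/url-test/_search?pretty' -d '{
  "query" : {
    "common" : {
      "content" : {
        "query" : "http example com",
        "cutoff_frequency" : 0.001
      }
    }
  }
}'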
