Indexing website/url in Elastic Search
Question
I have a website field of a document indexed in Elasticsearch. Example value: http://example.com . The problem is that when I search for example, the document is not included. How do I map the website/url field correctly?
I created the following index:
{
  "settings": {
    "index": {
      "analysis": {
        "analyzer": {
          "analyzer_html": {
            "type": "custom",
            "tokenizer": "standard",
            "filter": "standard",
            "char_filter": "html_strip"
          }
        }
      }
    }
  },
  "mappings": {
    "blogshops": {
      "properties": {
        "category": {
          "properties": {
            "name": {
              "type": "string"
            }
          }
        },
        "reviews": {
          "properties": {
            "user": {
              "properties": {
                "_id": {
                  "type": "string"
                }
              }
            }
          }
        }
      }
    }
  }
}
Answer
I guess you are using the standard analyzer, which splits http://example.com into two tokens: http and example.com. You can take a look at http://localhost:9200/_analyze?text=http://example.com&analyzer=standard .
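To see that tokenization directly, here is a minimal sketch using the _analyze endpoint mentioned above (assuming an Elasticsearch node on localhost:9200 and the same 1.x-era GET-parameter _analyze syntax used in the rest of this answer):

```shell
# Ask Elasticsearch how the standard analyzer tokenizes a URL.
# Assumes a local node on localhost:9200; no index is required
# for the cluster-level _analyze endpoint.
curl -s -XGET 'http://localhost:9200/_analyze?text=http://example.com&analyzer=standard&pretty'
```

The returned token list should contain http and example.com as whole tokens, which is why a search for example alone finds nothing: no indexed token equals example.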
If you want to split the url, you need to use a different analyzer or specify your own custom analyzer.
You can take a look at how the url would be indexed with the simple analyzer (http://www.elastic.co/guide/en/elasticsearch/reference/current/analysis-simple-analyzer.html): http://localhost:9200/_analyze?text=http://example.com&analyzer=simple . As you can see, the url is now indexed as three tokens: ['http', 'example', 'com']. If you don't want to index tokens like 'http', 'www', etc., you can specify your own analyzer with the lowercase tokenizer (the one used in the simple analyzer) and a stop filter. For example, something like this:
# Delete index
#
curl -s -XDELETE 'http://localhost:9200/url-test/' ; echo
# Create index with mapping and custom analyzer
#
curl -s -XPUT 'http://localhost:9200/url-test/' -d '{
"mappings": {
"document": {
"properties": {
"content": {
"type": "string",
"analyzer" : "lowercase_with_stopwords"
}
}
}
},
"settings" : {
"index" : {
"number_of_shards" : 1,
"number_of_replicas" : 0
},
"analysis": {
"filter" : {
"stopwords_filter" : {
"type" : "stop",
"stopwords" : ["http", "https", "ftp", "www"]
}
},
"analyzer": {
"lowercase_with_stopwords": {
"type": "custom",
"tokenizer": "lowercase",
"filter": [ "stopwords_filter" ]
}
}
}
}
}' ; echo
curl -s -XGET 'http://localhost:9200/url-test/_analyze?text=http://example.com&analyzer=lowercase_with_stopwords&pretty'
# Index document
#
curl -s -XPUT 'http://localhost:9200/url-test/document/1?pretty=true' -d '{
"content" : "Small content with URL http://example.com."
}'
# Refresh index
#
curl -s -XPOST 'http://localhost:9200/url-test/_refresh'
# Try to search document
#
curl -s -XGET 'http://localhost:9200/url-test/_search?pretty' -d '{
"query" : {
"query_string" : {
"query" : "content:example"
}
}
}'
NOTE: If you don't like using stopwords, here is an interesting article: "Stop stopping stop words: a look at common terms query".
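As a rough sketch of that alternative: keep all tokens at index time (e.g. the lowercase tokenizer with no stop filter) and let the common terms query down-rank high-frequency terms at search time. This assumes the url-test index from above and the common terms query available in the 1.x/2.x-era API this answer targets; the cutoff_frequency value is just an illustrative tuning knob, not a recommendation:

```shell
# Search with the common terms query: terms above cutoff_frequency
# (e.g. "http" in a corpus of URLs) are treated as low-signal and
# only influence scoring, instead of being dropped at index time.
curl -s -XGET 'http://localhost:9200/url-test/_search?pretty' -d '{
  "query" : {
    "common" : {
      "content" : {
        "query" : "http example com",
        "cutoff_frequency" : 0.001
      }
    }
  }
}'
```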