Symbols in query-string for elasticsearch

Problem description

I have "documents" (ActiveRecord models) with an attribute called deviation. The attribute has values like "Bin X", "Bin $", "Bin q", "Bin %", etc.

I am trying to use tire/elasticsearch to search the attribute. I am using the whitespace analyzer to index the deviation attribute. Here is my code for creating the indexes:

settings :analysis => {
  :filter => {
    :ngram_filter => {
      :type => "nGram",
      :min_gram => 2,
      :max_gram => 255
    },
    :deviation_filter => {
      :type => "word_delimiter",
      :type_table => ['$ => ALPHA']
    }
  },
  :analyzer => {
    :ngram_analyzer => {
      :type => "custom",
      :tokenizer => "standard",
      :filter => ["lowercase", "ngram_filter"]
    },
    :deviation_analyzer => {
      :type => "custom",
      :tokenizer => "whitespace",
      :filter => ["lowercase"]
    }
  }
} do
  mapping do
    indexes :id, :type => 'integer'
    [:equipment, :step, :recipe, :details, :description].each do |attribute|
      indexes attribute, :type => 'string', :analyzer => 'ngram_analyzer'
    end
    indexes :deviation, :analyzer => 'whitespace'
  end
end
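To make the tokenization concrete, here is a toy pure-Ruby approximation (not Elasticsearch itself; the helper name is hypothetical) of what the custom :deviation_analyzer would emit: split on whitespace, then lowercase. Symbols such as "$" and "%" survive as tokens of their own. As written, though, the mapping points :deviation at the built-in whitespace analyzer, which does no lowercasing, so :deviation_analyzer is defined but never used.

```ruby
# Toy approximation (hypothetical, not Elasticsearch) of :deviation_analyzer:
# a whitespace tokenizer followed by a lowercase token filter.
def toy_deviation_analyzer(text)
  # String#split with no argument splits on runs of whitespace and drops
  # leading/trailing blanks, roughly like the whitespace tokenizer.
  text.split.map(&:downcase)
end

p toy_deviation_analyzer("Bin $")  # => ["bin", "$"]
p toy_deviation_analyzer("Bin %")  # => ["bin", "%"]
```

Because nothing strips punctuation here, a "$" in the field is kept in the index as the token "$".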

The search seems to work fine when the query string contains no special characters. For example, "Bin X" returns only those records that contain both the words "Bin" AND "X". However, searching for something like "Bin $" or "Bin %" returns every record that contains the word "Bin", almost ignoring the symbol (records with the symbol do rank higher in the results than records without it).

Here is the search method I have created:

def self.search(params)
  tire.search(load: true) do
    query { string "#{params[:term].downcase}:#{params[:query]}", default_operator: "AND" }
    size 1000
  end
end
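One detail of the query string built here may be worth checking. In Lucene's classic query syntax, a `field:` prefix applies only to the term immediately after it, so `deviation:Bin $` restricts only "Bin" to the deviation field and looks for "$" in the default field. A minimal sketch, assuming a hypothetical helper name, that wraps the user's terms in parentheses so the field applies to all of them:

```ruby
# Hypothetical helper (not part of Tire): builds the string that the
# search method passes to Tire's `string` query.
def build_field_query(field, query)
  # A bare `field:` prefix binds only to the next term, so
  # "deviation:Bin $" would search "$" in the default field.
  # Grouping with parentheses applies the field to every term.
  "#{field.downcase}:(#{query})"
end

puts build_field_query("Deviation", "Bin $")  # prints deviation:(Bin $)
```

With default_operator: "AND", the grouped form asks for both terms in the deviation field. A real caller would probably want to skip blank input rather than send an empty group.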

And here is how I am building the search form:

<div>
    <%= form_tag issues_path, :class=> "formtastic issue", method: :get do %>
        <fieldset class="inputs">
        <ol>
            <li class="string input medium search query optional stringish inline">
                <% opts = ["Description", "Detail","Deviation","Equipment","Recipe", "Step"] %>
                <%= select_tag :term, options_for_select(opts, params[:term]) %>
                <%= text_field_tag :query, params[:query] %>
                <%= submit_tag "Search", name: nil, class: "btn" %>
            </li>
        </ol>
        </fieldset>
    <% end %>
</div>
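Because the search method derives the field name with params[:term].downcase, the "Detail" option becomes detail, while the mapping indexes :details. A sketch, assuming a hypothetical SEARCH_FIELDS constant and search_field_for helper, that maps each form label to its indexed field explicitly and rejects anything else, so a tampered term parameter cannot name an arbitrary field:

```ruby
# Hypothetical whitelist: form label => field name used in the mapping.
SEARCH_FIELDS = {
  "Description" => "description",
  "Detail"      => "details",   # the mapping indexes :details, not :detail
  "Deviation"   => "deviation",
  "Equipment"   => "equipment",
  "Recipe"      => "recipe",
  "Step"        => "step"
}.freeze

# Returns the indexed field for a form label; raises on unknown labels.
def search_field_for(label)
  SEARCH_FIELDS.fetch(label) { raise ArgumentError, "unknown search field: #{label.inspect}" }
end
```

The search method could then interpolate search_field_for(params[:term]) in place of params[:term].downcase.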

Solution

You can sanitize your query string. Here is a sanitizer that works for everything that I've tried throwing at it:

def sanitize_string_for_elasticsearch_string_query(str)
  # Escape special characters
  # http://lucene.apache.org/core/old_versioned_docs/versions/2_9_1/queryparsersyntax.html#Escaping Special Characters
  escaped_characters = Regexp.escape('\\+-&|!(){}[]^~*?:')
  str = str.gsub(/([#{escaped_characters}])/, '\\\\\1')

  # AND, OR and NOT are used by Lucene as logical operators. We need
  # to escape them (each letter gets a backslash, e.g. \A\N\D)
  ['AND', 'OR', 'NOT'].each do |word|
    escaped_word = word.chars.map { |char| "\\#{char}" }.join
    str = str.gsub(/\s*\b(#{word})\b\s*/, " #{escaped_word} ")
  end

  # If there is an odd number of double quotes, escape the last one so the
  # parser doesn't see an unterminated phrase (the greedy first group makes
  # the match land on the last quote; \2 keeps the text after it)
  quote_count = str.count '"'
  str = str.sub(/(.*)"(.*)/, '\1\"\2') if quote_count % 2 == 1

  str
end
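To see what the first escaping step does to the inputs from the question, here is that step on its own (the names SPECIAL and escape_specials are hypothetical; the character list is the one used above). Note that "$" and "%" are not in the Lucene special-character list, so this step passes "Bin $" and "Bin %" through unchanged.

```ruby
# Same character list as the sanitizer (Lucene 2.9 query syntax),
# minus the double quote, which the sanitizer handles separately.
SPECIAL = Regexp.escape('\\+-&|!(){}[]^~*?:')

# Hypothetical helper: prefixes each special character with a backslash.
def escape_specials(str)
  str.gsub(/([#{SPECIAL}])/, '\\\\\1')
end

puts escape_specials("Bin (x)")  # prints Bin \(x\)
puts escape_specials("Bin $")    # prints Bin $
puts escape_specials("Bin %")    # prints Bin %
```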

params[:query] = sanitize_string_for_elasticsearch_string_query(params[:query])
