Solr configuration for scored search

Question

I am trying to set up a Solr index for searching a database of product information. I have populated a database with product details and am using Solr 6.0.0. Given some product details (title, brand, other keywords), I would like to know whether the database contains a product that closely matches them. I have run the data import and created the index. However, when I search, the matching products all get the same score even though the products listed are different. I have tried different combinations of search keywords, but the result is similar in every case. I have also tried different tokenizers and filters.

A sample of the schema.xml I have tried:

<?xml version="1.0" encoding="UTF-8" ?>

<schema name="example" version="1.5">
 <field name="id" type="Int"  indexed="true" stored="true"/>
  <field name="name" type="text_general"  indexed="true" stored="true" />
  <field name="brand" type="text_general"  indexed="true" stored="true"/>
  <field name="category" type="text_general"  indexed="true" stored="true"/>
  <field name="description" type="text_general" indexed="true" stored="true" /> 
  <field name="catchall" type="text_general" indexed="true" stored="true" multiValued="true" />
    <copyField source="id" dest="catchall" />
    <copyField source="name" dest="catchall" />
    <copyField source="brand" dest="catchall" />
    <copyField source="category" dest="catchall" />
    <copyField source="description" dest="catchall" />
    <uniqueKey>id</uniqueKey>
    <defaultSearchField>catchall</defaultSearchField>
    <types>
        <fieldtype name="string" class="solr.StrField" sortMissingLast="true" />
        <fieldtype name="Int" class="solr.TrieIntField" precisionStep="0" positionIncrementGap="0"/>
        <fieldtype name="text_general" class="solr.TextField" positionIncrementGap="100">
          <analyzer type="index">
            <charFilter class="solr.HTMLStripCharFilterFactory"/>
            <tokenizer class="solr.WhitespaceTokenizerFactory"/>
            <filter class="solr.WordDelimiterFilterFactory"
                    generateWordParts="1" 
                    splitOnNumerics="1"
                    splitOnCaseChange="1"
                    generateNumberParts="1"
                    catenateWords="0"
                    catenateNumbers="0"
                    catenateAll="0"
                    preserveOriginal="1"
                    />

            <filter class="solr.ASCIIFoldingFilterFactory" preserveOriginal="true"/>
            <filter class="solr.ICUFoldingFilterFactory"/>
            <filter class="solr.LowerCaseFilterFactory"/>
          </analyzer>
          <analyzer type="query">
            <charFilter class="solr.HTMLStripCharFilterFactory"/>
            <tokenizer class="solr.WhitespaceTokenizerFactory"/>
            <filter class="solr.WordDelimiterFilterFactory"
                    generateWordParts="1" 
                    splitOnNumerics="1"
                    splitOnCaseChange="1"
                    generateNumberParts="1"
                    catenateWords="0"
                    catenateNumbers="0"
                    catenateAll="0"
                    preserveOriginal="1"
                    />
            <filter class="solr.ASCIIFoldingFilterFactory" preserveOriginal="true"/>
            <filter class="solr.ICUFoldingFilterFactory"/>
            <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="true"/>
            <filter class="solr.LowerCaseFilterFactory"/>
          </analyzer>
        </fieldtype>
        <fieldtype name="ignored" stored="false" indexed="false" multiValued="true" class="solr.StrField" />
    </types>
</schema>

Edit: The entity definition in data-config.xml is as follows.

<entity name="master_products"  
    pk="id"
    query="select p.* ,b.*  from master_products p ,master_brands b  where b.id=p.brand_id"
    deltaImportQuery="SELECT * FROM master_products WHERE product_name='${dataimporter.delta.product_name}' "
    >
    <!-- or b.brnad='${dataimporter.delta.brand}' -->

     <field column="product_name" name="name"/> 
     <field column="product_description" name="description"/> 
     <field column="id" name="id"/>
     <field column="mrp" name="mrp"/> 
     <field column="brand" name="brand"/>


  <entity name="master_brands" 
    query="select * from master_brands"
    deltaImportQuery="select * from master_brands where id ={master_products.brand_id}" processor="SqlEntityProcessor" cacheImpl="SortedMapBackedCache" >

  </entity>

  <entity name="master_product_categories" 
    query="select * from master_product_categories"
    deltaImportQuery="select * from master_product_categories where id ={master_products.   product_category_id}" processor="SqlEntityProcessor" cacheImpl="SortedMapBackedCache" >
    <field column="category" name="category" />
  </entity>

 </entity> 

Edit: The query is as follows.

http://localhost:8983/solr/myproducts/select?fl=* score&fq=brand:Nikon&fq=mrp:28950*&indent=on&q=name:*"Nikon D3200 (Black) DSLR with  AF-S 18-55mm VR Kit Lens"*&wt=json

I would like help in achieving this goal. Can you please point me toward a configuration that would serve this purpose? Thanks in advance.

Recommended answer

Wildcard queries are constant-scoring, meaning that every document that matches gets the same score. You probably want to use regular querying (without wildcards) to get proper scoring between documents.

Range queries [a TO z], prefix queries a*, and wildcard queries a*b are constant-scoring (all matching documents get an equal score). The scoring factors tf, idf, index boost, and coord are not used. There is no limitation on the number of terms that match (as there was in past versions of Lucene).
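As a rough illustration of "regular querying", the sketch below builds request parameters that search the `name` field with plain terms instead of a `*"..."*` wildcard pattern. The choice of the `edismax` parser with `qf=name` is an assumption made for this example (the answer does not name a parser), and `escape_terms` and `build_params` are hypothetical helper names. The base URL is the one from the question.

```python
import re
from urllib.parse import urlencode

# Base URL copied from the question's example request.
SOLR_SELECT = "http://localhost:8983/solr/myproducts/select"

# Characters with special meaning in Solr/Lucene query syntax.
_SPECIAL = re.compile(r'([+\-&|!(){}\[\]^"~*?:\\/])')

def escape_terms(text):
    """Backslash-escape query syntax characters so they are treated as literal text."""
    return _SPECIAL.sub(r"\\\1", text)

def build_params(title):
    """Return request parameters for a plain-term (non-wildcard) search on `name`."""
    return {
        "defType": "edismax",      # assumption: the answer does not name a parser
        "q": escape_terms(title),  # plain terms, so term-based relevance scoring applies
        "qf": "name",              # search the product name field
        "fl": "*,score",           # return all stored fields plus the score
        "wt": "json",
    }

params = build_params("Nikon D3200 (Black) DSLR")
url = SOLR_SELECT + "?" + urlencode(params)
print(url)
```

Because the query terms are no longer wrapped in wildcards, documents that match more of the terms, or rarer terms, can be ranked differently from one another.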

fq terms do not affect the score; they just filter the result set.
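A minimal sketch of that split, assuming the default query parser: anything that should influence ranking goes into `q`, while hard constraints go into one or more `fq` parameters. The helper name `split_query` is hypothetical, and the field values are taken from the question's example request.

```python
from urllib.parse import urlencode

def split_query(scoring_terms, filters):
    """Build a query string with scored terms in q and unscored filters in fq."""
    params = [("q", scoring_terms), ("fl", "*,score"), ("wt", "json")]
    # Each filter becomes its own fq parameter; filters restrict the
    # result set but contribute nothing to the score.
    params += [("fq", f) for f in filters]
    return urlencode(params)

# brand is moved from fq into q so that a brand match contributes to the score;
# the price constraint stays a filter.
qs = split_query("name:(Nikon D3200) brand:Nikon", ["mrp:28950*"])
print(qs)
```

With the question's original request, `brand:Nikon` sat in `fq`, so it narrowed the results without helping to rank them.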
