How to disable default scoring/boosting in Hibernate Search/Lucene?

Question
I want to serve my users the most relevant and best results. For example, I'm rewarding records that have a big title, description, attached photos, etc. For context: the records are bicycle routes, having routepoints (coordinates) and metadata like photos, reviews, etc.
Now, I have indexed these records using Hibernate, and I search within the index using Lucene via Hibernate Search. To score my results, I build queries based on the document properties and boost them (using boostedTo()) in the should clauses of a BooleanJunction:
bj.should(qb.range().onField("descriptionLength").above(3000).createQuery()).boostedTo(3.0f);
bj.should(qb.range().onField("views.views").above(5000).createQuery()).boostedTo(3.0f);
bj.should(qb.range().onField("nameLength").above(20).createQuery()).boostedTo(1.0f);
bj.should(qb.range().onField("picturesLength").above(0).createQuery()).boostedTo(5.0f);
bj.should(qb.keyword().onField("routePoints.poi.participant").matching("true").createQuery()).boostedTo(10.0f);
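With a Similarity whose factors all return 1.0f, the intent of the query above is that a document's score reduces to (roughly) the sum of the boosts of its matching should clauses. A standalone sketch of that intended scoring model (illustrative only, not Lucene's exact formula; the thresholds and boosts are taken from the query above):

```java
public class BoostSumSketch {
    // Illustrative only: approximates "score = sum of boosts of matching
    // should clauses", which is what the query above aims for once all
    // Similarity factors are forced to 1.0f.
    static float intendedScore(int descriptionLength, int views, int nameLength,
                               int picturesLength, boolean participant) {
        float score = 0f;
        if (descriptionLength > 3000) score += 3.0f;  // descriptionLength clause
        if (views > 5000)             score += 3.0f;  // views.views clause
        if (nameLength > 20)          score += 1.0f;  // nameLength clause
        if (picturesLength > 0)       score += 5.0f;  // picturesLength clause
        if (participant)              score += 10.0f; // routePoints.poi.participant clause
        return score;
    }

    public static void main(String[] args) {
        // A route matching every clause:
        System.out.println(intendedScore(4000, 6000, 25, 2, true)); // prints 22.0
        // A bare route matching nothing:
        System.out.println(intendedScore(0, 0, 0, 0, false));       // prints 0.0
    }
}
```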
To try to disable Lucene's scoring, I have overridden the DefaultSimilarity class so that every scoring factor returns 1.0f, and enabled it via the Hibernate config:
import org.apache.lucene.index.FieldInvertState;
import org.apache.lucene.search.similarities.DefaultSimilarity;

public class IgnoreScoringSimilarity extends DefaultSimilarity {
@Override
public float idf(long docFreq, long numDocs) {
return 1.0f;
}
@Override
public float tf(float freq) {
return 1.0f;
}
@Override
public float coord(int overlap, int maxOverlap) {
return 1.0f;
}
@Override
public float lengthNorm(FieldInvertState state) {
return 1.0f;
}
@Override
public float queryNorm(float sumOfSquaredWeights) {
return 1.0f;
}
}
Hibernate config:
<property name="hibernate.search.default.similarity" value="com.search.IgnoreScoringSimilarity"/>
This approach works about 90% of the time; however, I am still seeing some weird results that seem out of place. The pattern I recognize is that these routes (documents) are very large. A normal route has about 20-30 routepoints, whereas these out-of-place results have 100-150. This leads me to believe that default Lucene scoring is still happening (scoring higher because of document size).
Am I doing something wrong in disabling Lucene's scoring? Could there be another explanation?
Solution

I can suggest another approach based on custom result sorting; you can read about it in the answer. That answer is slightly outdated, so I modified its example according to the Lucene 4.10.1 API.

Comparator:
import java.io.IOException;
import org.apache.lucene.index.AtomicReaderContext;
import org.apache.lucene.search.FieldCache;
import org.apache.lucene.search.FieldComparator;

public abstract class CustomComparator extends FieldComparator<Double> {
double[] scoring;
double bottom;
double topValue;
private FieldCache.Ints[] currentReaderValues;
private String[] fields;
protected abstract double getScore(int[] value);
public CustomComparator(int hitNum, String[] fields) {
this.fields = fields;
scoring = new double[hitNum];
}
int[] fromReaders(int doc) {
int[] result = new int[currentReaderValues.length];
for (int i = 0; i < result.length; i++) {
result[i] = currentReaderValues[i].get(doc);
}
return result;
}
@Override
public int compare(int slot1, int slot2) {
return Double.compare(scoring[slot1], scoring[slot2]);
}
@Override
public void setBottom(int slot) {
this.bottom = scoring[slot];
}
@Override
public void setTopValue(Double top) {
topValue = top;
}
@Override
public int compareBottom(int doc) throws IOException {
double v2 = getScore(fromReaders(doc));
return Double.compare(bottom, v2);
}
@Override
public int compareTop(int doc) throws IOException {
double docValue = getScore(fromReaders(doc));
return Double.compare(topValue, docValue);
}
@Override
public void copy(int slot, int doc) throws IOException {
scoring[slot] = getScore(fromReaders(doc));
}
@Override
public FieldComparator<Double> setNextReader(AtomicReaderContext atomicReaderContext) throws IOException {
currentReaderValues = new FieldCache.Ints[fields.length];
for (int i = 0; i < fields.length; i++) {
currentReaderValues[i] = FieldCache.DEFAULT.getInts(atomicReaderContext.reader(), fields[i], null, false);
}
return this;
}
@Override
public Double value(int slot) {
return scoring[slot];
}
}
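The comparator above follows Lucene's slot model: as hits are collected, copy() stores each document's computed score into a slot, and compare() orders slots to maintain the priority queue. That mechanic can be illustrated with a minimal standalone sketch (no Lucene dependency; the class and its names are purely illustrative):

```java
public class SlotComparatorSketch {
    // Mimics FieldComparator's slot model: scores are copied into slots as
    // hits are collected, then slots are compared to order the result queue.
    final double[] scoring;

    SlotComparatorSketch(int hitNum) {
        scoring = new double[hitNum];
    }

    // copy(): store a document's value into a slot (FieldComparator.copy).
    void copy(int slot, double docScore) {
        scoring[slot] = docScore;
    }

    // compare(): order two previously filled slots (FieldComparator.compare).
    int compare(int slot1, int slot2) {
        return Double.compare(scoring[slot1], scoring[slot2]);
    }

    public static void main(String[] args) {
        SlotComparatorSketch c = new SlotComparatorSketch(2);
        c.copy(0, -9.0); // scores are negated, as in the example, so that
        c.copy(1, -5.0); // the best document sorts first in ascending order
        System.out.println(c.compare(0, 1)); // negative: slot 0 ranks before slot 1
    }
}
```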
Search example:
import java.io.IOException;
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.*;
import org.apache.lucene.index.*;
import org.apache.lucene.search.*;
import org.apache.lucene.store.Directory;
import org.apache.lucene.store.RAMDirectory;
import org.apache.lucene.util.Version;

public class SortExample {
public static void main(String[] args) throws IOException {
final String[] fields = new String[]{"descriptionLength", "views.views", "nameLength"};
Sort sort = new Sort(
new SortField(
"",
new FieldComparatorSource() {
public FieldComparator newComparator(String fieldname, int numHits, int sortPos, boolean reversed) throws IOException {
return new CustomComparator(numHits, fields) {
@Override
protected double getScore(int[] value) {
int descriptionLength = value[0];
int views = value[1];
int nameLength = value[2];
return -((descriptionLength > 2000.0 ? 5.0 : 0.0) +
(views > 5000.0 ? 3.0 : 0.0) +
(nameLength > 20.0 ? 1.0 : 0.0));
}
};
}
}
)
);
IndexWriterConfig indexWriterConfig = new IndexWriterConfig(Version.LUCENE_4_10_4, new StandardAnalyzer());
Directory directory = new RAMDirectory();
IndexWriter indexWriter = new IndexWriter(directory, indexWriterConfig);
addDoc(indexWriter, "score 0", 1000, 1000, 10);
addDoc(indexWriter, "score 5", 3000, 1000, 10);
addDoc(indexWriter, "score 3", 1000, 6000, 10);
addDoc(indexWriter, "score 1", 1000, 1000, 30);
addDoc(indexWriter, "score 4", 1000, 6000, 30);
addDoc(indexWriter, "score 6", 5000, 1000, 30);
addDoc(indexWriter, "score 9", 5000, 6000, 30);
final IndexReader indexReader = DirectoryReader.open(indexWriter, false);
IndexSearcher indexSearcher = new IndexSearcher(indexReader);
Query query = new TermQuery(new Term("all", "all"));
int nDocs = 100;
final TopDocs search = indexSearcher.search(query, null, nDocs, sort);
System.out.println("Max " + search.scoreDocs.length + " " + search.getMaxScore());
for (ScoreDoc sd : search.scoreDocs) {
Document document = indexReader.document(sd.doc);
System.out.println(document.getField("name").stringValue());
}
}
private static void addDoc(IndexWriter indexWriter, String name, int descriptionLength, int views, int nameLength) throws IOException {
Document doc = new Document();
doc.add(new TextField("name", name, Field.Store.YES));
doc.add(new TextField("all", "all", Field.Store.YES));
doc.add(new IntField("descriptionLength", descriptionLength, Field.Store.YES));
doc.add(new IntField("views.views", views, Field.Store.YES));
doc.add(new IntField("nameLength", nameLength, Field.Store.YES));
indexWriter.addDocument(doc);
}
}
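The document names in the example encode their expected rule-based scores, so the printed ordering can be checked with a small standalone sketch of the same scoring rule (thresholds and weights copied from the getScore() above; no Lucene dependency):

```java
public class RuleScoreSketch {
    // Same thresholds as the example comparator: description length > 2000
    // adds 5, views > 5000 adds 3, name length > 20 adds 1. The comparator
    // negates this sum so ascending sort order puts the best document first.
    static int score(int descriptionLength, int views, int nameLength) {
        return (descriptionLength > 2000 ? 5 : 0)
             + (views > 5000 ? 3 : 0)
             + (nameLength > 20 ? 1 : 0);
    }

    public static void main(String[] args) {
        System.out.println(score(5000, 6000, 30)); // the "score 9" document, prints 9
        System.out.println(score(3000, 1000, 10)); // the "score 5" document, prints 5
        System.out.println(score(1000, 1000, 10)); // the "score 0" document, prints 0
    }
}
```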
The code will output:
score 9
score 6
score 5
score 4
score 3
score 1
score 0