如何使用加权函数对多个字段的搜索结果进行排序? [英] How to sort search results on multiple fields using a weighting function?

查看:128
本文介绍了如何使用加权函数对多个字段的搜索结果进行排序?的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我有一个Lucene索引,其中每个文档都有几个包含数值的字段。现在我想根据该字段的加权和对搜索结果进行排序。
例如:

I have a Lucene index where every document has several fields which contain numeric values. Now I would like to sort the search result on a weighted sum of this field. For example:

field1=100
field2=002
field3=014

加权函数如下所示:

f(d) = field1 * 0.5 + field2 * 1.4 + field3 * 1.8

结果应按f(d)排序,其中d代表文件。排序功能应该是非静态的,并且可能因搜索到搜索而不同,因为常数因素会受到执行搜索的用户的影响。

The results should be ordered by f(d) where d represents the document. The sorting function should be non-static and could differ from search to search because the constant factors are influenced by the user who performs the search.

有谁知道如何解决这个问题或者想知道如何以另一种方式实现这个目标?

Has anyone an idea how to solve this or maybe an idea how to accomplish this goal in another way?

推荐答案

您可以尝试实现自定义 ScoreDocComparator 。例如:

You could try implementing a custom ScoreDocComparator. For example:

public class ScaledScoreDocComparator implements ScoreDocComparator {

    private int[][] values;
    private float[] scalars;

    public ScaledScoreDocComparator(IndexReader reader, String[] fields, float[] scalars) throws IOException {
        this.scalars = scalars;
        this.values = new int[fields.length][];
        for (int i = 0; i < values.length; i++) {
            this.values[i] = FieldCache.DEFAULT.getInts(reader, fields[i]);
        }
    }

    protected float score(ScoreDoc scoreDoc) {
        int doc = scoreDoc.doc;

        float score = 0;
        for (int i = 0; i < values.length; i++) {
            int value = values[i][doc];
            float scalar = scalars[i];
            score += (value * scalar);
        }
        return score;
    }

    @Override
    public int compare(ScoreDoc i, ScoreDoc j) {
        float iScore = score(i);
        float jScore = score(j);
        return Float.compare(iScore, jScore);
    }

    @Override
    public int sortType() {
        return SortField.CUSTOM;
    }

    @Override
    public Comparable<?> sortValue(ScoreDoc i) {
        float score = score(i);
        return Float.valueOf(score);
    }

}

以下是<$的示例c $ c> ScaledScoreDocComparator 正在运作中。我相信它适用于我的测试,但我鼓励您根据您的数据证明它。

Here is an example of ScaledScoreDocComparator in action. I believe it works in my test, but I encourage you to prove it against your data.

final String[] fields = new String[]{ "field1", "field2", "field3" };
final float[] scalars = new float[]{ 0.5f, 1.4f, 1.8f };

Sort sort = new Sort(
    new SortField(
        "",
        new SortComparatorSource() {
            public ScoreDocComparator newComparator(IndexReader reader, String fieldName) throws IOException {
                return new ScaledScoreDocComparator(reader, fields, scalars);
            }
        }
    )
);

IndexSearcher indexSearcher = ...;
Query query = ...;
Filter filter = ...; // can be null
int nDocs = 100;

TopFieldDocs topFieldDocs = indexSearcher.search(query, filter, nDocs, sort);
ScoreDoc[] scoreDocs = topFieldDocs.scoreDocs;



奖金!



看来, Lucene开发人员正在弃用 ScoreDocComparator 接口(它目前在Subversion存储库中已弃用)。以下是 ScaledScoreDocComparator 的示例,其修改为遵守 ScoreDocComparator 的后继者, FieldComparator

Bonus!

It appears that the Lucene developers are deprecating the ScoreDocComparator interface (it's currently deprecated in the Subversion repository). Here is an example of the ScaledScoreDocComparator modified to adhere to ScoreDocComparator's successor, FieldComparator:

public class ScaledComparator extends FieldComparator {

    private String[] fields;
    private float[] scalars;
    private int[][] slotValues;
    private int[][] currentReaderValues;
    private int bottomSlot;

    public ScaledComparator(int numHits, String[] fields, float[] scalars) {
        this.fields = fields;
        this.scalars = scalars;

        this.slotValues = new int[this.fields.length][];
        for (int fieldIndex = 0; fieldIndex < this.fields.length; fieldIndex++) {
            this.slotValues[fieldIndex] = new int[numHits];
        }

        this.currentReaderValues = new int[this.fields.length][];
    }

    protected float score(int[][] values, int secondaryIndex) {
        float score = 0;

        for (int fieldIndex = 0; fieldIndex < fields.length; fieldIndex++) {
            int value = values[fieldIndex][secondaryIndex];
            float scalar = scalars[fieldIndex];
            score += (value * scalar);
        }

        return score;
    }

    protected float scoreSlot(int slot) {
        return score(slotValues, slot);
    }

    protected float scoreDoc(int doc) {
        return score(currentReaderValues, doc);
    }

    @Override
    public int compare(int slot1, int slot2) {
        float score1 = scoreSlot(slot1);
        float score2 = scoreSlot(slot2);
        return Float.compare(score1, score2);
    }

    @Override
    public int compareBottom(int doc) throws IOException {
        float bottomScore = scoreSlot(bottomSlot);
        float docScore = scoreDoc(doc);
        return Float.compare(bottomScore, docScore);
    }

    @Override
    public void copy(int slot, int doc) throws IOException {
        for (int fieldIndex = 0; fieldIndex < fields.length; fieldIndex++) {
            slotValues[fieldIndex][slot] = currentReaderValues[fieldIndex][doc];
        }
    }

    @Override
    public void setBottom(int slot) {
        bottomSlot = slot;
    }

    @Override
    public void setNextReader(IndexReader reader, int docBase, int numSlotsFull) throws IOException {
        for (int fieldIndex = 0; fieldIndex < fields.length; fieldIndex++) {
            String field = fields[fieldIndex];
            currentReaderValues[fieldIndex] = FieldCache.DEFAULT.getInts(reader, field);
        }
    }

    @Override
    public int sortType() {
        return SortField.CUSTOM;
    }

    @Override
    public Comparable<?> value(int slot) {
        float score = scoreSlot(slot);
        return Float.valueOf(score);
    }

}

使用这个新类非常相似到原来的,除了 sort 对象的定义有点不同:

Using this new class is very similar to the original, except that the definition of the sort object is a bit different:

final String[] fields = new String[]{ "field1", "field2", "field3" };
final float[] scalars = new float[]{ 0.5f, 1.4f, 1.8f };

Sort sort = new Sort(
    new SortField(
        "",
        new FieldComparatorSource() {
            public FieldComparator newComparator(String fieldname, int numHits, int sortPos, boolean reversed) throws IOException {
                return new ScaledComparator(numHits, fields, scalars);
            }
        }
    )
);

这篇关于如何使用加权函数对多个字段的搜索结果进行排序?的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆