Custom sorting of null values in Lucene.Net 3.0.3


Question

I am looking for a way to custom-sort my Lucene.Net results so that null values (documents where the field does not exist) are placed at the bottom regardless of the sort direction (ascending or descending).

The following data sums up the situation and the wanted results:

data in index    wanted sort result
                 desc    asc
-------------    ----    ----
100              400     100
400              300     300
null             100     400
300              null    null

My situation is that I have some products, and not all of them have a price. When sorting ascending, I want the cheapest products first, not the products with no price (which is what the default behavior gives). The products with no price should still be in the result, but at the end, since they are the least relevant when sorting on price.
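The wanted ordering can be stated in plain C#, independent of Lucene. The sketch below is only an illustration of the target behavior on an in-memory list; the class and method names are hypothetical and nothing here touches the index.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class NullsLastDemo
{
    // Orders prices in the requested direction while keeping missing
    // prices (null) at the end in both cases.
    public static List<int?> SortNullsLast(IEnumerable<int?> prices, bool descending)
    {
        // First key: false (has a value) sorts before true (missing).
        var byPresence = prices.OrderBy(p => !p.HasValue);

        // Second key: the price itself, in the requested direction.
        var ordered = descending
            ? byPresence.ThenByDescending(p => p)
            : byPresence.ThenBy(p => p);

        return ordered.ToList();
    }

    private static string Show(int? p)
    {
        return p.HasValue ? p.Value.ToString() : "null";
    }

    public static void Main()
    {
        var prices = new int?[] { 100, 400, null, 300 };
        Console.WriteLine(string.Join(", ", SortNullsLast(prices, false).Select(Show)));
        Console.WriteLine(string.Join(", ", SortNullsLast(prices, true).Select(Show)));
    }
}
```

This only pins down the expected result; sorting inside Lucene still needs a comparator that works on the indexed values.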

I've searched quite a bit with Google and haven't found an answer on how to implement custom sorting in Lucene.Net 3.0.3.

The best example I've found is this answer, which seems to point in the direction I'm looking for. But the answer is old, and the ScoreDocComparator it refers to seems to be deprecated in the original source, and thereby also in the current version 3.0.3 of Lucene.Net. The original project refers to FieldComparator as the replacement, but this seems far more complex to implement than ScoreDocComparator (many methods need to be implemented or overridden, and many could benefit from inheritance instead of duplicated implementations), which makes me doubt whether this is the right path to take.

Ideally I want to implement something generic for int/long fields that takes the field name into account, like the SortField object does, since I expect more fields in the future to benefit from this custom sorting behavior.

I would think the implementation sits somewhere around the usage of the Sort/SortField classes, so my final usage code could be something like:

var sort = new Sort(new MyNullLastSortField("Price", SortField.INT, reverse));

But maybe that is also the wrong way? SortField has a constructor that takes a FieldComparator as a parameter, but I can't wrap my head around how this is constructed and implemented, or where the actual data values from the index flow in and out.

Any help pointing me in the right direction (preferably with sample code) is much appreciated.

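My fallback solution (not preferred) would be to add two fields to the index that are used only for sorting, handle null values manually at insert time, and set them to -1 for the descending case and 9999999 for the ascending case. I would then sort normally on whichever of the two fields matches the requested direction.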
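A minimal sketch of that fallback, assuming the Lucene.Net 3.0.3 package is referenced. The field names `price_sort_asc` and `price_sort_desc` and the helper class are hypothetical, and the placeholder values only work if every real price lies strictly between -1 and 9999999.

```csharp
using Lucene.Net.Documents;
using Lucene.Net.Search;

public static class PriceSortFields
{
    // Hypothetical names for the two sort-only fields.
    public const string Asc = "price_sort_asc";
    public const string Desc = "price_sort_desc";

    // Adds both sort-only fields to a document at index time.
    public static void AddTo(Document doc, int? price)
    {
        // Ascending: a missing price becomes a large value, so it sorts last.
        doc.Add(new NumericField(Asc, Field.Store.NO, true).SetIntValue(price ?? 9999999));

        // Descending: a missing price becomes a small value, so it sorts last.
        doc.Add(new NumericField(Desc, Field.Store.NO, true).SetIntValue(price ?? -1));
    }

    // Picks the field that matches the requested direction.
    public static Sort For(bool descending)
    {
        return descending
            ? new Sort(new SortField(Desc, SortField.INT, true))
            : new Sort(new SortField(Asc, SortField.INT));
    }
}
```

The cost of this approach is two extra indexed fields for every sortable property, which is why a custom comparator is attractive.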

Recommended answer

Curiosity got the best of me. Here's a solution (with caveats).

Full source code is at https://github.com/AndyPook/SO_CustomSort-40744865

An extension method to add nullable ints. NumericField uses an encoding to store values, which I didn't want to get into, so I've just used a sentinel value.

public static class NumericFieldExtensions
{
    public static NumericField SetIntValue(this NumericField f, int? value)
    {
        if (value.HasValue)
            f.SetIntValue(value.Value);
        else
            f.SetIntValue(int.MinValue);

        return f;
    }
}

A custom comparator that "understands" the sentinel. It's just a copy of Lucene's IntComparator, which is sealed and therefore has to be copied. Look for int.MinValue to see the differences.

public class NullableIntComparator : FieldComparator
{
    private int[] values;
    private int[] currentReaderValues;
    private string field;
    private IntParser parser;
    private int bottom; // Value of bottom of queue
    private bool reversed;

    public NullableIntComparator(int numHits, string field, Parser parser, bool reversed)
    {
        values = new int[numHits];
        this.field = field;
        this.parser = (IntParser)parser;
        this.reversed = reversed;
    }

    public override int Compare(int slot1, int slot2)
    {
        // TODO: there are sneaky non-branch ways to compute
        // -1/+1/0 sign
        // Cannot return values[slot1] - values[slot2] because that
        // may overflow
        int v1 = values[slot1];
        int v2 = values[slot2];

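        // Two equal values, including two sentinels, tie.
        if (v1 == v2)
            return 0;
        // Sentinel sorts last; Lucene negates the result when reversed.
        if (v1 == int.MinValue)
            return reversed ? -1 : 1;
        if (v2 == int.MinValue)
            return reversed ? 1 : -1;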

        if (v1 > v2)
        {
            return 1;
        }
        else if (v1 < v2)
        {
            return -1;
        }
        else
        {
            return 0;
        }
    }

    public override int CompareBottom(int doc)
    {
        int v2 = currentReaderValues[doc];

        // Two equal values, including two sentinels, tie.
        if (bottom == v2)
            return 0;

        // The sentinel sorts last in both directions: Lucene negates this
        // result for reversed sorts, so the sign is flipped here to compensate.
        if (bottom == int.MinValue)
            return reversed ? -1 : 1;
        if (v2 == int.MinValue)
            return reversed ? 1 : -1;

        // Cannot return bottom - v2 because that may overflow

        if (bottom > v2)
        {
            return 1;
        }
        else if (bottom < v2)
        {
            return -1;
        }
        else
        {
            return 0;
        }
    }

    public override void Copy(int slot, int doc)
    {
        values[slot] = currentReaderValues[doc];
    }

    public override void SetNextReader(IndexReader reader, int docBase)
    {
        currentReaderValues = FieldCache_Fields.DEFAULT.GetInts(reader, field, parser);
    }

    public override void SetBottom(int bottom)
    {
        this.bottom = values[bottom];
    }

    public override IComparable this[int slot] => values[slot];
}
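The `reversed ? -1 : 1` branches can look backwards at first. For a reversed sort, Lucene's hit queue multiplies the comparator's result by -1, so the comparator has to report the sentinel as "smaller" for it to end up last after the flip. The sketch below mimics that on plain ints without any Lucene types; `ReverseMulDemo` and its members are hypothetical names, and the equality check for ties is an addition for the demo.

```csharp
using System;
using System.Collections.Generic;

public static class ReverseMulDemo
{
    private const int Missing = int.MinValue; // sentinel for "no value"

    // Sentinel handling in the style of NullableIntComparator.Compare.
    public static int Compare(int v1, int v2, bool reversed)
    {
        if (v1 == v2) return 0;
        if (v1 == Missing) return reversed ? -1 : 1;
        if (v2 == Missing) return reversed ? 1 : -1;
        return v1 > v2 ? 1 : -1;
    }

    // Stands in for the hit queue: the comparator result is negated
    // when the sort is reversed.
    public static List<int> Sort(IEnumerable<int> values, bool reversed)
    {
        int reverseMul = reversed ? -1 : 1;
        var list = new List<int>(values);
        list.Sort((a, b) => reverseMul * Compare(a, b, reversed));
        return list;
    }

    public static void Main()
    {
        var data = new[] { 100, 400, Missing, 300 };
        Console.WriteLine(string.Join(", ", Sort(data, false)));
        Console.WriteLine(string.Join(", ", Sort(data, true)));
    }
}
```

In both directions the sentinel lands at the end of the list, which is the behavior the comparator is after.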

Lastly, a FieldComparatorSource to define the custom sort:

public class NullableIntFieldCompatitorSource : FieldComparatorSource
{
    public override FieldComparator NewComparator(string fieldname, int numHits, int sortPos, bool reversed)
    {
        return new NullableIntComparator(numHits, fieldname, FieldCache_Fields.NUMERIC_UTILS_INT_PARSER, reversed);
    }
}

Some tests. Look at how the Sort is created to see how this plugs together.

    private class DataDoc
    {
        public int ID { get; set; }
        public int? Data { get; set; }
    }

    private IEnumerable<DataDoc> Search(Sort sort)
    {
        var result = searcher.Search(new MatchAllDocsQuery(), null, 99, sort);

        foreach (var topdoc in result.ScoreDocs)
        {
            var doc = searcher.Doc(topdoc.Doc);
            int id = int.Parse(doc.GetFieldable("id").StringValue);
            int data = int.Parse(doc.GetFieldable("data").StringValue);

            yield return new DataDoc
            {
                ID = id,
                Data = data == int.MinValue ? (int?)null : data
            };
        }
    }

    [Fact]
    public void SortAscending()
    {
        var sort = new Sort(new SortField("data", new NullableIntFieldCompatitorSource()));

        var result = Search(sort).ToList();

        Assert.Equal(4, result.Count);
        Assert.Equal(new int?[] { 100, 300, 400, null }, result.Select(x => x.Data));
    }


    [Fact]
    public void SortDecending()
    {
        var sort = new Sort(new SortField("data", new NullableIntFieldCompatitorSource(),true));

        var result = Search(sort).ToList();

        Assert.Equal(4, result.Count);
        Assert.Equal(new int?[] { 400, 300, 100, null }, result.Select(x => x.Data));
    }
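The tests use a `searcher` field whose setup is not shown. One way it might be built, assuming an in-memory index holding the four rows from the question, is sketched below. `TestIndex` is a hypothetical helper, and the sentinel is written inline so that the block does not depend on the extension method.

```csharp
using Lucene.Net.Analysis.Standard;
using Lucene.Net.Documents;
using Lucene.Net.Index;
using Lucene.Net.Search;
using Lucene.Net.Store;
using Version = Lucene.Net.Util.Version;

public static class TestIndex
{
    // Builds a small in-memory index and returns a read-only searcher over it.
    public static IndexSearcher Build()
    {
        var directory = new RAMDirectory();
        var analyzer = new StandardAnalyzer(Version.LUCENE_30);

        using (var writer = new IndexWriter(directory, analyzer, IndexWriter.MaxFieldLength.UNLIMITED))
        {
            var rows = new int?[] { 100, 400, null, 300 };
            for (int i = 0; i < rows.Length; i++)
            {
                var doc = new Document();
                doc.Add(new NumericField("id", Field.Store.YES, true).SetIntValue(i + 1));

                // A missing value is stored as the int.MinValue sentinel.
                doc.Add(new NumericField("data", Field.Store.YES, true).SetIntValue(rows[i] ?? int.MinValue));
                writer.AddDocument(doc);
            }
        } // disposing the writer commits the documents

        return new IndexSearcher(directory, true);
    }
}
```

Both fields are stored so that the `Search` helper can read them back through `GetFieldable(...).StringValue`.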

Notes

  • Each doc MUST contain a "data" field with a valid int. You can't just omit the field.
  • You'll need to make NullableIntFieldCompatitorSource more sophisticated so that it returns the correct comparator for your field names.
  • You'll need to create comparators for the other numeric types. See https://github.com/apache/lucenenet/blob/3.0.3/src/core/Search/FieldComparator.cs
  • If you don't want to use sentinel values, you'll need to get into NumericField and figure out how to encode null. But that will mean getting into several other classes.
