How to boost a Hibernate Search query with field values?

Views: 74 | Posted: 2020/5/4 7:54:13 | Tags: lucene, hibernate-search

Question

I have two fields in an entity class:

establishmentName
contactType

contactType has values like PBX, GSM, TEL and FAX

I want a scoring mechanism that returns the best-matching data first, and then orders the results by contact type: PBX, TEL, GSM and FAX.

Scoring:

On establishmentName: return the best-matching data first.
On contactType: return PBX first, then TEL, and so on.

My final query is:

(+establishmentName:kamran~1^2.5 +(contactType:PBX^2.0 contactType:TEL^1.8 contactType:GSM^1.6 contactType:FAX^1.4))

But it returns no results.

My question is: how can I boost a specific field differently depending on its value?

We can use the following query for two different fields:

Query query = qb.keyword()
    .onField( field_one).boostedTo(2.0f)
    .andField( field_two)
    .matching( searchTerm)
    .createQuery();

But I need to boost a single field according to its values; in my case that field is contactType.

My dataset:

(establishmentName : Concert Decoration, contactType : GSM)
(establishmentName : Elissa Concert, contactType : TEL)
(establishmentName : Yara Concert, contactType : FAX)
(establishmentName : E Concept, contactType : TEL)
(establishmentName : Infinity Concept, contactType : FAX)
(establishmentName : SD Concept, contactType : PBX)
(establishmentName : Broadcom Technical Concept, contactType : GSM)
(establishmentName : Concept Businessmen, contactType : PBX)

Searching for term=concert (a fuzzy query on establishmentName) should return the list below:

(establishmentName : Elissa Concert, contactType : TEL)
[term=concert, exact match, so it comes first, keeping the order PBX, TEL, GSM, FAX]

(establishmentName : Concert Decoration, contactType : GSM)
[term=concert, exact match, keeping the order PBX, TEL, GSM, FAX]

(establishmentName : Yara Concert, contactType : FAX)
[term=concert, exact match, keeping the order PBX, TEL, GSM, FAX]

(establishmentName : Concept Businessmen, contactType : PBX)
[term=concert, partial match, keeping the order PBX, TEL, GSM, FAX]

(establishmentName : SD Concept, contactType : PBX)
[term=concert, partial match, keeping the order PBX, TEL, GSM, FAX]

(establishmentName : E Concept, contactType : TEL)
[term=concert, partial match, keeping the order PBX, TEL, GSM, FAX]

(establishmentName : Broadcom Technical Concept, contactType : GSM)
[term=concert, partial match, keeping the order PBX, TEL, GSM, FAX]

(establishmentName : Infinity Concept, contactType : FAX)
[term=concert, partial match, keeping the order PBX, TEL, GSM, FAX]

Recommended answer

From what I understand you basically want a two-phase sort:

Put exact matches before the other (fuzzy) matches.
Sort by contact type.

The second sort is trivial, but the first one will require a bit of work. You can actually rely on scoring to implement it.

Essentially the idea would be to run a disjunction of multiple queries, and to assign a constant score to each query.

Instead of doing this:

Query query = qb.keyword()
    .fuzzy().withEditDistanceUpTo(1)
    .boostedTo(2.5f)
    .onField("establishmentName")
    .matching(searchTerm)
    .createQuery();

Do this:

Query query = qb.bool()
    .should(qb.keyword()
        .withConstantScore().boostedTo(100.0f) // Higher score, sort first
        .onField("establishmentName")
        .matching(searchTerm)
        .createQuery())
    .should(qb.keyword()
        .fuzzy().withEditDistanceUpTo(1)
        .withConstantScore().boostedTo(1.0f) // Lower score, sort last
        .onField("establishmentName")
        .matching(searchTerm)
        .createQuery())
    .createQuery();

The matched documents will be the same, but now the query will assign predictable scores: 1.0 for fuzzy-only matches, and 101.0 (1 from the fuzzy query and 100 from the exact query) for exact matches.
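
The score arithmetic can be checked with a small plain-Java sketch. It does not call Hibernate Search or Lucene; it only imitates the two constant-score clauses under simplifying assumptions: names are split on whitespace and lowercased, the search term is already lowercase, and fuzziness is plain Levenshtein distance. The class name, the constants and the "Acme Plumbing" sample are hypothetical.

```java
import java.util.Arrays;

// Plain-Java illustration only: it imitates the scoring, it is not the library's implementation.
class ConstantScoreSketch {

    static final float EXACT_BOOST = 100.0f; // constant score of the exact clause
    static final float FUZZY_BOOST = 1.0f;   // constant score of the fuzzy clause

    // Classic Levenshtein edit distance between two strings.
    static int editDistance(String a, String b) {
        int[] prev = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) {
            prev[j] = j;
        }
        for (int i = 1; i <= a.length(); i++) {
            int[] curr = new int[b.length() + 1];
            curr[0] = i;
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                curr[j] = Math.min(Math.min(curr[j - 1] + 1, prev[j] + 1), prev[j - 1] + cost);
            }
            prev = curr;
        }
        return prev[b.length()];
    }

    // Sum of the constant scores of the clauses that match at least one token.
    static float score(String establishmentName, String term) {
        String[] tokens = establishmentName.toLowerCase().split("\\s+");
        boolean exact = Arrays.stream(tokens).anyMatch(t -> t.equals(term));
        boolean fuzzy = Arrays.stream(tokens).anyMatch(t -> editDistance(t, term) <= 1);
        return (exact ? EXACT_BOOST : 0.0f) + (fuzzy ? FUZZY_BOOST : 0.0f);
    }

    public static void main(String[] args) {
        System.out.println(score("Elissa Concert", "concert")); // exact and fuzzy clauses both match
        System.out.println(score("SD Concept", "concert"));     // only the fuzzy clause matches
        System.out.println(score("Acme Plumbing", "concert"));  // no clause matches
    }
}
```

Because an exact match is also within edit distance 1, it collects both constants, which is where the 101.0 comes from.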

This way, you can define the sort as follows:

fullTextQuery.setSort(qb.sort()
    .byScore()
    .andByField("contactType")
    .createSort());

This may not be a very elegant or optimized solution, but I think it will work.

To customize the relative order of contact types, I would suggest a different approach: use a custom bridge to index numbers instead of "PBX"/"TEL"/etc., assigning to each contact type the ordinal you expect. Essentially something like this:

public class Establishment {

    @Field(name = "contactType_sort", bridge = @FieldBridge(impl = ContactTypeOrdinalBridge.class))
    private ContactType contactType;

}

public class ContactTypeOrdinalBridge implements MetadataProvidingFieldBridge {

    @Override
    public void set(String name, Object value, Document document, LuceneOptions luceneOptions) {
        if ( value != null ) {
            int ordinal = getOrdinal((ContactType) value);
            // Index the ordinal both as a numeric field and as doc values, so it can be sorted on
            luceneOptions.addNumericFieldToDocument(name, ordinal, document);
            luceneOptions.addNumericDocValuesFieldToDocument(name, ordinal, document);
        }
    }

    @Override
    public void configureFieldMetadata(String name, FieldMetadataBuilder builder) {
        builder.field(name, FieldType.INTEGER).sortable(true);
    }

    // Lower ordinals sort first: PBX, TEL, GSM, FAX, then anything else
    private int getOrdinal(ContactType value) {
        switch( value ) {
            case PBX: return 0;
            case TEL: return 1;
            case GSM: return 2;
            case FAX: return 3;
            default: return 4;
        }
    }
}

Then reindex, and sort like this:

fullTextQuery.setSort(qb.sort()
    .byScore()
    .andByField("contactType_sort")
    .createSort());
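
To see what this two-level sort should produce on the dataset from the question, here is a plain-Java sketch that mimics it with a Comparator: score descending first, then the contact-type ordinal ascending. It does not use Hibernate Search; the Hit class, the dataset() helper and the hard-coded scores (101 for names containing "concert", 1 for names that only match fuzzily) are assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Plain-Java illustration of "sort by score, then by contact-type ordinal".
class TwoPhaseSortSketch {

    enum ContactType { PBX, TEL, GSM, FAX }

    // Hypothetical holder for one search hit and the score assigned to it.
    static final class Hit {
        final String name;
        final ContactType type;
        final float score;

        Hit(String name, ContactType type, float score) {
            this.name = name;
            this.type = type;
            this.score = score;
        }
    }

    // Same mapping as the bridge in the answer: lower ordinal sorts first.
    static int ordinalOf(ContactType type) {
        switch (type) {
            case PBX: return 0;
            case TEL: return 1;
            case GSM: return 2;
            case FAX: return 3;
            default: return 4;
        }
    }

    // Dataset from the question, with the constant scores assumed above.
    static List<Hit> dataset() {
        List<Hit> hits = new ArrayList<>();
        hits.add(new Hit("Concert Decoration", ContactType.GSM, 101f));
        hits.add(new Hit("Elissa Concert", ContactType.TEL, 101f));
        hits.add(new Hit("Yara Concert", ContactType.FAX, 101f));
        hits.add(new Hit("E Concept", ContactType.TEL, 1f));
        hits.add(new Hit("Infinity Concept", ContactType.FAX, 1f));
        hits.add(new Hit("SD Concept", ContactType.PBX, 1f));
        hits.add(new Hit("Broadcom Technical Concept", ContactType.GSM, 1f));
        hits.add(new Hit("Concept Businessmen", ContactType.PBX, 1f));
        return hits;
    }

    // Returns a sorted copy: highest score first, ties broken by contact-type ordinal.
    static List<Hit> sort(List<Hit> hits) {
        List<Hit> sorted = new ArrayList<>(hits);
        sorted.sort(Comparator.comparingDouble((Hit h) -> h.score).reversed()
                .thenComparingInt(h -> ordinalOf(h.type)));
        return sorted;
    }

    public static void main(String[] args) {
        for (Hit h : sort(dataset())) {
            System.out.println(h.score + " " + h.type + " " + h.name);
        }
    }
}
```

Hits with the same score and the same contact type (the two PBX "Concept" entries here) are tied; this sketch keeps them in input order, and if a deterministic order matters a further sort criterion would be needed.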
