Typeahead search not working for multiple words (with spaces) through Lucene/Hibernate Search query on Elasticsearch with analyzers


Question


Below are my configuration and code. Basically, I'm trying to fetch records from my Elasticsearch indexes with a typeahead search. Single-word search works as expected, but only on a single field; multi-word search is not working at all.


My requirement is to fetch records matching the searched words across multiple fields. For example, if I search for the name "Jason K Smith", the query should run on all the fields (name, address, second name, last name, and so on), as the searched text could be spread over several fields. Also, if I search for two names like "Mike John", the results should contain records for both names (I believe this is possible; I may be wrong).

Below is my code:

hibernate.cfg.xml

<property name="hibernate.search.default.indexmanager">elasticsearch</property>
<property name="hibernate.search.default.elasticsearch.host">http://127.0.0.1:9200</property>
<property name="hibernate.search.default.elasticsearch.index_schema_management_strategy">drop-and-create</property>
<property name="hibernate.search.default.elasticsearch.required_index_status">yellow</property>

Entity class

@Entity
@Indexed
public class MYClass {
    private DBAccessStatus dBAccessStatus;
    private String optname = "";
    private String phone1 = "";

    @Fields({
        @Field(name = "clientname", index = Index.YES, store = Store.YES,
               analyze = Analyze.YES, analyzer = @Analyzer(definition = "standardAnalyzer")),
        @Field(name = "edgeNGramClientname", index = Index.YES, store = Store.YES,
               analyze = Analyze.YES, analyzer = @Analyzer(definition = "autocompleteEdgeAnalyzer")),
        @Field(name = "nGramClientname", index = Index.YES, store = Store.YES,
               analyze = Analyze.YES, analyzer = @Analyzer(definition = "autocompleteNGramAnalyzer"))
    })
    private String clientname = "";

    @Fields({
        @Field(name = "firstname", index = Index.YES, store = Store.YES,
               analyze = Analyze.YES, analyzer = @Analyzer(definition = "standardAnalyzer")),
        @Field(name = "edgeNGramFirstName", index = Index.YES, store = Store.YES,
               analyze = Analyze.YES, analyzer = @Analyzer(definition = "autocompleteEdgeAnalyzer")),
        @Field(name = "nGramFirstName", index = Index.YES, store = Store.YES,
               analyze = Analyze.YES, analyzer = @Analyzer(definition = "autocompleteNGramAnalyzer"))
    })
    private String firstname = "";

    @Fields({
        @Field(name = "midname", index = Index.YES, store = Store.YES,
               analyze = Analyze.YES, analyzer = @Analyzer(definition = "standardAnalyzer")),
        @Field(name = "edgeNGramMidname", index = Index.YES, store = Store.YES,
               analyze = Analyze.YES, analyzer = @Analyzer(definition = "autocompleteEdgeAnalyzer")),
        @Field(name = "nGramMidname", index = Index.YES, store = Store.YES,
               analyze = Analyze.YES, analyzer = @Analyzer(definition = "autocompleteNGramAnalyzer"))
    })
    private String midname = "";

    private String prefixnm = "";

    private String suffixnm = "";

    @Fields({
        @Field(name = "longname", index = Index.YES, store = Store.YES,
               analyze = Analyze.YES, analyzer = @Analyzer(definition = "standardAnalyzer")),
        @Field(name = "edgeNGramLongname", index = Index.YES, store = Store.YES,
               analyze = Analyze.YES, analyzer = @Analyzer(definition = "autocompleteEdgeAnalyzer")),
        @Field(name = "nGramLongname", index = Index.YES, store = Store.YES,
               analyze = Analyze.YES, analyzer = @Analyzer(definition = "autocompleteNGramAnalyzer"))
    })
    private String longname = "";

Analyzer definitions

@AnalyzerDefs({
        @AnalyzerDef(name = "autocompleteEdgeAnalyzer",
                // Treat the whole input as a single token
                tokenizer = @TokenizerDef(factory = KeywordTokenizerFactory.class),
                filters = {
                        // Replace non-alphanumeric characters with spaces
                        @TokenFilterDef(factory = PatternReplaceFilterFactory.class, params = {
                                @Parameter(name = "pattern", value = "([^a-zA-Z0-9\\.])"),
                                @Parameter(name = "replacement", value = " "),
                                @Parameter(name = "replace", value = "all") }),
                        // Normalize token text to lowercase, as the user is unlikely to
                        // care about casing when searching for matches
                        @TokenFilterDef(factory = LowerCaseFilterFactory.class),
                        @TokenFilterDef(factory = StopFilterFactory.class),
                        // Index partial words starting at the front, so we can provide
                        // autocomplete functionality
                        @TokenFilterDef(factory = EdgeNGramFilterFactory.class, params = {
                                @Parameter(name = "minGramSize", value = "3"),
                                @Parameter(name = "maxGramSize", value = "50") }) }),

        @AnalyzerDef(name = "autocompleteNGramAnalyzer",
                // Split input into tokens on word boundaries
                tokenizer = @TokenizerDef(factory = StandardTokenizerFactory.class),
                filters = {
                        @TokenFilterDef(factory = WordDelimiterFilterFactory.class),
                        // Normalize token text to lowercase
                        @TokenFilterDef(factory = LowerCaseFilterFactory.class),
                        // Index n-grams of length 3 to 5 from anywhere in the token
                        @TokenFilterDef(factory = NGramFilterFactory.class, params = {
                                @Parameter(name = "minGramSize", value = "3"),
                                @Parameter(name = "maxGramSize", value = "5") }),
                        @TokenFilterDef(factory = PatternReplaceFilterFactory.class, params = {
                                @Parameter(name = "pattern", value = "([^a-zA-Z0-9\\.])"),
                                @Parameter(name = "replacement", value = " "),
                                @Parameter(name = "replace", value = "all") }) }),

        @AnalyzerDef(name = "standardAnalyzer",
                // Split input into tokens on word boundaries
                tokenizer = @TokenizerDef(factory = StandardTokenizerFactory.class),
                filters = {
                        @TokenFilterDef(factory = WordDelimiterFilterFactory.class),
                        // Normalize token text to lowercase
                        @TokenFilterDef(factory = LowerCaseFilterFactory.class),
                        @TokenFilterDef(factory = PatternReplaceFilterFactory.class, params = {
                                @Parameter(name = "pattern", value = "([^a-zA-Z0-9\\.])"),
                                @Parameter(name = "replacement", value = " "),
                                @Parameter(name = "replace", value = "all") }) }),

        @AnalyzerDef(name = "textanalyzer",
                tokenizer = @TokenizerDef(factory = StandardTokenizerFactory.class),
                filters = {
                        @TokenFilterDef(factory = LowerCaseFilterFactory.class),
                        // English stemming
                        @TokenFilterDef(factory = SnowballPorterFilterFactory.class, params = {
                                @Parameter(name = "language", value = "English") }) })
})

Sample search result

 {
        "_index" : "com.csc.pt.svc.data.to.bascltj001to",
        "_type" : "com.csc.pt.svc.data.to.Bascltj001TO",
        "_id" : "44,13",
        "_score" : 1.0,
        "_source" : {
          "id" : "44,13",
          "cltseqnum" : 44,
          "addrseqnum" : "13",
          "clientname" : "Thompsan 1",
          "edgeNGramClientname" : "Thompsan 1",
          "nGramClientname" : "Thompsan 1",
          "firstname" : "Robert",
          "edgeNGramFirstName" : "Robert",
          "nGramFirstName" : "Robert",
          "longname" : "Robert Thompsan",
          "edgeNGramLongname" : "Robert Thompsan",
          "nGramLongname" : "Robert Thompsan",
          "addrln1" : "1 Main Street",
          "edgeNGramAddrln1" : "1 Main Street",
          "nGramAddrln1" : "1 Main Street",
          "city" : "Columbia",
          "edgeNGramCity" : "Columbia",
          "nGramCity" : "Columbia",
          "state" : "SC",
          "edgeNGramState" : "SC",
          "nGramState" : "SC",
          "zipcode" : "29224",
          "edgeNGramZipcode" : "29224",
          "nGramZipcode" : "29224",
          "country" : "USA",
          "edgeNGramCountry" : "USA",
          "nGramCountry" : "USA"
        }
      },

Code currently in use:

protected static final String FIRSTNAME_EDGE_NGRAM_INDEX = "edgeNGramFirstName";
protected static final String FIRSTNAME_NGRAM_INDEX = "nGramFirstName";
protected static final String MIDNAME_EDGE_NGRAM_INDEX = "edgeNGramMidname";
protected static final String MIDNAME_NGRAM_INDEX = "nGramMidname";
protected static final String PHONE1_EDGE_NGRAM_INDEX = "edgeNGramPhone1";
protected static final String PHONE1_NGRAM_INDEX = "nGramPhone1";
protected static final String LONGNAME_EDGE_NGRAM_INDEX = "edgeNGramLongname";
protected static final String LONGNAME_NGRAM_INDEX = "nGramLongname";
protected static final String CLIENT_EDGE_NGRAM_INDEX = "edgeNGramClientname";
protected static final String CLIENT_NGRAM_INDEX = "nGramClientname";

protected static final String ADDRLN1_EDGE_NGRAM_INDEX = "edgeNGramAddrln1";
protected static final String ADDRLN1_NGRAM_INDEX = "nGramAddrln1";
protected static final String ADDRLN2_EDGE_NGRAM_INDEX = "edgeNGramAddrln2";
protected static final String ADDRLN2_NGRAM_INDEX = "nGramAddrln2";
protected static final String ADDRLN3_EDGE_NGRAM_INDEX = "edgeNGramAddrln3";
protected static final String ADDRLN3_NGRAM_INDEX = "nGramAddrln3";
protected static final String ADDRLN4_EDGE_NGRAM_INDEX = "edgeNGramAddrln4";
protected static final String ADDRLN4_NGRAM_INDEX = "nGramAddrln4";
protected static final String CITY_EDGE_NGRAM_INDEX = "edgeNGramCity";
protected static final String CITY_NGRAM_INDEX = "nGramCity";
protected static final String STATE_EDGE_NGRAM_INDEX = "edgeNGramState";
protected static final String STATE_NGRAM_INDEX = "nGramState";
protected static final String COUNTRY_EDGE_NGRAM_INDEX = "edgeNGramCountry";
protected static final String COUNTRY_NGRAM_INDEX = "nGramCountry";

protected void getClt0100Data() {
    Query query = queryBuilder.phrase().withSlop(5)
            .onField(FIRSTNAME_EDGE_NGRAM_INDEX).andField(FIRSTNAME_NGRAM_INDEX)
            .andField(MIDNAME_EDGE_NGRAM_INDEX).andField(MIDNAME_NGRAM_INDEX)
            .andField(LONGNAME_EDGE_NGRAM_INDEX).andField(LONGNAME_NGRAM_INDEX)
            .andField(CLIENT_EDGE_NGRAM_INDEX).andField(CLIENT_NGRAM_INDEX)
            .andField(ADDRLN1_EDGE_NGRAM_INDEX).andField(ADDRLN1_NGRAM_INDEX)
            .andField(ADDRLN2_EDGE_NGRAM_INDEX).andField(ADDRLN2_NGRAM_INDEX)
            .andField(ADDRLN3_EDGE_NGRAM_INDEX).andField(ADDRLN3_NGRAM_INDEX)
            .andField(ADDRLN4_EDGE_NGRAM_INDEX).andField(ADDRLN4_NGRAM_INDEX)
            .andField(CITY_EDGE_NGRAM_INDEX).andField(CITY_NGRAM_INDEX)
            .andField(STATE_EDGE_NGRAM_INDEX).andField(STATE_NGRAM_INDEX)
            .andField(COUNTRY_EDGE_NGRAM_INDEX).andField(COUNTRY_NGRAM_INDEX)
            .boostedTo(5).sentence(this.data.getSearchText().toLowerCase()).createQuery();

    FullTextQuery fullTextQuery = fullTextSession.createFullTextQuery(query, Bascltj001TO.class);
    fullTextQuery.setMaxResults(this.data.getPageSize()).setFirstResult(this.data.getPageSize())
            // "longname" and "firstname" must be separate projection entries:
            // the transformer below reads 13 tuple elements
            .setProjection("longname", "firstname", "cltseqnum", "midname", "clientname",
                    "addrln1", "addrln2", "addrln3", "addrln4", "city", "state", "zipcode", "country")
            .setResultTransformer(new BasicTransformerAdapter() {
                @Override
                public Cltj001ElasticSearchResponseTO transformTuple(Object[] tuple, String[] aliases) {
                    return new Cltj001ElasticSearchResponseTO((String) tuple[0], (String) tuple[1],
                            (long) tuple[2], (String) tuple[3], (String) tuple[4], (String) tuple[5],
                            (String) tuple[6], (String) tuple[7], (String) tuple[8], (String) tuple[9],
                            (String) tuple[10], (String) tuple[11], (String) tuple[12]);
                }
            });

    resultsClt0100List = fullTextQuery.getResultList();
}

Answer


What you're doing is weird.


I don't see why you use ngram if in the end you want to do a phrase search. I don't think that will work very well.


I think simple query strings are more what you're looking for: https://docs.jboss.org/hibernate/search/5.8/reference/en-US/html_single/#_simple_query_string_queries .
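As a rough sketch of that suggestion (the field names come from the question's mapping; the builder calls follow the Hibernate Search 5.8 query DSL, and the non-ngram fields are assumed to use a plain word-level analyzer):

```java
// Sketch: replace the phrase query with a simple-query-string query that
// searches every term across all the plain (non-ngram) fields.
Query query = queryBuilder
        .simpleQueryString()
        .onFields("firstname", "midname", "longname", "clientname",
                  "addrln1", "addrln2", "addrln3", "addrln4",
                  "city", "state", "zipcode", "country")
        // With AND as the default operator, "Jason K Smith" matches documents
        // containing all three terms, in any of the listed fields.
        .withAndAsDefaultOperator()
        .matching(this.data.getSearchText())
        .createQuery();
```

Simple query string syntax also lets the end user combine terms with `|` (OR), so a search like `Mike | John` would return records for either name.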


But then again, you're using ngrams everywhere, whereas the feature you describe doesn't really need ngrams, as it seems you're expecting an exact search.


I would recommend you start simple: use an analyzer that removes accents and lowercases the text, and make that work first.
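Such an analyzer could be defined with standard Lucene filter factories, for example (a minimal sketch; the name "basicSearchAnalyzer" is made up here):

```java
@AnalyzerDef(name = "basicSearchAnalyzer",
        tokenizer = @TokenizerDef(factory = StandardTokenizerFactory.class),
        filters = {
                // Lowercase all tokens so search is case-insensitive
                @TokenFilterDef(factory = LowerCaseFilterFactory.class),
                // Fold accented characters: "Café" is indexed and matched as "cafe"
                @TokenFilterDef(factory = ASCIIFoldingFilterFactory.class)
        })
```

Applying the same definition at both index time and query time keeps the indexed tokens and the search terms consistent, which is usually where multi-word searches go wrong.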


And then consider ngrams if you really want some sort of fuzzy search.
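Before reaching for ngrams, note that the DSL also offers a fuzzy keyword query for typo tolerance (a sketch; the edit-distance and prefix values below are illustrative, not a recommendation):

```java
// Sketch: tolerate up to 2 character edits, but require the first
// character to match exactly ("Thompsen" would still find "Thompsan").
Query fuzzyQuery = queryBuilder
        .keyword()
        .fuzzy()
        .withEditDistanceUpTo(2)
        .withPrefixLength(1)
        .onField("longname")
        .matching(this.data.getSearchText())
        .createQuery();
```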
