Build Lucene Synonyms


Question

I have the following code:

static class TaggerAnalyzer extends Analyzer {

    @Override
    protected TokenStreamComponents createComponents(String s, Reader reader) {

        SynonymMap.Builder builder = new SynonymMap.Builder(true);
        builder.add(new CharsRef("al"), new CharsRef("americanleague"), true);
        builder.add(new CharsRef("al"), new CharsRef("a.l."), true);
        builder.add(new CharsRef("nba"), new CharsRef("national" + SynonymMap.WORD_SEPARATOR + "basketball" + SynonymMap.WORD_SEPARATOR + "association"), true);

        SynonymMap mySynonymMap = null;
        try {
            mySynonymMap = builder.build();
        } catch (IOException e) {
            e.printStackTrace();
        }

        Tokenizer source = new ClassicTokenizer(Version.LUCENE_40, reader);
        TokenStream filter = new StandardFilter(Version.LUCENE_40, source);
        filter = new LowerCaseFilter(Version.LUCENE_40, filter);
        filter = new SynonymFilter(filter, mySynonymMap, true);
        return new TokenStreamComponents(source, filter);
    }
}

I'm running some tests, and everything went fine until I hit this scenario:

    String title = "Very short title at a.l. bla bla";

    Assert.assertTrue(TagUtil.evaluate(memoryIndex,"americanleague"));
    Assert.assertTrue(TagUtil.evaluate(memoryIndex,"al"));

I expected both assertions to pass, but "americanleague" did not match "a.l.", even though both "a.l." and "americanleague" are synonyms of "al".

So, what should I do? I don't want to add every combination to the map. Thanks.

Recommended answer

I believe you have the arguments to builder.add backwards. Try:

builder.add(new CharsRef("americanleague"), new CharsRef("al"), true);
builder.add(new CharsRef("a.l."), new CharsRef("al"), true);
builder.add(new CharsRef("national" + SynonymMap.WORD_SEPARATOR + "basketball" + SynonymMap.WORD_SEPARATOR + "association"), new CharsRef("nba"), true);

The SynonymFilter only maps from the first argument (input) to the second argument (output), never the other way around. So your rules translate "al" into two different synonyms, but none of them does anything to an input of "a.l." or "americanleague".
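To make the direction of the rules concrete, here is a minimal plain-Java sketch that models a one-way synonym table with a map. It is a simplified illustration, not Lucene's implementation: the class `SynonymDirectionSketch` and its `add` and `expand` methods are hypothetical stand-ins for `SynonymMap.Builder.add` and the filter's output, and it always keeps the original token, roughly what `includeOrig = true` is meant to do.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Simplified model of one-directional synonym rules; NOT Lucene's implementation.
final class SynonymDirectionSketch {

    // Hypothetical stand-in for the synonym map: input token -> output tokens.
    private final Map<String, List<String>> rules = new LinkedHashMap<>();

    // Registers a rule that fires only when `input` is seen, never when `output` is seen.
    void add(String input, String output) {
        rules.computeIfAbsent(input, k -> new ArrayList<>()).add(output);
    }

    // Returns the tokens emitted for one input token: the original first, then its outputs.
    List<String> expand(String token) {
        List<String> out = new ArrayList<>();
        out.add(token);
        out.addAll(rules.getOrDefault(token, Collections.<String>emptyList()));
        return out;
    }

    public static void main(String[] args) {
        // Rules as written in the question: "al" is the input.
        SynonymDirectionSketch asked = new SynonymDirectionSketch();
        asked.add("al", "americanleague");
        asked.add("al", "a.l.");

        // Rules as suggested in the answer: "al" is the output.
        SynonymDirectionSketch fixed = new SynonymDirectionSketch();
        fixed.add("americanleague", "al");
        fixed.add("a.l.", "al");

        System.out.println(asked.expand("a.l."));           // [a.l.]
        System.out.println(fixed.expand("a.l."));           // [a.l., al]
        System.out.println(fixed.expand("americanleague")); // [americanleague, al]
    }
}
```

With the reversed rules, both "a.l." and "americanleague" also emit the shared token "al". Assuming the query text passes through the same analyzer as the indexed title, that shared token is what lets the two forms match without listing every pairwise combination.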
