How to use WordNet synonyms with Hibernate Search?


Question

I've been trying to figure out how to use WordNet synonyms in a search function I'm developing with Hibernate Search 5.6.1. At first, I thought about using Hibernate Search annotations:

@TokenFilterDef(factory = SynonymFilterFactory.class, params = {@Parameter(name = "ignoreCase", value = "true"),
  @Parameter(name = "expand", value = "true"),@Parameter(name = "synonyms", value = "synonymsfile") })

However, this requires an actual file populated with synonyms, and the only files I could get from WordNet are ".pl" files. So I tried writing a SynonymAnalyzer class by hand that reads from the ".pl" file:

public class SynonymAnalyzer extends Analyzer {

@Override
protected TokenStreamComponents createComponents(String fieldName) {
  final Tokenizer source = new StandardTokenizer();
  TokenStream result = new StandardFilter(source);
  result = new LowerCaseFilter(result);

  SynonymMap wordnetSynonyms = null;

  try {
    wordnetSynonyms = loadSynonyms();
  } catch (IOException e) {
    e.printStackTrace();
  }
  result = new SynonymFilter(result, wordnetSynonyms, false);
  result = new StopFilter(result, StopAnalyzer.ENGLISH_STOP_WORDS_SET);
  return new TokenStreamComponents(source, result);
}

private SynonymMap loadSynonyms() throws IOException {
  File file = new File("synonyms\\wn_s.pl");
  InputStream stream = new FileInputStream(file);
  Reader reader = new InputStreamReader(stream);
  SynonymMap.Builder parser = null;
  parser = new WordnetSynonymParser(true, true, new StandardAnalyzer(CharArraySet.EMPTY_SET));
  try {
    ((WordnetSynonymParser) parser).parse(reader);
  }   catch (ParseException e) {
    e.printStackTrace();
  }

  return parser.build();
}

}

The problem with this approach is that I get a java.lang.OutOfMemoryError, which I assume is because there are too many synonyms. What is the proper way to do this? Everywhere I've looked online suggests using WordNet, but I can't find an example that uses Hibernate Search annotations. Any help is appreciated, thanks!

Recommended answer

SynonymFilterFactory actually supports the WordNet format. You're simply missing the "format" parameter in your annotation configuration; by default, the factory uses the Solr format.

Change your annotation to this:

@TokenFilterDef(
    factory = SynonymFilterFactory.class,
    params = {
        @Parameter(name = "ignoreCase", value = "true"),
        @Parameter(name = "expand", value = "true"),
        @Parameter(name = "synonyms", value = "synonymsfile"),
        @Parameter(name = "format", value = "wordnet") // Add this
    }
)
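
To make the difference between the two formats concrete, here is a minimal sketch in plain Java (no Lucene dependency) that regroups WordNet prolog lines into Solr-style comma-separated lines. The class name WordnetToSolrSketch, the method toSolrLines, and the sample lines are illustrative assumptions, not part of Hibernate Search or Lucene. The sketch assumes well-formed `s(synset_id,w_num,'word',...)` lines and does not escape commas inside words. With "format" set to "wordnet" you do not need such a conversion; it only shows what each format looks like.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class WordnetToSolrSketch {

    /**
     * Groups WordNet prolog "s(...)" lines by synset id and returns one
     * Solr-style line ("a, b, c") per synset that has at least two words.
     * Assumes every "s(" line is well formed (hypothetical helper).
     */
    public static List<String> toSolrLines(List<String> prologLines) {
        Map<String, List<String>> synsets = new LinkedHashMap<>();
        for (String raw : prologLines) {
            String line = raw.trim();
            if (!line.startsWith("s(")) {
                continue; // skip blank or unrelated lines
            }
            // The synset id sits between "s(" and the first comma.
            String synsetId = line.substring(2, line.indexOf(','));
            // The word sits between the first and last single quote;
            // a doubled quote ('') is an escaped apostrophe.
            int start = line.indexOf('\'') + 1;
            int end = line.lastIndexOf('\'');
            String word = line.substring(start, end).replace("''", "'");
            synsets.computeIfAbsent(synsetId, k -> new ArrayList<>()).add(word);
        }
        List<String> solrLines = new ArrayList<>();
        for (List<String> words : synsets.values()) {
            if (words.size() > 1) { // a one-word synset defines no synonyms
                solrLines.add(String.join(", ", words));
            }
        }
        return solrLines;
    }

    public static void main(String[] args) {
        // Illustrative sample data in the WordNet prolog layout.
        List<String> sample = Arrays.asList(
                "s(100000001,1,'woods',n,1,0).",
                "s(100000001,2,'wood',n,1,0).",
                "s(100000001,3,'forest',n,1,0).",
                "s(100000002,1,'wolfish',n,1,0).");
        for (String line : toSolrLines(sample)) {
            System.out.println(line); // prints "woods, wood, forest"
        }
    }
}
```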

Also, make sure that the value of the "synonyms" parameter is the path of a file in your classpath (e.g. "com/acme/synonyms.pl", or just "synonyms.pl" if the file is at the root of your "resources" directory).
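
If you are unsure whether the file is really visible on the classpath, a quick check along the following lines can help. This is a minimal sketch using only the JDK; the class name SynonymsResourceCheck and the method locate are hypothetical, and the lookup is only an approximation of how a framework resolves classpath resources, not the actual Hibernate Search code path. Note that the path has no leading slash.

```java
import java.net.URL;

public class SynonymsResourceCheck {

    /**
     * Returns the URL of a classpath resource, or null if the class loader
     * cannot find it (hypothetical helper for debugging only).
     */
    public static URL locate(String resourcePath) {
        ClassLoader loader = Thread.currentThread().getContextClassLoader();
        if (loader == null) {
            // Fall back to the loader that loaded this class.
            loader = SynonymsResourceCheck.class.getClassLoader();
        }
        return loader.getResource(resourcePath);
    }

    public static void main(String[] args) {
        // Pass the same value you use for the "synonyms" parameter.
        String path = args.length > 0 ? args[0] : "synonyms.pl";
        URL url = locate(path);
        if (url != null) {
            System.out.println("Found " + path + " at " + url);
        } else {
            System.out.println(path + " is not on the classpath");
        }
    }
}
```

Run it with the same classpath as your application; if it reports that the file is missing, the "synonyms" value or the file location most likely needs adjusting.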

In general, when you have trouble with the parameters of a Lucene filter/tokenizer factory, your best bet is to look at the source code of that factory, or at this page.
