Wordnet Similarity in Java: JAWS, JWNL or Java WN::Similarity?

Question

I need to use WordNet in a Java-based app. I want to:


  • search synsets
  • find similarity/relatedness between synsets

My app uses RDF graphs, and I know there are SPARQL endpoints that serve WordNet, but I guess it's better to keep a local copy of the dataset, as it's not too big.

I found the following jars:

  • General library - JAWS http://lyle.smu.edu/~tspell/jaws/index.html
  • General library - JWNL http://sourceforge.net/projects/jwordnet
  • Similarity library (Perl) - Wordnet::similarity http://wn-similarity.sourceforge.net/
  • Java version of Wordnet::similarity http://www.cogs.susx.ac.uk/users/drh21/ (beta)

What would you recommend for my app?

Is it possible to use a Perl library from a Java app via some bindings?

Thanks! Mulone

Recommended answer

I use JAWS for normal WordNet stuff because it's easy to use. For similarity metrics, though, I use the library located here. You'll also need to download this folder, containing pre-processed WordNet and corpus data, for it to work. The code can be used like this, assuming you placed that folder inside another folder called "lib" in your project folder:

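import java.util.Map.Entry;
import java.util.TreeMap;
import edu.sussex.nlp.jws.*; // JWS package name as I recall it; verify against your jar

// Example inputs matching the sample output shown after this snippet
String word1 = "hobby", word2 = "gardening", partOfSpeech = "n";
JWS ws = new JWS("./lib", "3.0"); // data folder and WordNet version
Resnik res = ws.getResnik();
// One score per sense pair, keyed like "hobby#n#1,gardening#n#1"
TreeMap<String, Double> scores1 = res.res(word1, word2, partOfSpeech);
for (Entry<String, Double> e : scores1.entrySet())
    System.out.println(e.getKey() + "\t" + e.getValue());
System.out.println("\nhighest score\t=\t" + res.max(word1, word2, partOfSpeech) + "\n\n\n");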

This will print something like the following, showing the similarity score for each possible combination of synsets represented by the words being compared:

hobby#n#1,gardening#n#1 2.6043996588901104
hobby#n#2,gardening#n#1 -0.0
hobby#n#3,gardening#n#1 -0.0
highest score   =   2.6043996588901104
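
If you want the best-scoring sense pair itself rather than just the maximum value, the keys of the returned map can be picked apart. The following is a minimal sketch, assuming the keys always follow the word#pos#sense,word#pos#sense pattern seen in the output above. The class and method names (BestSensePair, best, parseKey) are made up for illustration and are not part of JWS, and the scores are hard-coded from the sample output so the snippet runs without the library.

```java
import java.util.Map;
import java.util.TreeMap;

class BestSensePair {

    /** Returns the entry with the highest score, or null if the map is empty. */
    static Map.Entry<String, Double> best(Map<String, Double> scores) {
        Map.Entry<String, Double> best = null;
        for (Map.Entry<String, Double> e : scores.entrySet()) {
            if (best == null || e.getValue() > best.getValue()) {
                best = e;
            }
        }
        return best;
    }

    /**
     * Splits a key such as "hobby#n#1,gardening#n#1" into
     * {word, pos, sense} triples, one per sense in the pair.
     * Assumes the words themselves contain neither ',' nor '#'.
     */
    static String[][] parseKey(String key) {
        String[] senses = key.split(",");
        String[][] parts = new String[senses.length][];
        for (int i = 0; i < senses.length; i++) {
            parts[i] = senses[i].split("#");
        }
        return parts;
    }

    public static void main(String[] args) {
        // Scores copied from the sample output, not produced by a live JWS call.
        Map<String, Double> scores = new TreeMap<>();
        scores.put("hobby#n#1,gardening#n#1", 2.6043996588901104);
        scores.put("hobby#n#2,gardening#n#1", -0.0);
        scores.put("hobby#n#3,gardening#n#1", -0.0);

        Map.Entry<String, Double> top = best(scores);
        String[][] parts = parseKey(top.getKey());
        System.out.println(parts[0][0] + " sense " + parts[0][2] + " / "
                + parts[1][0] + " sense " + parts[1][2] + " -> " + top.getValue());
    }
}
```

In a real project you would pass the TreeMap returned by res.res(...) to best(...) instead of the hard-coded map.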

There are also methods that let you specify which sense of either or both words to use: res(String word1, int senseNum1, String word2, partOfSpeech), etc. Unfortunately, the source documentation is not JavaDoc, so you'll need to inspect it manually. The source can be downloaded here.

The available algorithms are:

JWSRandom random = new JWSRandom(ws.getDictionary(), true, 16.0); // random number for baseline
Resnik res = ws.getResnik();
LeacockAndChodorow lch = ws.getLeacockAndChodorow();
AdaptedLesk adLesk = ws.getAdaptedLesk();
AdaptedLeskTanimoto alt = ws.getAdaptedLeskTanimoto();
AdaptedLeskTanimotoNoHyponyms altnh = ws.getAdaptedLeskTanimotoNoHyponyms();
HirstAndStOnge hso = ws.getHirstAndStOnge();
JiangAndConrath jcn = ws.getJiangAndConrath();
Lin lin = ws.getLin();
WuAndPalmer wup = ws.getWuAndPalmer();
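
To get a feel for what the information-content measures in this list compute (Resnik, Lin, Jiang and Conrath), here is a small stand-alone sketch of their textbook formulas. It does not call JWS, and the concept probabilities are invented toy numbers, so treat it as an illustration of the definitions rather than of the library's internals. Note that in Java, -Math.log(1.0) evaluates to -0.0, which might be why sense pairs that only share the root concept show up as -0.0 in the sample output; that is a guess about the library, not something I have verified.

```java
class InformationContentSketch {

    /** Information content of a concept with corpus probability p: IC = -ln(p). */
    static double ic(double p) {
        return -Math.log(p);
    }

    /** Resnik similarity: IC of the least common subsumer (LCS) of the two concepts. */
    static double resnik(double pLcs) {
        return ic(pLcs);
    }

    /** Lin similarity: 2 * IC(lcs) / (IC(c1) + IC(c2)); returns 0 when the denominator is 0. */
    static double lin(double p1, double p2, double pLcs) {
        double denom = ic(p1) + ic(p2);
        return denom == 0.0 ? 0.0 : 2.0 * ic(pLcs) / denom;
    }

    /** Jiang-Conrath distance: IC(c1) + IC(c2) - 2 * IC(lcs). Smaller means more similar. */
    static double jiangConrathDistance(double p1, double p2, double pLcs) {
        return ic(p1) + ic(p2) - 2.0 * ic(pLcs);
    }

    public static void main(String[] args) {
        // Toy probabilities, made up for illustration only.
        double pC1 = 0.001;   // first concept
        double pC2 = 0.002;   // second concept
        double pLcs = 0.05;   // their least common subsumer
        double pRoot = 1.0;   // the root subsumes everything

        System.out.println("resnik (shared ancestor) = " + resnik(pLcs));
        System.out.println("resnik (root only)       = " + resnik(pRoot));
        System.out.println("lin                      = " + lin(pC1, pC2, pLcs));
        System.out.println("jcn distance             = " + jiangConrathDistance(pC1, pC2, pLcs));
    }
}
```

The other measures in the list work differently: Wu and Palmer and Leacock and Chodorow use path lengths and depths in the taxonomy, while the Lesk variants compare gloss overlaps.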

It also requires the jar file for MIT's JWI (Java Wordnet Interface).
