Compare similarity of terms/expressions using NLTK?

Question

I'm trying to compare terms/expressions that may or may not be semantically related. These are not full sentences, and not necessarily single words. For example:

'Social networking service' and 'Social network' are clearly strongly related, but how do I quantify this using NLTK?

Clearly I'm missing something, as even this code:

from nltk.corpus import wordnet
w1 = wordnet.synsets('social network')

returns an empty list.

Any advice on how to tackle this?

Recommended answer

There are some measures of semantic relatedness or similarity, but as far as I know they are better defined for single words or single expressions in WordNet's lexicon, not for compounds of WordNet's lexical entries.
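One possible workaround, which is not part of the answer above, is to score individual words and then aggregate those scores over the tokens of each expression. The sketch below is a rough heuristic under stated assumptions: the names `wordnet_word_similarity`, `expression_similarity` and `WordSim` are hypothetical, tokenization is plain whitespace splitting, and the aggregation (symmetric average of best per-token matches) is just one of several reasonable choices. It relies on NLTK's `wordnet.synsets` and `Synset.path_similarity`, and assumes the WordNet corpus has been downloaded.

```python
from itertools import product
from typing import Callable, Optional, Sequence

# A word-level similarity function: returns a score, or None if undefined.
WordSim = Callable[[str, str], Optional[float]]


def wordnet_word_similarity(w1: str, w2: str) -> Optional[float]:
    """Best path similarity over all synset pairs of two single words.

    Assumes NLTK is installed and nltk.download('wordnet') has been run.
    Returns None when no synset pair yields a score.
    """
    from nltk.corpus import wordnet  # lazy import: the rest works without NLTK

    scores = [
        s1.path_similarity(s2)
        for s1, s2 in product(wordnet.synsets(w1), wordnet.synsets(w2))
    ]
    scores = [s for s in scores if s is not None]
    return max(scores) if scores else None


def expression_similarity(expr1: str, expr2: str, word_sim: WordSim) -> float:
    """Symmetric average of the best per-token match between two expressions.

    This is an illustrative heuristic, not a standard WordNet measure.
    """
    tokens1, tokens2 = expr1.lower().split(), expr2.lower().split()
    if not tokens1 or not tokens2:
        return 0.0

    def directed(src: Sequence[str], dst: Sequence[str]) -> float:
        total = 0.0
        for a in src:
            best = 0.0
            for b in dst:
                # Identical tokens count as a perfect match; unknown pairs as 0.
                score = 1.0 if a == b else (word_sim(a, b) or 0.0)
                best = max(best, score)
            total += best
        return total / len(src)

    return (directed(tokens1, tokens2) + directed(tokens2, tokens1)) / 2
```

With WordNet available, the two expressions from the question could be compared with `expression_similarity('Social networking service', 'Social network', wordnet_word_similarity)`. Passing the word-level measure as a parameter makes it easy to swap in a different measure, or a non-WordNet one, without changing the aggregation.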

This is a nice web implementation of many WordNet-based similarity measures.

Some further reading on interpreting compounds using WordNet similarity (although not on evaluating the similarity of compounds), if you're interested:

  • CiteSeerX (citations are clearer)
  • Same article, PDF
