What is the use of the Brown Corpus in measuring semantic similarity based on WordNet?

Question

I came across several methods for measuring semantic similarity that use the structure and hierarchy of WordNet, e.g. the Jiang and Conrath measure (JCN), the Resnik measure (RES), the Lin measure (LIN), etc.

This is how they are computed with NLTK:

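from nltk.corpus import wordnet as wn, wordnet_ic
brown_ic = wordnet_ic.ic('ic-brown.dat')  # information content counts derived from the Brown Corpus
# entry1 and entry2 are WordNet synsets
sim2 = wn.jcn_similarity(entry1, entry2, brown_ic)
sim3 = entry1.res_similarity(entry2, brown_ic)
sim4 = entry1.lin_similarity(entry2, brown_ic)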

If WordNet is the basis for calculating semantic similarity, what is the Brown Corpus used for here?

Recommended answer

Specifically, the *_ic notation stands for information content, so brown_ic is the information content computed from the Brown Corpus. From the NLTK documentation:

synset1.res_similarity(synset2, ic): Resnik Similarity: Return a score denoting how similar two word senses are, based on the Information Content (IC) of the Least Common Subsumer (most specific ancestor node). Note that for any similarity measure that uses information content, the result is dependent on the corpus used to generate the information content and the specifics of how the information content was created.
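For reference, the three measures from the question are usually written as below. This is a sketch of the standard textbook definitions, not a quote from the NLTK source; the symbols c_1, c_2 (the two synsets), lcs (their least common subsumer) and P(c) (the probability of meeting concept c in the corpus) are notation introduced here for illustration.

```latex
\begin{align*}
\mathrm{IC}(c) &= -\log P(c) \\
\mathrm{sim}_{\mathrm{res}}(c_1, c_2) &= \mathrm{IC}\bigl(\mathrm{lcs}(c_1, c_2)\bigr) \\
\mathrm{sim}_{\mathrm{lin}}(c_1, c_2) &= \frac{2\,\mathrm{IC}\bigl(\mathrm{lcs}(c_1, c_2)\bigr)}{\mathrm{IC}(c_1) + \mathrm{IC}(c_2)} \\
\mathrm{sim}_{\mathrm{jcn}}(c_1, c_2) &= \frac{1}{\mathrm{IC}(c_1) + \mathrm{IC}(c_2) - 2\,\mathrm{IC}\bigl(\mathrm{lcs}(c_1, c_2)\bigr)}
\end{align*}
```

WordNet supplies the hierarchy that determines lcs, while the corpus (here, Brown) supplies the counts from which P(c) is estimated.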

A bit more on information content, quoted from a linked source:

The conventional way of measuring the IC of word senses is to combine knowledge of their hierarchical structure from an ontology like WordNet with statistics on their actual usage in text as derived from a large corpus.
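To make the division of labour concrete, here is a minimal, self-contained sketch in plain Python (it does not use NLTK). The tiny taxonomy PARENT and the two count tables are made-up stand-ins for WordNet and for corpus frequencies such as Brown's; every name in the block is hypothetical. It assumes IC(c) = -log p(c), with each concept's count including the counts of its descendants.

```python
import math
from collections import Counter

# Hypothetical toy taxonomy (child -> parent); a stand-in for WordNet's hypernym hierarchy.
PARENT = {"dog": "mammal", "cat": "mammal", "mammal": "animal",
          "bird": "animal", "animal": None}

# Hypothetical word counts from two different corpora over the same taxonomy.
CORPUS_A = Counter({"dog": 6, "cat": 4, "bird": 5, "mammal": 1, "animal": 4})
CORPUS_B = Counter({"dog": 1, "cat": 1, "bird": 16, "mammal": 1, "animal": 1})


def ancestors(concept):
    """Return the concept followed by all of its ancestors up to the root."""
    chain = []
    while concept is not None:
        chain.append(concept)
        concept = PARENT[concept]
    return chain


def information_content(counts):
    """IC(c) = -log p(c); an occurrence of a concept also counts for its ancestors.

    Assumes every concept ends up with a nonzero cumulative count (no smoothing).
    """
    cumulative = Counter()
    for concept, n in counts.items():
        for node in ancestors(concept):
            cumulative[node] += n
    total = sum(counts.values())
    return {c: -math.log(cumulative[c] / total) for c in PARENT}


def lcs(c1, c2):
    """Least common subsumer: the most specific ancestor shared by c1 and c2."""
    seen = set(ancestors(c1))
    for node in ancestors(c2):  # walks upward, so the first hit is the most specific
        if node in seen:
            return node
    raise ValueError("concepts share no ancestor")


def resnik(c1, c2, ic):
    return ic[lcs(c1, c2)]


def lin(c1, c2, ic):
    return 2 * ic[lcs(c1, c2)] / (ic[c1] + ic[c2])


def jcn(c1, c2, ic):
    # Undefined (division by zero) when c1 and c2 are the same concept.
    return 1 / (ic[c1] + ic[c2] - 2 * ic[lcs(c1, c2)])


ic_a = information_content(CORPUS_A)
ic_b = information_content(CORPUS_B)

# Same hierarchy, same LCS ("mammal"), but different scores per corpus.
print(lcs("dog", "cat"))
print(resnik("dog", "cat", ic_a), resnik("dog", "cat", ic_b))
print(lin("dog", "cat", ic_a), jcn("dog", "cat", ic_a))
```

The hierarchy alone decides which node is the least common subsumer; the corpus decides how informative that node is. Swapping CORPUS_A for CORPUS_B changes the scores without touching the taxonomy, which mirrors the role brown_ic plays in the NLTK calls from the question.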
