How to get the wordnet sense frequency of a synset in NLTK?

Problem description

According to the documentation, I can load a sense-tagged corpus in NLTK as such:

>>> from nltk.corpus import wordnet_ic
>>> brown_ic = wordnet_ic.ic('ic-brown.dat')
>>> semcor_ic = wordnet_ic.ic('ic-semcor.dat')

I can also get the definition, pos, offset, and examples as such:

>>> wn.synset('dog.n.01').examples
>>> wn.synset('dog.n.01').definition

But how can I get the frequency of a synset from a corpus? To break down the question:

  1. First, how do I count how many times a synset occurs in a sense-tagged corpus?
  2. Then the next step is to divide that count by the total number of counts for all synset occurrences of the given lemma (a minimal sketch of this computation follows the list).
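
For illustration, step 2 is just a normalisation over the counts gathered in step 1. Here is a minimal sketch, assuming the per-synset counts from step 1 are already collected in a dict; the helper name relative_sense_freq and the sample numbers are hypothetical:

# Hypothetical helper for step 2: turn raw per-synset counts for one lemma
# into relative frequencies.
def relative_sense_freq(sense_counts):
    total = sum(sense_counts.values())
    if total == 0:
        # the lemma never occurs in the sense-tagged corpus
        return dict((s, 0.0) for s in sense_counts)
    return dict((s, count / float(total)) for s, count in sense_counts.items())

# e.g. {'dog.n.01': 45, 'cad.n.01': 5}  ->  {'dog.n.01': 0.9, 'cad.n.01': 0.1}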

Solution

I managed to do it this way.

from nltk.corpus import wordnet as wn

word = "dog"
synsets = wn.synsets(word)

sense2freq = {}
for s in synsets:
  # Sum the tagged-corpus counts of every lemma in this synset;
  # lemma.count() returns the sense frequency from WordNet's cntlist data.
  freq = 0
  for lemma in s.lemmas:
    freq += lemma.count()
  # s.offset is an int in this (NLTK 2.x) API, so convert it before concatenating.
  sense2freq[str(s.offset)+"-"+s.pos] = freq

for s in sense2freq:
  print s, sense2freq[s]
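
Note that in newer NLTK releases (3.x), lemmas, offset and pos are methods rather than attributes. A sketch of the same approach against the method-based API (assuming NLTK 3 and Python 3) would be:

from nltk.corpus import wordnet as wn

word = "dog"
sense2freq = {}
for s in wn.synsets(word):
    # sum the tagged-corpus counts of every lemma in this synset
    freq = sum(lemma.count() for lemma in s.lemmas())
    sense2freq["%d-%s" % (s.offset(), s.pos())] = freq

for key, freq in sense2freq.items():
    print(key, freq)

Dividing each freq by sum(sense2freq.values()) (when that sum is non-zero) gives the relative sense frequency asked for in step 2 of the question.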
