如何在NLTK的Wordnet中检索目标同义词集的反义词同义词集? [英] How can I retrieve the antonym synset of a target synset in NLTK's Wordnet?

查看:156
本文介绍了如何在NLTK的Wordnet中检索目标同义词集的反义词同义词集?的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我已经成功检索了通过其他语义关系连接到基本同义词集的同义词集,如下所示:

 wn.synset('good.a.01').also_sees()
 Out[63]: 
 [Synset('best.a.01'),
 Synset('better.a.01'),
 Synset('favorable.a.01'),
 Synset('good.a.03'),
 Synset('obedient.a.01'),
 Synset('respectable.a.01')]

wn.synset('good.a.01').similar_tos()
Out[64]: 
[Synset('bang-up.s.01'),
 Synset('good_enough.s.01'),
 Synset('goodish.s.01'),
 Synset('hot.s.15'),
 Synset('redeeming.s.02'),
 Synset('satisfactory.s.02'),
 Synset('solid.s.01'),
 Synset('superb.s.02'),
 Synset('well-behaved.s.01')]

但是,反义词关系似乎有所不同.我设法检索连接到基本同义词集的引理,但无法检索实际的同义词集,如下所示:

wn.synset('good.a.01').lemmas()[0].antonyms()
Out[67]: [Lemma('bad.a.01.bad')]

如何获得通过反义关系连接到我的基本同义词集wn.synset('good.a.01')的同义词集,而不是引理? TIA

解决方案

由于某些原因,WordNet在引理级别而不是Synset上索引antonymy关系(请参阅Lemma和String具有一对一的关系:

>>> wn.synsets('dog')
[Synset('dog.n.01'), Synset('frump.n.01'), Synset('dog.n.03'), Synset('cad.n.01'), Synset('frank.n.02'), Synset('pawl.n.01'), Synset('andiron.n.01'), Synset('chase.v.01')]
>>> wn.synsets('dog')[0]
Synset('dog.n.01')
>>> wn.synsets('dog')[0].definition()
u'a member of the genus Canis (probably descended from the common wolf) that has been domesticated by man since prehistoric times; occurs in many breeds'
>>> wn.synsets('dog')[0].lemmas()
[Lemma('dog.n.01.dog'), Lemma('dog.n.01.domestic_dog'), Lemma('dog.n.01.Canis_familiaris')]
>>> wn.synsets('dog')[0].lemmas()[0]
Lemma('dog.n.01.dog')
>>> wn.synsets('dog')[0].lemmas()[0].name()
u'dog'

Lemma对象的_name属性返回一个unicode字符串,而不是列表.从代码点来看: https://github .com/nltk/nltk/blob/develop/nltk/corpus/reader/wordnet.py#L202 https://github.com上的文档字符串/nltk/nltk/blob/develop/nltk/corpus/reader/wordnet.py#L220 :

引理属性,可通过具有相同名称的方法访问::

  • 名称:此引理的规范名称.
  • 同义词集:此引理所属的同义词集.
  • syntactic_marker:对于形容词,WordNet字符串标识 句法位置相对修饰名词.看: http://wordnet.princeton.edu/man/wninput.5WN.html# sect10 对于语音的所有其他部分,此属性为无".
  • count:这个词在词网中的出现频率.

因此我们可以执行此操作,并以某种方式知道每个Lemma对象将只向我们返回1个同义词集:

>>> wn.synsets('dog')[0].lemmas()[0]
Lemma('dog.n.01.dog')
>>> wn.synsets('dog')[0].lemmas()[0].synset()
Synset('dog.n.01')


假设您正在尝试进行情感分析,并且需要WordNet中每个形容词的反义词,则可以轻松地执行此操作以接受反义词的同义词集:

>>> from nltk.corpus import wordnet as wn
>>> all_adj_in_wn = wn.all_synsets(pos='a')
>>> def get_antonyms(ss):
...     return set(chain(*[[a.synset() for a in l.antonyms()] for l in ss.lemmas()]))
...
>>> for ss in all_adj_in_wn:
...     print ss, ':', get_antonyms(ss)
... 
Synset('unable.a.01') : set([Synset('unable.a.01')])

I have successfully retrieved synsets connected to a base synset via other semantic relations, as follows:

 wn.synset('good.a.01').also_sees()
 Out[63]: 
 [Synset('best.a.01'),
 Synset('better.a.01'),
 Synset('favorable.a.01'),
 Synset('good.a.03'),
 Synset('obedient.a.01'),
 Synset('respectable.a.01')]

wn.synset('good.a.01').similar_tos()
Out[64]: 
[Synset('bang-up.s.01'),
 Synset('good_enough.s.01'),
 Synset('goodish.s.01'),
 Synset('hot.s.15'),
 Synset('redeeming.s.02'),
 Synset('satisfactory.s.02'),
 Synset('solid.s.01'),
 Synset('superb.s.02'),
 Synset('well-behaved.s.01')]

However, the antonym relation seems different. I managed to retrieve the lemma connected to my base synset, but was not able to retrieve the actual synset, like so:

wn.synset('good.a.01').lemmas()[0].antonyms()
Out[67]: [Lemma('bad.a.01.bad')]

How can I get the synset, and not the lemma, that is connected via antonymy to my base synset - wn.synset('good.a.01') ? TIA

解决方案

For some reason, WordNet indexes antonymy relations at the Lemma level instead of the Synset (see http://wordnetweb.princeton.edu/perl/webwn?o2=&o0=1&o8=1&o1=1&o7=&o5=&o9=&o6=&o3=&o4=&s=good&i=8&h=00001000000000000000000000000000#c), so the question is whether Synsets and Lemmas have many-to-many or one-to-one relations.


In the case of ambiguous words, one word many meaning, we have a one-to-many relation between String-to-Synset, e.g.

>>> wn.synsets('dog')
[Synset('dog.n.01'), Synset('frump.n.01'), Synset('dog.n.03'), Synset('cad.n.01'), Synset('frank.n.02'), Synset('pawl.n.01'), Synset('andiron.n.01'), Synset('chase.v.01')]

In the case of one meaning/concept, multiple representation, we have a one-to-many relation between Synset-to-String (where String refers to Lemma names):

>>> dog = wn.synset('dog.n.1')
>>> dog.definition()
u'a member of the genus Canis (probably descended from the common wolf) that has been domesticated by man since prehistoric times; occurs in many breeds'
>>> dog.lemma_names()
[u'dog', u'domestic_dog', u'Canis_familiaris']

Note: up till now, we are comparing the relationships between String and Synsets not Lemmas and Synsets.


The "cute" thing is that Lemma and String has a one-to-one relationship:

>>> wn.synsets('dog')
[Synset('dog.n.01'), Synset('frump.n.01'), Synset('dog.n.03'), Synset('cad.n.01'), Synset('frank.n.02'), Synset('pawl.n.01'), Synset('andiron.n.01'), Synset('chase.v.01')]
>>> wn.synsets('dog')[0]
Synset('dog.n.01')
>>> wn.synsets('dog')[0].definition()
u'a member of the genus Canis (probably descended from the common wolf) that has been domesticated by man since prehistoric times; occurs in many breeds'
>>> wn.synsets('dog')[0].lemmas()
[Lemma('dog.n.01.dog'), Lemma('dog.n.01.domestic_dog'), Lemma('dog.n.01.Canis_familiaris')]
>>> wn.synsets('dog')[0].lemmas()[0]
Lemma('dog.n.01.dog')
>>> wn.synsets('dog')[0].lemmas()[0].name()
u'dog'

The _name property of a Lemma object returns a unicode string, not a list. From the code points: https://github.com/nltk/nltk/blob/develop/nltk/corpus/reader/wordnet.py#L202 and https://github.com/nltk/nltk/blob/develop/nltk/corpus/reader/wordnet.py#L444

And it seems like the Lemma has a one-to-one relation with Synset. From docstring at https://github.com/nltk/nltk/blob/develop/nltk/corpus/reader/wordnet.py#L220:

Lemma attributes, accessible via methods with the same name::

  • name: The canonical name of this lemma.
  • synset: The synset that this lemma belongs to.
  • syntactic_marker: For adjectives, the WordNet string identifying the syntactic position relative modified noun. See: http://wordnet.princeton.edu/man/wninput.5WN.html#sect10 For all other parts of speech, this attribute is None.
  • count: The frequency of this lemma in wordnet.

So we can do this and somehow know that each Lemma object is only going to return us 1 synset:

>>> wn.synsets('dog')[0].lemmas()[0]
Lemma('dog.n.01.dog')
>>> wn.synsets('dog')[0].lemmas()[0].synset()
Synset('dog.n.01')


Assuming that you are trying to do some sentiment analysis and you need the antonyms of every adjective in WordNet, you can easily do this to accept the Synsets of the antonyms:

>>> from nltk.corpus import wordnet as wn
>>> all_adj_in_wn = wn.all_synsets(pos='a')
>>> def get_antonyms(ss):
...     return set(chain(*[[a.synset() for a in l.antonyms()] for l in ss.lemmas()]))
...
>>> for ss in all_adj_in_wn:
...     print ss, ':', get_antonyms(ss)
... 
Synset('unable.a.01') : set([Synset('unable.a.01')])

这篇关于如何在NLTK的Wordnet中检索目标同义词集的反义词同义词集?的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆