NLTK Most common synonym (Wordnet) for each word

Views: 644  Posted: 2020/5/18 1:17:24  Tags: python python-2.7 python-3.x nltk

Question

Is there a way to find the most common synonym of a word with NLTK? I would like to simplify a sentence by replacing each word in it with its most common synonym.

If a word used in the sentence is already the most common word from its group of synonyms, it shouldn't be changed.

Let's say "Hi" is more common than "Hello", "Dear" is more common than "Valued", and "Friend" is already the most common word in its group of synonyms.

Input: "Hello my valued friend"
Return: "Hi my dear friend"

Recommended answer

Synonyms are tricky, but if you are starting out with a synset from Wordnet and you simply want to choose the most common member in the set, it's pretty straightforward: Just build your own frequency list from a corpus, and look up each member of the synset to pick the maximum.
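A minimal sketch of the pick-the-maximum step, using only the standard library. `collections.Counter` stands in for `nltk.FreqDist` (which subclasses `Counter`), and the tiny corpus and synonym lists are made up for illustration. In a real setup the counts would come from a corpus such as Brown and the candidate lemmas from a WordNet synset, for example via `synset.lemma_names()`.

```python
from collections import Counter

# Made-up toy corpus; a real setup would count a corpus such as Brown.
TOY_CORPUS = "hi there hi friend hello dear friend my dear my valued friend".split()

# Lowercased token counts, analogous to nltk.FreqDist(w.lower() for w in ...).
freqs = Counter(w.lower() for w in TOY_CORPUS)


def most_common_member(lemmas, freqs):
    """Return the member of `lemmas` with the highest corpus count.

    Counter returns 0 for unseen words, so unattested lemmas simply lose.
    max() keeps the first of several equal maxima, so listing the original
    word first leaves it unchanged unless a synonym is strictly more frequent.
    """
    return max(lemmas, key=lambda lemma: freqs[lemma.lower()])


# Hypothetical synonym groups, with the original word listed first.
print(most_common_member(["hello", "hi", "howdy"], freqs))    # hi
print(most_common_member(["valued", "dear"], freqs))          # dear
print(most_common_member(["friend", "acquaintance"], freqs))  # friend
```

Putting the original word first is what implements the "don't change it if it is already the most common" rule from the question. Note that WordNet joins multiword lemma names with underscores, so those will never match single corpus tokens without extra normalization.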

NLTK lets you build a frequency table in just a few lines of code. Here's one based on the Brown corpus:

import nltk
from nltk.corpus import brown  # needs the corpus: nltk.download("brown")
freqs = nltk.FreqDist(w.lower() for w in brown.words())

You can then look up the frequency of a word like this:

>>> print(freqs["valued"]) 
14

Of course you'll need to do a little more work: I would count words separately for each of the major parts of speech (WordNet uses n, v, a, and r for noun, verb, adjective, and adverb, plus s for satellite adjectives), and use these POS-specific frequencies (after adjusting for the different tagset notations) to choose the right substitute.

>>> freq2 = nltk.ConditionalFreqDist((tag, wrd.lower()) for wrd, tag in
...         brown.tagged_words(tagset="universal"))

>>> print(freq2["ADJ"]["valued"])
0
>>> print(freq2["ADJ"]["dear"])
45
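The counts above are keyed by universal tags (NOUN, VERB, ADJ, ADV, ...), while WordNet labels synsets with n, v, a, s, and r, so the two notations have to be mapped before looking anything up. Below is a minimal sketch of that adjustment and of the full sentence rewrite, under stated assumptions: a `dict` of `Counter`s replaces `nltk.ConditionalFreqDist`, the tagged toy corpus and its counts are invented, the tags on the toy sentence are assigned by hand, and `TOY_SYNSETS` is a hypothetical lookup loosely modeled on WordNet. With NLTK, such a lookup could be built as `[(s.pos(), s.lemma_names()) for s in wordnet.synsets(word)]`.

```python
from collections import Counter, defaultdict

# WordNet POS letters -> universal tags. "s" marks satellite adjectives,
# which are adjectives as well.
WN_TO_UNIVERSAL = {"n": "NOUN", "v": "VERB", "a": "ADJ", "s": "ADJ", "r": "ADV"}


def build_table(tagged_words):
    """tag -> Counter of lowercased words, like nltk.ConditionalFreqDist."""
    table = defaultdict(Counter)
    for word, tag in tagged_words:
        table[tag][word.lower()] += 1
    return table


def simplify(tagged_sentence, synsets_of, table):
    """Replace each word by its most frequent synonym with the same POS.

    synsets_of: hypothetical lookup, lowercase word -> list of
    (wordnet_pos, lemma_names) pairs.
    """
    out = []
    for word, tag in tagged_sentence:
        low = word.lower()
        candidates = [low]  # original first, so max() keeps it on ties
        for wn_pos, lemmas in synsets_of.get(low, []):
            # Only synsets whose WordNet POS matches the tagger's tag count.
            if WN_TO_UNIVERSAL.get(wn_pos) == tag:
                candidates += [n.lower() for n in lemmas if n.lower() not in candidates]
        counts = table.get(tag, Counter())
        best = max(candidates, key=lambda c: counts[c])
        if word[:1].isupper():  # keep the original capitalization
            best = best[:1].upper() + best[1:]
        out.append(best)
    return " ".join(out)


# Invented tagged corpus: "valued" is common overall but rare as ADJ.
TOY_TAGGED = [
    ("hi", "NOUN"), ("hi", "NOUN"), ("hello", "NOUN"),
    ("he", "PRON"), ("valued", "VERB"), ("it", "PRON"),
    ("she", "PRON"), ("valued", "VERB"), ("them", "PRON"),
    ("my", "DET"), ("dear", "ADJ"), ("friend", "NOUN"),
    ("a", "DET"), ("valued", "ADJ"), ("friend", "NOUN"),
    ("dear", "ADJ"), ("sir", "NOUN"),
]

# Hypothetical synset entries, not actual WordNet data.
TOY_SYNSETS = {
    "hello": [("n", ["hello", "hullo", "hi", "howdy"])],
    "valued": [("v", ["value", "prize"]), ("s", ["valued", "dear"])],
    "friend": [("n", ["friend", "acquaintance"])],
}

SENTENCE = [("Hello", "NOUN"), ("my", "DET"), ("valued", "ADJ"), ("friend", "NOUN")]

table = build_table(TOY_TAGGED)
print(simplify(SENTENCE, TOY_SYNSETS, table))  # Hi my dear friend
```

Counting per tag matters here: summed over all tags, "valued" (3) beats "dear" (2), but among adjectives "dear" (2) beats "valued" (1), so only the POS-aware lookup produces the substitution the question asks for. Words whose tag has no WordNet counterpart (DET, PRON, ...) get no candidates and pass through unchanged.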
