Create a dataframe with NLTK synonyms
Problem description
Good morning,
I am using NLTK to get synonyms for the words in a dataframe.
print(df)
col_1   col_2
Book    5
Pen     5
Pencil  6
def get_synonyms(df, column_name):
    df_1 = df["col_1"]
    for i in df_1:
        syn = wn.synsets(i)
        for synset in list(wn.all_synsets('n'))[:2]:
            print(i, "-->", synset)
            print("-----------")
            for lemma in synset.lemmas():
                print(lemma.name())
                ci = lemma.name()
    return(syn)
It does work, but I would like to get the following dataframe, with the first "n" synonyms of each word in "col_1":
print(df_final)
col_1  synonym
Book   booklet
Book   album
Pen    cage
...
I tried initializing an empty list before the synset loop, and also before the lemma loop, and appending to it, but neither attempt worked. For example:
synonyms = []
for lemma in synset.lemmas():
    print(lemma.name())
    ci = lemma.name()
    synonyms.append(ci)
Recommended answer
You can use:
import pandas as pd
from nltk.corpus import wordnet
from itertools import chain

def get_synonyms(df, column_name, N):
    L = []
    for i in df[column_name]:
        syn = wordnet.synsets(i)
        # flatten all lemma lists with chain, remove duplicates with set
        # (set order is arbitrary, so which N lemmas come first can vary)
        lemmas = list(set(chain.from_iterable([w.lemma_names() for w in syn])))
        for j in lemmas[:N]:
            # append one [word, synonym] row to the final list
            L.append([i, j])
    # create DataFrame
    return pd.DataFrame(L, columns=['word', 'syn'])

# N is the number of synonyms to keep per word
df1 = get_synonyms(df, 'col_1', 3)
print(df1)
     word           syn
0    Book   record_book
1    Book          book
2    Book          Word
3     Pen  penitentiary
4     Pen       compose
5     Pen           pen
6  Pencil        pencil
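Because the answer deduplicates with set, the "first N" lemmas are taken in an arbitrary order. If a stable order matters, one possible variant is to deduplicate with dict.fromkeys, which keeps first-seen order. The sketch below is only an illustration under stated assumptions: first_n_synonyms, lemma_lists_for and FAKE_LEMMAS are hypothetical names that are not part of the original answer, and the stand-in data is made up so that the code runs without the WordNet corpus.

```python
from itertools import chain


def first_n_synonyms(words, lemma_lists_for, n):
    """Return [word, synonym] rows with at most n distinct synonyms per word.

    lemma_lists_for is a hypothetical callable mapping a word to a list of
    lemma-name lists (one inner list per synset).
    """
    rows = []
    for word in words:
        # Flatten the per-synset lists into one stream of names.
        names = chain.from_iterable(lemma_lists_for(word))
        # dict.fromkeys drops duplicates but keeps first-seen order,
        # unlike set(), whose iteration order is arbitrary.
        unique = list(dict.fromkeys(names))
        rows.extend([word, name] for name in unique[:n])
    return rows


# Made-up stand-in data (an assumption, not real WordNet output).
FAKE_LEMMAS = {
    "Book": [["book"], ["book", "volume"], ["record", "record_book", "book"]],
    "Pen": [["pen"], ["pen", "playpen"]],
}

rows = first_n_synonyms(["Book", "Pen", "Pencil"],
                        lambda w: FAKE_LEMMAS.get(w, []), 3)
print(rows)
```

With NLTK installed, the lookup could presumably be lambda w: [s.lemma_names() for s in wordnet.synsets(w)], and the rows can be wrapped as pd.DataFrame(rows, columns=['word', 'syn']) to get the same shape as df1.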