How to un-stem a word in Python?


Question

I want to know whether there is any way to un-stem words back to a normal form.

The problem is that I have thousands of words in different forms, e.g. eat, eaten, ate, eating and so on, and I need to count the frequency of each word. All of these forms (eat, eaten, ate, eating, etc.) should count towards eat, and hence I used stemming.

But the next part of the problem requires me to find similar words in the data, and I am using nltk's synsets to calculate the Wu-Palmer similarity between words. The problem is that nltk's synsets won't work on stemmed words, or at least in this code they won't: "Check if two words are related to each other".
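To illustrate why stemmed words cause trouble here, the following is a minimal sketch of the Wu-Palmer measure on a tiny hand-made hypernym tree. The `PARENT` table and the helper names are hypothetical and are not WordNet data; with nltk the equivalent call would be the synset method `wup_similarity`. The sketch uses the usual form of the measure, 2 * depth(lcs) / (depth(a) + depth(b)), where lcs is the deepest common ancestor and the root has depth 1.

```python
# Hypothetical toy hypernym tree (child -> parent); this is NOT WordNet data.
PARENT = {
    "entity": None,
    "food": "entity",
    "fruit": "food",
    "bread": "food",
    "apple": "fruit",
    "banana": "fruit",
}


def path_to_root(word):
    """Return the chain [word, parent, ..., root]."""
    path = []
    while word is not None:
        path.append(word)
        word = PARENT[word]
    return path


def depth(word):
    """Depth of a node, counting the root as depth 1."""
    return len(path_to_root(word))


def wu_palmer(a, b):
    """Wu-Palmer similarity on the toy tree, or None if a word is unknown."""
    if a not in PARENT or b not in PARENT:
        return None  # e.g. a stem such as "appl" is not a node in the tree
    ancestors_b = set(path_to_root(b))
    # Walking up from a, the first node shared with b is the deepest common ancestor.
    lcs = next(node for node in path_to_root(a) if node in ancestors_b)
    return 2 * depth(lcs) / (depth(a) + depth(b))


print(wu_palmer("apple", "banana"))  # both words are in the tree
print(wu_palmer("appl", "banana"))   # a stem that is not a real word
```

The lookup fails for "appl" because a stem is often not a dictionary word, which is the same reason a synset lookup can come back empty for stemmed input.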

How should I do it? Is there a way to un-stem a word?

Recommended answer

I suspect what you really mean by stem is "tense": you want the different tenses of each word to count towards the "base form" of the verb.

Check out the pattern package:

pip install pattern

Then use the en.lemma function to return a verb's base form.

import pattern.en as en
base_form = en.lemma('ate') # base_form == "eat"
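As a rough sketch of how this fits the frequency-counting task from the question, the helper below counts words by base form using whatever lemmatizer function is passed in. The names `count_base_forms`, `TOY_LEMMAS` and `toy_lemma` are hypothetical; the small lookup table only stands in for a real lemmatizer so that the example runs without pattern installed.

```python
from collections import Counter


def count_base_forms(words, lemmatize):
    """Count words after mapping each one to its base form via `lemmatize`."""
    return Counter(lemmatize(word.lower()) for word in words)


# Stand-in lemmatizer for illustration only; a real one covers far more words.
TOY_LEMMAS = {"ate": "eat", "eaten": "eat", "eating": "eat", "eats": "eat"}


def toy_lemma(word):
    """Return the base form if the toy table knows it, else the word itself."""
    return TOY_LEMMAS.get(word, word)


counts = count_base_forms(["eat", "ate", "Eaten", "eating", "run"], toy_lemma)
print(counts["eat"], counts["run"])
```

With pattern available, passing `en.lemma` in place of `toy_lemma` should give counts keyed by real words, which can then be looked up as synsets.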
