What is the difference between lemmatization and stemming?

Question

When should I use each?

Also, is NLTK lemmatization dependent on part of speech? Wouldn't it be more accurate if it were?

Recommended answer

Short and dense: http://nlp.stanford.edu/IR-book/html/htmledition/stemming-and-lemmatization-1.html

The goal of both stemming and lemmatization is to reduce inflectional forms and sometimes derivationally related forms of a word to a common base form.

However, the two words differ in their flavor. Stemming usually refers to a crude heuristic process that chops off the ends of words in the hope of achieving this goal correctly most of the time, and often includes the removal of derivational affixes. Lemmatization usually refers to doing things properly with the use of a vocabulary and morphological analysis of words, normally aiming to remove inflectional endings only and to return the base or dictionary form of a word, which is known as the lemma.
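To make the contrast concrete, here is a minimal sketch in plain Python. It is a toy under stated assumptions, not the Porter algorithm and not NLTK's API: the suffix list `SUFFIXES`, the mini-vocabulary `LEMMAS`, and the function names `crude_stem` and `toy_lemmatize` are all hypothetical and exist only for illustration.

```python
# Toy illustration only: neither function is the Porter algorithm or NLTK's API.

SUFFIXES = ("ing", "ed", "es", "s")  # hypothetical, deliberately crude rule list


def crude_stem(word: str) -> str:
    """Strip the first matching suffix, with no vocabulary check."""
    word = word.lower()
    for suffix in SUFFIXES:
        # Keep at least three characters so very short words survive intact.
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


# Hypothetical mini-vocabulary: (surface form, part of speech) -> lemma.
LEMMAS = {
    ("saw", "v"): "see",
    ("saw", "n"): "saw",
    ("studies", "n"): "study",
    ("studies", "v"): "study",
    ("running", "v"): "run",
}


def toy_lemmatize(word: str, pos: str) -> str:
    """Look the word up by form and part of speech; fall back to the word itself."""
    word = word.lower()
    return LEMMAS.get((word, pos), word)


for w in ("studies", "running", "saw"):
    print(w, "| stem:", crude_stem(w), "| lemma as verb:", toy_lemmatize(w, "v"))
```

With these toy rules the stemmer turns "studies" into "studi" and "running" into "runn", neither of which is a dictionary word, while the lookup returns "study" and "run". The entry for "saw" shows why part of speech matters: as a verb it maps to "see", as a noun it stays "saw". On the NLTK side of the question, `WordNetLemmatizer.lemmatize` accepts a `pos` argument and treats the word as a noun when none is given, so passing the right tag generally does improve the result.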

From the NLTK docs:

Lemmatization and stemming are special cases of normalization. They identify a canonical representative for a set of related word forms.
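The idea of a "canonical representative" can be sketched as grouping word forms under whatever a normalizer returns. The lookup table `CANONICAL` and the helpers `normalize` and `group_by_canonical` below are hypothetical stand-ins; in practice any stemmer or lemmatizer could be passed in as the normalizer.

```python
from collections import defaultdict
from typing import Callable, Dict, Iterable, List

# Hypothetical lookup table standing in for a real stemmer or lemmatizer.
CANONICAL = {"runs": "run", "ran": "run", "running": "run", "cars": "car"}


def normalize(word: str) -> str:
    """Map a word form to its canonical representative (toy version)."""
    word = word.lower()
    return CANONICAL.get(word, word)


def group_by_canonical(
    words: Iterable[str], normalizer: Callable[[str], str] = normalize
) -> Dict[str, List[str]]:
    """Collect every word form under the representative the normalizer picks."""
    groups: Dict[str, List[str]] = defaultdict(list)
    for word in words:
        groups[normalizer(word)].append(word)
    return dict(groups)


groups = group_by_canonical(["run", "Runs", "ran", "car", "cars"])
print(groups)
```

Here "run", "Runs", and "ran" all land under the representative "run", and "car" and "cars" under "car". A search index built on the representatives would therefore match all related forms.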
