What is the difference between lemmatization and stemming?

Question

When do I use each one?

Also, is NLTK lemmatization dependent on part of speech? Wouldn't it be more accurate if it were?

Recommended answer

Short and dense:

The goal of both stemming and lemmatization is to reduce inflectional forms and sometimes derivationally related forms of a word to a common base form.

However, the two words differ in their flavor. Stemming usually refers to a crude heuristic process that chops off the ends of words in the hope of achieving this goal correctly most of the time, and often includes the removal of derivational affixes. Lemmatization usually refers to doing things properly with the use of a vocabulary and morphological analysis of words, normally aiming to remove inflectional endings only and to return the base or dictionary form of a word, which is known as the lemma.
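To make the contrast concrete, here is a minimal toy sketch in Python. The suffix list, the tiny lemma table, and the function names `crude_stem` and `lookup_lemma` are all assumptions made up for illustration. This is not the Porter algorithm or NLTK's implementation, only the general shape of each approach. The lemma table is keyed by (word, part of speech), which hints at why part-of-speech information can make lemmatization more accurate.

```python
# Toy illustration only: hypothetical suffix rules and a hand-written lemma table.
SUFFIXES = ("ing", "ed", "es", "s")  # checked in order; the first match is stripped


def crude_stem(word):
    """Chop a known suffix off the end of the word, with no vocabulary check."""
    word = word.lower()
    for suffix in SUFFIXES:
        # Keep at least three characters so very short words are left alone.
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


# Hypothetical mini-vocabulary: (surface form, part of speech) -> dictionary form.
LEMMAS = {
    ("meeting", "v"): "meet",
    ("meeting", "n"): "meeting",
    ("was", "v"): "be",
    ("better", "a"): "good",
    ("studies", "n"): "study",
    ("studies", "v"): "study",
}


def lookup_lemma(word, pos="n"):
    """Return the dictionary form for (word, pos), or the word itself if unknown."""
    return LEMMAS.get((word.lower(), pos), word.lower())


for word, pos in [("studies", "n"), ("meeting", "n"), ("meeting", "v"),
                  ("was", "v"), ("better", "a")]:
    print(word, pos, crude_stem(word), lookup_lemma(word, pos))
```

In this sketch the stemmer turns "studies" into the non-word "studi" and cannot tell the noun "meeting" from the verb form, while the lookup returns dictionary forms and separates the two by part of speech. On the NLTK side of the question, `WordNetLemmatizer.lemmatize` accepts an optional `pos` argument that defaults to noun, so passing the correct tag generally helps.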

From the NLTK docs:

Lemmatization and stemming are special cases of normalization. They identify a canonical representative for a set of related word forms.
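One way to picture "a canonical representative for a set of related word forms" is to group words by their normalized key. The sketch below rests on an assumption: `normalize` is a hypothetical stand-in (lowercasing plus stripping one trailing "s") for whatever stemmer or lemmatizer is actually in use, and `group_by_canonical` is likewise a name invented here.

```python
from collections import defaultdict


def normalize(word):
    """Hypothetical normalizer: lowercase, then strip one trailing 's' from longer words."""
    word = word.lower()
    if word.endswith("s") and len(word) > 3:
        return word[:-1]
    return word


def group_by_canonical(words):
    """Map each canonical form to the set of surface forms that reduce to it."""
    groups = defaultdict(set)
    for w in words:
        groups[normalize(w)].add(w)
    return dict(groups)


groups = group_by_canonical(["Cat", "cats", "cat", "Dogs", "dog"])
print(groups)
```

Swapping a real stemmer or lemmatizer in for `normalize` changes which forms end up grouped together, but the idea of one representative per group stays the same.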
