How to invert the lemmatization process given a lemma and a token?

Question

Generally, in natural language processing, we want to get the lemma of a token.

For example, we can map 'eaten' to 'eat' using WordNet lemmatization.
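For reference, here is a minimal sketch of what a lemmatizer does in this direction, loosely modeled on WordNet's `morphy` approach: consult an exception list of irregular forms first, then try suffix-detachment rules and accept a candidate only if it is a known base form. The class name, the tiny vocabulary, the exception entries and the rule set are all hypothetical stand-ins, not WordNet's actual data.

```java
import java.util.List;
import java.util.Map;
import java.util.Set;

public class MorphyLikeLemmatizer {
    // Hypothetical exception list: irregular verb form -> lemma.
    // (WordNet ships real lists such as verb.exc; these entries are illustrative only.)
    static final Map<String, String> VERB_EXCEPTIONS = Map.of(
            "eaten", "eat", "ate", "eat", "gone", "go", "went", "go");

    // Hypothetical vocabulary of known verb lemmas, used to validate rule output.
    static final Set<String> VERB_LEMMAS = Set.of("eat", "go", "walk", "bake");

    // Suffix-detachment rules for verbs: {suffix to strip, replacement}.
    static final List<String[]> VERB_RULES = List.of(
            new String[]{"ing", ""}, new String[]{"ing", "e"},
            new String[]{"ed", ""}, new String[]{"ed", "e"},
            new String[]{"es", ""}, new String[]{"es", "e"}, new String[]{"s", ""});

    public static String lemmatizeVerb(String token) {
        String word = token.toLowerCase();
        // 1. Irregular forms come straight from the exception list.
        String irregular = VERB_EXCEPTIONS.get(word);
        if (irregular != null) return irregular;
        // 2. The token may already be a base form.
        if (VERB_LEMMAS.contains(word)) return word;
        // 3. Try each rule in order; keep the first candidate that is a known lemma.
        for (String[] rule : VERB_RULES) {
            if (word.endsWith(rule[0])) {
                String candidate = word.substring(0, word.length() - rule[0].length()) + rule[1];
                if (VERB_LEMMAS.contains(candidate)) return candidate;
            }
        }
        return word; // unknown word: return it unchanged
    }

    public static void main(String[] args) {
        System.out.println(lemmatizeVerb("eaten"));  // eat
        System.out.println(lemmatizeVerb("walked")); // walk
        System.out.println(lemmatizeVerb("baking")); // bake
    }
}
```

The dictionary check in step 3 is what keeps a rule like "strip -ing" from turning "baking" into the non-word "bak".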

Is there any tool in Python that can turn a lemma back into a specified form?

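For example, we want to map 'go' to 'gone' given the target form 'eaten', i.e. inflect 'go' into the same grammatical form (the past participle) that 'eaten' has.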

PS: Someone mentioned that we would have to store such mappings. See also: "How to un-stem a word in Python?"
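The "store such mappings" idea can be sketched as a two-way lookup: index every known (lemma, tag) pair to its surface form, index each surface form back to its tag, and use the target token only to discover which tag to produce. The class name, the Penn Treebank-style tags (`VB`, `VBD`, `VBN`) and the hand-written entries are assumptions for illustration; a real system would load such a table from a morphological lexicon.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

public class InflectionTable {
    // (lemma + "/" + tag) -> surface form, e.g. "go/VBN" -> "gone"
    private final Map<String, String> forms = new HashMap<>();
    // surface form -> tag; the first registration wins, although real data is
    // ambiguous (e.g. "put" is both VBD and VBN)
    private final Map<String, String> tagOfForm = new HashMap<>();

    public void add(String lemma, String tag, String form) {
        forms.put(lemma + "/" + tag, form);
        tagOfForm.putIfAbsent(form, tag);
    }

    /** Inflect {@code lemma} into the same grammatical form as {@code targetToken}. */
    public Optional<String> inflectLike(String lemma, String targetToken) {
        String tag = tagOfForm.get(targetToken);
        if (tag == null) return Optional.empty(); // target form not in the table
        return Optional.ofNullable(forms.get(lemma + "/" + tag));
    }

    public static void main(String[] args) {
        InflectionTable table = new InflectionTable();
        // Hypothetical entries.
        table.add("eat", "VB", "eat");
        table.add("eat", "VBD", "ate");
        table.add("eat", "VBN", "eaten");
        table.add("go", "VB", "go");
        table.add("go", "VBD", "went");
        table.add("go", "VBN", "gone");

        System.out.println(table.inflectLike("go", "eaten").orElse("?")); // gone
        System.out.println(table.inflectLike("go", "ate").orElse("?"));   // went
    }
}
```

The obvious cost of this design is coverage: any lemma or form missing from the table yields an empty result.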

Answer

Turning a base form such as a lemma into a situation-appropriate form is called realization (or "surface realization"). Example from Wikipedia:

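// Setup, assuming SimpleNLG v4 (not part of the Wikipedia snippet)
Lexicon lexicon = Lexicon.getDefaultLexicon();
NLGFactory nlgFactory = new NLGFactory(lexicon);
Realiser realiser = new Realiser(lexicon);

NPPhraseSpec subject = nlgFactory.createNounPhrase("the", "woman");
subject.setPlural(true);                                // "woman" -> "women"
SPhraseSpec sentence = nlgFactory.createClause(subject, "smoke");
sentence.setFeature(Feature.NEGATED, true);             // adds "do not"
System.out.println(realiser.realiseSentence(sentence));
// output: "The women do not smoke."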

Libraries for this are not used as frequently as lemmatizers, which generally means you have fewer options and are less likely to find a well-developed library. The Wikipedia example is in Java because the most popular library supporting this is SimpleNLG.

A quick search found pynlg, though it doesn't seem to be actively maintained. Alternatively, you can use SimpleNLG via an HTTP JSON interface provided by the Python library nlgserv.
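Realisers generally combine a lexicon of irregular forms with regular inflection rules for everything else. The sketch below illustrates that split for English past participles only. The class name, the irregular table and the three spelling rules are simplified assumptions, not SimpleNLG's actual rule set, and cases such as consonant doubling ("stop" to "stopped") are deliberately not handled.

```java
import java.util.Map;

public class PastParticiple {
    // Hypothetical irregular table; a real realiser reads these from its lexicon.
    static final Map<String, String> IRREGULAR = Map.of(
            "go", "gone", "eat", "eaten", "be", "been", "do", "done", "take", "taken");

    static boolean isVowel(char c) {
        return "aeiou".indexOf(c) >= 0;
    }

    /** Past participle of an English verb lemma: lexicon lookup first, then regular rules. */
    public static String pastParticiple(String lemma) {
        String irregular = IRREGULAR.get(lemma);
        if (irregular != null) return irregular;
        int n = lemma.length();
        if (lemma.endsWith("e")) return lemma + "d";                  // bake -> baked
        if (n >= 2 && lemma.charAt(n - 1) == 'y' && !isVowel(lemma.charAt(n - 2))) {
            return lemma.substring(0, n - 1) + "ied";                 // carry -> carried
        }
        // Default regular rule (no consonant doubling): walk -> walked
        return lemma + "ed";
    }

    public static void main(String[] args) {
        System.out.println(pastParticiple("go"));    // gone
        System.out.println(pastParticiple("bake"));  // baked
        System.out.println(pastParticiple("carry")); // carried
        System.out.println(pastParticiple("walk"));  // walked
    }
}
```

Checking the irregular table first matters: without it, "be" would fall through to the "-e" rule and come out as "bed".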
