How to lemmatize a list of sentences
Problem description
How can I lemmatize a list of sentences in Python?
from nltk.stem.wordnet import WordNetLemmatizer
a = ['i like cars', 'cats are the best']
lmtzr = WordNetLemmatizer()
lemmatized = [lmtzr.lemmatize(word) for word in a]
print(lemmatized)
This is what I've tried, but it gives me back the same sentences unchanged. Do I need to tokenize the words first for this to work properly?
Recommended answer
TL;DR:
pip3 install -U pywsd
然后:
>>> from pywsd.utils import lemmatize_sentence
>>> text = 'i like cars'
>>> lemmatize_sentence(text)
['i', 'like', 'car']
>>> lemmatize_sentence(text, keepWordPOS=True)
(['i', 'like', 'cars'], ['i', 'like', 'car'], ['n', 'v', 'n'])
>>> text = 'The cat likes cars'
>>> lemmatize_sentence(text, keepWordPOS=True)
(['The', 'cat', 'likes', 'cars'], ['the', 'cat', 'like', 'car'], [None, 'n', 'v', 'n'])
>>> text = 'The lazy brown fox jumps, and the cat likes cars.'
>>> lemmatize_sentence(text)
['the', 'lazy', 'brown', 'fox', 'jump', ',', 'and', 'the', 'cat', 'like', 'car', '.']
Otherwise, take a look at how the function in pywsd works:
- Tokenizes the string
- Uses a POS tagger and maps the tags to the WordNet POS tagset
- Attempts to stem the words
- Finally, calls the lemmatizer with the POS tag and/or the stems
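The second step above can be sketched as a small helper that maps Penn Treebank POS tags (what NLTK's default tagger emits) onto the WordNet tagset; the mapping below is the conventional one, with tags that have no WordNet counterpart (e.g. determiners) mapping to `None`, matching the `None` seen for 'The' in the output above:

```python
def penn_to_wordnet(tag):
    """Map a Penn Treebank POS tag to a WordNet POS tag, or None."""
    if tag.startswith('NN'):
        return 'n'   # noun
    if tag.startswith('VB'):
        return 'v'   # verb
    if tag.startswith('JJ'):
        return 'a'   # adjective
    if tag.startswith('RB'):
        return 'r'   # adverb
    return None      # e.g. DT, IN, punctuation

tags = ['DT', 'NN', 'VBZ', 'NNS']          # "The cat likes cars"
print([penn_to_wordnet(t) for t in tags])  # [None, 'n', 'v', 'n']
```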
See https://github.com/alvations/pywsd/blob/master/pywsd/utils.py#L129