python pandas: get rid of plural "s" in words to prepare for word count

Question

I have the following python pandas dataframe:

Question_ID | Customer_ID | Answer
    1           234         The team worked very hard ...
    2           234         All the teams have been working together ...

I am going to use my own code to count the words in the Answer column. Before that, I want to strip the plural "s" from words such as "teams", so that the example above counts team: 2 instead of team: 1 and teams: 1.

How can I do this for all words?

Recommended answer

You need a tokenizer (for breaking a sentence into words) and a lemmatizer (for standardizing word forms), both provided by the Natural Language Toolkit, nltk:

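import nltk
# One-time setup: nltk.download('wordnet') fetches the data the lemmatizer needs.
sentence = "All the teams have been working together"
wnl = nltk.WordNetLemmatizer()
# Tokenize the sentence, then reduce each token to its base (singular) form.
[wnl.lemmatize(word) for word in nltk.wordpunct_tokenize(sentence)]
# ['All', 'the', 'team', 'have', 'been', 'working', 'together']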
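
To connect this to the word count from the question, here is a minimal sketch of the counting step with a pluggable normalizer. The names `count_words` and `naive_singular` are hypothetical and not part of nltk or pandas. `naive_singular` is only a crude stand-in so the example runs without the WordNet data; it is not a replacement for the lemmatizer.

```python
import re
from collections import Counter


def count_words(answers, normalize=lambda word: word):
    """Count words across an iterable of strings.

    `normalize` is any str -> str callable applied to each lowercased token,
    for example WordNetLemmatizer().lemmatize. (Hypothetical helper.)
    """
    counts = Counter()
    for text in answers:
        # \w+ keeps runs of letters/digits and drops punctuation such as "..."
        for token in re.findall(r"\w+", text.lower()):
            counts[normalize(token)] += 1
    return counts


def naive_singular(word):
    # Crude stand-in for a real lemmatizer (an assumption for this demo only):
    # drops one trailing "s" from words longer than three letters,
    # leaving words that end in "ss" untouched.
    if len(word) > 3 and word.endswith("s") and not word.endswith("ss"):
        return word[:-1]
    return word


# The two answers from the example dataframe in the question.
answers = [
    "The team worked very hard ...",
    "All the teams have been working together ...",
]
counts = count_words(answers, normalize=naive_singular)
print(counts["team"])  # 2
```

With nltk installed, passing `normalize=wnl.lemmatize` and `answers=df["Answer"]` should work the same way, since a pandas column iterates over its string values. Note that `lemmatize` treats each word as a noun by default, which is what plural stripping needs.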
