python pandas: get rid of plural "s" in words to prepare for word count
Question
I have the following python pandas dataframe:
Question_ID | Customer_ID | Answer
1           | 234         | The team worked very hard ...
2           | 234         | All the teams have been working together ...
I am going to use my code to count words in the answer column. But beforehand, I want to take out the "s" from the word "teams", so that in the example above I count team: 2 instead of team:1 and teams:1.
How can I do this for all words?
Recommended answer
You need to use a tokenizer (for breaking a sentence into words) and a lemmatizer (for standardizing word forms), both provided by the Natural Language Toolkit, nltk:
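import nltk
nltk.download("wordnet")  # one-time download of the WordNet data used by the lemmatizer
sentence = "All the teams have been working together"  # example answer from the question
wnl = nltk.WordNetLemmatizer()
[wnl.lemmatize(word) for word in nltk.wordpunct_tokenize(sentence)]
# ['All', 'the', 'team', 'have', 'been', 'working', 'together']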
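To apply the same idea to every row of the Answer column, something along the following lines might work. This is only a sketch: the helper names nltk_lemmas and count_words are made up for illustration, and lowercasing the tokens and dropping non-alphabetic ones are extra assumptions that the answer above does not make.

```python
from collections import Counter

import pandas as pd


def nltk_lemmas(text):
    """Tokenize text and lemmatize each token with nltk (needs the WordNet data)."""
    import nltk  # imported lazily so count_words can also run with another normalizer

    wnl = nltk.WordNetLemmatizer()
    # Lowercasing is an assumption, so that "The" and "the" are counted together.
    return [wnl.lemmatize(tok.lower()) for tok in nltk.wordpunct_tokenize(text)]


def count_words(df, column="Answer", to_words=nltk_lemmas):
    """Count normalized words over every non-missing row of df[column]."""
    counts = Counter()
    for text in df[column].dropna():
        # Keep alphabetic tokens only, so punctuation such as "..." is not counted.
        counts.update(w for w in to_words(str(text)) if w.isalpha())
    return counts


# The example data from the question.
df = pd.DataFrame({
    "Question_ID": [1, 2],
    "Customer_ID": [234, 234],
    "Answer": ["The team worked very hard ...",
               "All the teams have been working together ..."],
})
# counts = count_words(df)  # requires nltk and its WordNet data
```

With nltk and the WordNet data available, count_words(df) should return a Counter in which "team" and "teams" are merged under "team". The to_words parameter lets you swap in a different tokenizer or normalizer (a stemmer, for instance) without touching the counting loop.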