Lemmatize an entire column using a lambda function

Views: 69  Published: 2021/2/15 21:15:19  Tags: python, lambda, nltk, wordnet, lemmatization

Problem description

I tested this code on a single sentence and want to adapt it so that I can lemmatize an entire column, where each row consists of words without punctuation, for example: deportivas calcetin hombres deportivas shoes

import nltk
import pandas as pd
from nltk.corpus import wordnet
from nltk.stem import WordNetLemmatizer

# Data needed by the lemmatizer, pos_tag and word_tokenize
# (resource names may differ between NLTK versions)
nltk.download('wordnet')
nltk.download('averaged_perceptron_tagger')
nltk.download('punkt')

df = pd.read_excel(r'C:\Test2\test.xlsx')

# Init the Wordnet Lemmatizer
lemmatizer = WordNetLemmatizer()

def get_wordnet_pos(word):
    """Map POS tag to first character lemmatize() accepts"""
    tag = nltk.pos_tag([word])[0][1][0].upper()
    tag_dict = {"J": wordnet.ADJ,
                "N": wordnet.NOUN,
                "V": wordnet.VERB,
                "R": wordnet.ADV}

    # Fall back to noun for tags that WordNet does not cover
    return tag_dict.get(tag, wordnet.NOUN)

# Lemmatize a sentence with the appropriate POS tag
sentence = "The striped bats are hanging on their feet for best"
print([lemmatizer.lemmatize(w, get_wordnet_pos(w)) for w in nltk.word_tokenize(sentence)])

Suppose the column is df['keywords']. Can you help me use a lambda function to lemmatize the entire column the same way I lemmatize the sentence above?

Many thanks in advance

Solution

Here you go:

Use apply to process each sentence in the column
Use a lambda expression that takes a sentence as input and applies the function you wrote, similar to how you used it in the print statement

As lemmatized keywords:

# Lemmatize each row into a list of keywords, using the appropriate POS tag
df['keywords'] = df['keywords'].apply(lambda sentence: [lemmatizer.lemmatize(w, get_wordnet_pos(w)) for w in nltk.word_tokenize(sentence)])

As a lemmatized sentence (join keywords using ' '):

# Lemmatize each row and join the keywords back into a single string
df['keywords'] = df['keywords'].apply(lambda sentence: ' '.join([lemmatizer.lemmatize(w, get_wordnet_pos(w)) for w in nltk.word_tokenize(sentence)]))
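One thing the one-liners above do not handle is empty cells: a column read from Excel may contain NaN, and passing that to the tokenizer would raise an error. Below is a minimal sketch of the same apply-plus-lambda pattern with a guard for non-string cells. The names lemmatize_cell, toy_lemmas and toy_lemmatize are hypothetical and not part of the original answer; the toy dictionary is only a stand-in so the sketch runs without NLTK data, and str.split() is used on the assumption that rows contain whitespace-separated words without punctuation.

```python
import pandas as pd

def lemmatize_cell(cell, lemmatize_word):
    """Lemmatize one cell; return non-string cells (e.g. NaN) unchanged."""
    if not isinstance(cell, str):
        return cell
    # Assumes whitespace-separated words without punctuation
    return ' '.join(lemmatize_word(w) for w in cell.split())

# Stand-in word lemmatizer (hypothetical) so the sketch runs without NLTK data
toy_lemmas = {"bats": "bat", "feet": "foot", "shoes": "shoe"}

def toy_lemmatize(word):
    return toy_lemmas.get(word, word)

df = pd.DataFrame({"keywords": ["striped bats feet", None, "deportivas shoes"]})

# Same pattern as the answer: apply a lambda to every cell of the column
df["keywords"] = df["keywords"].apply(lambda cell: lemmatize_cell(cell, toy_lemmatize))
print(df)
```

With the question's setup, the stand-in could be swapped for the real lemmatizer by passing lambda w: lemmatizer.lemmatize(w, get_wordnet_pos(w)) as the second argument.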
