Extracting noun phrases from NLTK using Python

Views: 979 Published: 2020/5/18 1:14:50 python nltk

Problem description

I am new to both Python and NLTK. I adapted the code from https://gist.github.com/alexbowe/879414 into the code given below so that it can run over many documents/text chunks, but I get the following error:

 Traceback (most recent call last):
   File "E:/NLP/PythonProgrames/NPExtractor/AdvanceMain.py", line 16, in <module>
     result = np_extractor.extract()
   File "E:\NLP\PythonProgrames\NPExtractor\NPExtractorAdvanced.py", line 67, in extract
     for term in terms:
   File "E:\NLP\PythonProgrames\NPExtractor\NPExtractorAdvanced.py", line 60, in get_terms
     for leaf in self.leaves(tree):
 TypeError: leaves() takes 1 positional argument but 2 were given

Can anyone help me fix this problem? I have to extract noun phrases from millions of product reviews. I used the Stanford NLP toolkit from Java, but it was extremely slow, so I thought using NLTK in Python would be better. If there is a better solution, please recommend it as well.

import nltk
from nltk.corpus import stopwords
stopwords = stopwords.words('english')
grammar = r"""
 NBAR:
    {<NN.*|JJ>*<NN.*>}  # Nouns and Adjectives, terminated with Nouns
 NP:
    {<NBAR>}
    {<NBAR><IN><NBAR>}  # Above, connected with in/of/etc...
"""
lemmatizer = nltk.WordNetLemmatizer()
stemmer = nltk.stem.porter.PorterStemmer()

class NounPhraseExtractor(object):

    def __init__(self, sentence):
        self.sentence = sentence

    def execute(self):
        # Taken from Su Nam Kim Paper...
        chunker = nltk.RegexpParser(grammar)
        #toks = nltk.regexp_tokenize(text, sentence_re)
        # #postoks = nltk.tag.pos_tag(toks)
        toks = nltk.word_tokenize(self.sentence)
        postoks = nltk.tag.pos_tag(toks)
        tree = chunker.parse(postoks)
        return tree

    def leaves(tree):
        """Finds NP (nounphrase) leaf nodes of a chunk tree."""
        for subtree in tree.subtrees(filter=lambda t: t.label() == 'NP'):
            yield subtree.leaves()

    def normalise(word):
        """Normalises words to lowercase and stems and lemmatizes it."""
        word = word.lower()
        word = stemmer.stem_word(word)
        word = lemmatizer.lemmatize(word)
        return word

    def acceptable_word(word):
        """Checks conditions for acceptable word: length, stopword."""
        accepted = bool(2 <= len(word) <= 40
                    and word.lower() not in stopwords)
        return accepted

    def get_terms(self,tree):
        for leaf in self.leaves(tree):
            term = [self.normalise(w) for w, t in leaf if self.acceptable_word(w)]
        yield term

    def extract(self):
        terms = self.get_terms(self.execute())
        matches = []
        for term in terms:
            for word in term:
                matches.append(word)
        return matches
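
A likely cause of the TypeError, sketched without NLTK so it runs anywhere: `leaves`, `normalise`, and `acceptable_word` are declared inside the class without a `self` parameter, yet they are called as `self.leaves(tree)`, so Python passes the instance as an extra first argument. The class names `Broken` and `Fixed` below are hypothetical stand-ins for `NounPhraseExtractor`, and the nested lists are an assumed stand-in for the chunk tree. Declaring the helpers with `@staticmethod` (or adding `self` as the first parameter) is one way to resolve it. The sketch also places `yield term` inside the loop, since in the posted `get_terms` it sits outside the `for` and would yield only the last phrase.

```python
class Broken:
    def leaves(tree):  # no `self`: the instance gets bound to `tree`
        for phrase in tree:
            yield phrase

    def get_terms(self, tree):
        # self.leaves(tree) passes two arguments (self, tree) to a
        # one-parameter function, which raises TypeError
        for leaf in self.leaves(tree):
            yield leaf


class Fixed:
    @staticmethod
    def leaves(tree):
        # static method: no instance is passed implicitly
        for phrase in tree:
            yield phrase

    @staticmethod
    def acceptable_word(word):
        # simplified length check only; the stopword test is omitted here
        return 2 <= len(word) <= 40

    def get_terms(self, tree):
        for leaf in self.leaves(tree):
            term = [w.lower() for w in leaf if self.acceptable_word(w)]
            yield term  # inside the loop: one list of words per phrase


def error_message(obj, tree):
    """Return the TypeError text raised while consuming get_terms, or None."""
    try:
        list(obj.get_terms(tree))
    except TypeError as exc:
        return str(exc)
    return None


# Hypothetical input: each inner list stands for the words of one noun phrase
phrases = [["Fast", "delivery"], ["a", "battery", "life"]]

broken_error = error_message(Broken(), phrases)
fixed_terms = list(Fixed().get_terms(phrases))

print(broken_error)
print(fixed_terms)
```

Applied to the posted class, the same change would mean marking `leaves`, `normalise`, and `acceptable_word` as static methods or giving each a `self` parameter. Separately, and as an assumption about the installed version: in NLTK 3 the Porter stemmer exposes `stem` rather than `stem_word`, so `stemmer.stem_word(word)` may need to become `stemmer.stem(word)`.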
