Must capture output of a function that has no return statement


Problem description


I'm using the NLTK package and it has a function that tells me whether a given sentence is positive, negative, or neutral:

from nltk.sentiment.util import demo_liu_hu_lexicon

demo_liu_hu_lexicon('Today is an awesome, happy day')
>>> Positive

The problem is that the function doesn't have a return statement; it just prints "Positive", "Negative", or "Neutral" to stdout. All it returns, implicitly, is None. (Here's the function's source code.)

Is there any way I can capture this output (other than messing with the NLTK source code on my machine)?

Solution

TL;DR

The demo_liu_hu_lexicon function is a demo of how you could use the opinion_lexicon. It's meant for testing and should not be used directly.
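
If you only need the printed label and don't want to touch the NLTK source, one option is to redirect standard output around the call. The following is a minimal sketch using the standard library's contextlib.redirect_stdout. Note that print_label and capture_stdout are hypothetical names introduced here for illustration; print_label merely stands in for any function that prints instead of returning, such as demo_liu_hu_lexicon.

```python
import io
from contextlib import redirect_stdout


def print_label(sentence):
    # Hypothetical stand-in for a function that prints its result
    # and implicitly returns None (like demo_liu_hu_lexicon).
    print("Positive" if "happy" in sentence.lower() else "Neutral")


def capture_stdout(func, *args, **kwargs):
    """Call func and return whatever it printed to stdout, stripped."""
    buffer = io.StringIO()
    with redirect_stdout(buffer):
        func(*args, **kwargs)
    return buffer.getvalue().strip()


label = capture_stdout(print_label, "Today is a happy day")
```

This works for any callable that writes through sys.stdout, but it is a workaround; rewriting the logic so it returns a value, as the rest of this answer does, is usually cleaner.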


In Long

Let's look at the function and see how we can re-create a similar one: https://github.com/nltk/nltk/blob/develop/nltk/sentiment/util.py#L616

def demo_liu_hu_lexicon(sentence, plot=False):
    """
    Basic example of sentiment classification using Liu and Hu opinion lexicon.
    This function simply counts the number of positive, negative and neutral words
    in the sentence and classifies it depending on which polarity is more represented.
    Words that do not appear in the lexicon are considered as neutral.
    :param sentence: a sentence whose polarity has to be classified.
    :param plot: if True, plot a visual representation of the sentence polarity.
    """
    from nltk.corpus import opinion_lexicon
    from nltk.tokenize import treebank

    tokenizer = treebank.TreebankWordTokenizer()

Okay, it's strange for the imports to sit inside the function, but that's because it's a demo function used for simple testing or documentation.

Also, the use of treebank.TreebankWordTokenizer() is rather odd; we can simply use nltk.word_tokenize.

Let's move the imports out and rewrite the demo_liu_hu_lexicon as a simple_sentiment function.

from nltk.corpus import opinion_lexicon
from nltk import word_tokenize

def simple_sentiment(text):
    pass

Next, we see:

def demo_liu_hu_lexicon(sentence, plot=False):
    """
    Basic example of sentiment classification using Liu and Hu opinion lexicon.
    This function simply counts the number of positive, negative and neutral words
    in the sentence and classifies it depending on which polarity is more represented.
    Words that do not appear in the lexicon are considered as neutral.
    :param sentence: a sentence whose polarity has to be classified.
    :param plot: if True, plot a visual representation of the sentence polarity.
    """
    from nltk.corpus import opinion_lexicon
    from nltk.tokenize import treebank

    tokenizer = treebank.TreebankWordTokenizer()
    pos_words = 0
    neg_words = 0
    tokenized_sent = [word.lower() for word in tokenizer.tokenize(sentence)]

    x = list(range(len(tokenized_sent))) # x axis for the plot
    y = []

The function

  1. first tokenizes and lower-cases the sentence,
  2. initializes the counts of positive and negative words,
  3. initializes x and y for some plotting later, so let's ignore that.

If we go further down the function:

def demo_liu_hu_lexicon(sentence, plot=False):
    from nltk.corpus import opinion_lexicon
    from nltk.tokenize import treebank

    tokenizer = treebank.TreebankWordTokenizer()
    pos_words = 0
    neg_words = 0
    tokenized_sent = [word.lower() for word in tokenizer.tokenize(sentence)]

    x = list(range(len(tokenized_sent))) # x axis for the plot
    y = []

    for word in tokenized_sent:
        if word in opinion_lexicon.positive():
            pos_words += 1
            y.append(1) # positive
        elif word in opinion_lexicon.negative():
            neg_words += 1
            y.append(-1) # negative
        else:
            y.append(0) # neutral

    if pos_words > neg_words:
        print('Positive')
    elif pos_words < neg_words:
        print('Negative')
    elif pos_words == neg_words:
        print('Neutral')

  4. The loop simply goes through each token and checks whether the word is in the positive or negative lexicon.

  5. At the end, it compares the number of positive and negative words and prints the tag.
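
The loop and the final comparison described above can be sketched without NLTK. This is only an illustration under stated assumptions: classify is a hypothetical helper, and TOY_POSITIVE and TOY_NEGATIVE are tiny made-up word sets, not the real Liu and Hu lexicon.

```python
# Tiny made-up lexicons for illustration only (NOT the Liu and Hu lexicon).
TOY_POSITIVE = {"awesome", "happy"}
TOY_NEGATIVE = {"awful", "sad"}


def classify(tokens, positive, negative):
    """Label a list of lower-cased tokens by comparing word counts."""
    pos_words = 0
    neg_words = 0
    for word in tokens:
        if word in positive:
            pos_words += 1
        elif word in negative:
            neg_words += 1
        # Words found in neither set are neutral and change nothing.

    if pos_words > neg_words:
        return "Positive"
    elif pos_words < neg_words:
        return "Negative"
    return "Neutral"


tokens = "today is an awesome , happy day".split()
label = classify(tokens, TOY_POSITIVE, TOY_NEGATIVE)
```

Passing the lexicons in as sets keeps each membership test cheap and makes the function easy to try without downloading any corpus.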

Now let's see whether we can have a better simple_sentiment function, now that we know what demo_liu_hu_lexicon does.

Tokenization in step 1 can't be avoided, so we have:

from nltk.corpus import opinion_lexicon
from nltk import word_tokenize

def simple_sentiment(text):
    tokens = [word.lower() for word in word_tokenize(text)]

A lazy way to do steps 2-5 is to just copy and paste the code and change print() to return:

from nltk.corpus import opinion_lexicon
from nltk import word_tokenize

def simple_sentiment(text):
    tokens = [word.lower() for word in word_tokenize(text)]
    # Counters must be initialized before the loop.
    pos_words = 0
    neg_words = 0

    for word in tokens:
        if word in opinion_lexicon.positive():
            pos_words += 1
        elif word in opinion_lexicon.negative():
            neg_words += 1
        # Anything else is neutral; the y list was only needed for plotting.

    if pos_words > neg_words:
        return 'Positive'
    elif pos_words < neg_words:
        return 'Negative'
    else:
        return 'Neutral'

Now you have a function whose result you can use however you please.


BTW, the demo is really odd.

For a positive word it appends 1 to y, for a negative word it appends -1, and it says a sentence is positive when pos_words > neg_words.

If pos_words and neg_words were lists of integers, that comparison would follow Python's sequence comparison, which might have no linguistic or mathematical logic =( (see "What happens when we compare list of integers?"). In the version quoted above they are integer counters, so the test is a plain comparison of counts.
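
As a small illustration of the sequence-comparison concern, here is a sketch with made-up polarity lists (not taken from NLTK). Python compares lists element by element from the left, so the first differing position decides the result, regardless of the overall totals.

```python
# Made-up polarity lists: 1 = positive word, -1 = negative word.
mostly_positive = [-1, 1, 1, 1]    # sums to 2
mostly_negative = [1, -1, -1, -1]  # sums to -2

# List comparison is lexicographic: index 0 decides here (-1 < 1),
# so the mostly positive list is NOT considered greater.
list_says = mostly_positive > mostly_negative

# Comparing totals matches the intuitive reading.
sum_says = sum(mostly_positive) > sum(mostly_negative)
```

Here list_says is False while sum_says is True, which is why counts or sums, not raw lists, are the sensible things to compare.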
