How is the Vader 'compound' polarity score calculated in Python NLTK?

Published: 2022/1/2 17:41:24 | Tags: python nlp nltk sentiment-analysis vader


Question



I'm using the Vader SentimentAnalyzer to obtain polarity scores. I used the probability scores for positive/negative/neutral before, but I just realized that the "compound" score, ranging from -1 (most negative) to 1 (most positive), would provide a single measure of polarity. I wonder how the "compound" score is computed. Is it calculated from the [pos, neu, neg] vector?

Answer

The VADER algorithm outputs sentiment scores for 4 classes of sentiment (https://github.com/nltk/nltk/blob/develop/nltk/sentiment/vader.py#L441):

neg: Negative
neu: Neutral
pos: Positive
compound: Compound (i.e. aggregated score)

Let's walk through the code. The first instance of compound is at https://github.com/nltk/nltk/blob/develop/nltk/sentiment/vader.py#L421, where it computes:

compound = normalize(sum_s)

The normalize() function is defined at https://github.com/nltk/nltk/blob/develop/nltk/sentiment/vader.py#L107 as follows:

import math

def normalize(score, alpha=15):
    """
    Normalize the score to be between -1 and 1 using an alpha that
    approximates the max expected value
    """
    norm_score = score/math.sqrt((score*score) + alpha)
    return norm_score

So there is a hyperparameter, alpha, which defaults to 15.
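To get a feel for what normalize() does, here is a minimal standalone re-implementation of the formula quoted above (a sketch, not NLTK's own code), evaluated on a few raw sums. With the default alpha=15 it is an odd, increasing function that squashes any real sum into the open interval (-1, 1); for example, a sum of 1 maps to 1/sqrt(16) = 0.25.

```python
import math

def normalize(score: float, alpha: float = 15) -> float:
    """Re-implementation of the quoted formula: score / sqrt(score^2 + alpha)."""
    return score / math.sqrt(score * score + alpha)

# A few raw valence sums and their normalized (compound-style) values
for s in (-8.0, -2.0, 0.0, 1.0, 4.0, 20.0):
    print(f"sum_s={s:6.1f}  ->  {normalize(s):+.4f}")
```

Small sums stay well away from the bounds, while large sums crowd towards -1 or 1, so alpha effectively sets the scale at which a text counts as "strongly" positive or negative.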

As for sum_s, it is the sum of the sentiments list passed to the score_valence() function (https://github.com/nltk/nltk/blob/develop/nltk/sentiment/vader.py#L413).

If we trace this sentiments argument back, we see that it is computed when calling the polarity_scores() function at https://github.com/nltk/nltk/blob/develop/nltk/sentiment/vader.py#L217:

def polarity_scores(self, text):
    """
    Return a float for sentiment strength based on the input text.
    Positive values are positive valence, negative value are negative
    valence.
    """
    sentitext = SentiText(text)
    #text, words_and_emoticons, is_cap_diff = self.preprocess(text)

    sentiments = []
    words_and_emoticons = sentitext.words_and_emoticons
    for item in words_and_emoticons:
        valence = 0
        i = words_and_emoticons.index(item)
        # "kind" in "kind of" and booster words (e.g. "very") get a
        # placeholder valence of 0; boosters instead scale the words after them
        if ((i < len(words_and_emoticons) - 1 and item.lower() == "kind" and
                words_and_emoticons[i+1].lower() == "of") or
                item.lower() in BOOSTER_DICT):
            sentiments.append(valence)
            continue

        # rule-based valence for this token, appended to sentiments
        sentiments = self.sentiment_valence(valence, sentitext, item, i, sentiments)

    # re-weight the valences of words before and after "but"
    sentiments = self._but_check(words_and_emoticons, sentiments)

    # sum, normalize and split into neg/neu/pos/compound
    return self.score_valence(sentiments, text)

The polarity_scores function iterates through every token of the SentiText (its words_and_emoticons list) and calls the rule-based sentiment_valence() function (https://github.com/nltk/nltk/blob/develop/nltk/sentiment/vader.py#L243) to assign each token a valence score; see Section 2.1.1 of http://comp.social.gatech.edu/papers/icwsm14.vader.hutto.pdf
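The per-token loop can be illustrated with a deliberately tiny lexicon. TOY_LEXICON, TOY_BOOSTERS and toy_valences() below are hypothetical, and the valences in TOY_LEXICON are made-up numbers, not VADER's. The real sentiment_valence() also handles capitalization, boosters scaling the next word, negation and more, none of which is modelled here; the sketch only shows the shape of the data: one valence per token, with booster words and the "kind" in "kind of" contributing 0.

```python
from typing import List

# Hypothetical, tiny stand-ins for VADER's lexicon and booster list
TOY_LEXICON = {"good": 1.9, "great": 3.1, "bad": -2.5, "kind": 2.4}
TOY_BOOSTERS = {"very", "extremely"}

def toy_valences(text: str) -> List[float]:
    """One valence per token, mimicking only the skeleton of polarity_scores()."""
    tokens = text.split()
    sentiments: List[float] = []
    for i, item in enumerate(tokens):
        word = item.lower()
        is_kind_of = (word == "kind" and i < len(tokens) - 1
                      and tokens[i + 1].lower() == "of")
        if is_kind_of or word in TOY_BOOSTERS:
            # Same special case as the real loop: placeholder valence of 0
            sentiments.append(0.0)
            continue
        # Unknown words are neutral (0.0); known words take their lexicon value
        sentiments.append(TOY_LEXICON.get(word, 0.0))
    return sentiments

print(toy_valences("The food was very good"))
print(toy_valences("It was kind of bad"))
```

The sketch uses enumerate() rather than the list.index() call in the quoted loop, so a repeated word is looked up at its own position instead of at its first occurrence.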

So going back to the compound score, we see that:

the compound score is a normalized score of sum_s and
sum_s is the sum of valence computed based on some heuristics and a sentiment lexicon (aka. Sentiment Intensity) and
the normalized score is simply sum_s divided by the square root of its square plus the alpha parameter, i.e. sum_s / sqrt(sum_s^2 + alpha), where alpha increases the denominator of the normalization function.
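Putting these points together, and ignoring any further adjustments the full implementation may apply to sum_s before normalizing, the compound score for a text whose n tokens have valences v_1, ..., v_n can be written as:

$$
s = \sum_{i=1}^{n} v_i, \qquad \text{compound} = \frac{s}{\sqrt{s^2 + \alpha}}, \qquad \alpha = 15
$$

For example, valences summing to s = 4 give 4/√(16 + 15) = 4/√31 ≈ 0.718. Because √(s² + α) > |s| whenever α > 0, the compound score always lies strictly between -1 and 1 and only approaches ±1 as |s| grows.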

Is that calculated from the [pos, neu, neg] vector?

Not really =)

If we look at the score_valence function (https://github.com/nltk/nltk/blob/develop/nltk/sentiment/vader.py#L411), we see that the compound score is computed from sum_s before the pos, neg and neu scores are computed. Those three come from _sift_sentiment_scores(), which derives the individual pos, neg and neu scores from the raw per-token scores produced by sentiment_valence(), not from the sum.
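A simplified sketch of that split, assuming the structure described above: compound comes from the summed valences, while pos/neg/neu come from sifting the individual valences. The +1/-1 offsets in sift() follow my reading of NLTK's _sift_sentiment_scores(); any punctuation-emphasis adjustments in the real score_valence() are omitted, and simple_scores() and sift() are hypothetical names.

```python
import math
from typing import Dict, List, Tuple

def normalize(score: float, alpha: float = 15) -> float:
    return score / math.sqrt(score * score + alpha)

def sift(sentiments: List[float]) -> Tuple[float, float, int]:
    """Split raw valences into a positive sum, a negative sum and a neutral count."""
    pos_sum, neg_sum, neu_count = 0.0, 0.0, 0
    for s in sentiments:
        if s > 0:
            pos_sum += s + 1  # offset as in NLTK's _sift_sentiment_scores()
        elif s < 0:
            neg_sum += s - 1
        else:
            neu_count += 1
    return pos_sum, neg_sum, neu_count

def simple_scores(sentiments: List[float]) -> Dict[str, float]:
    """Hypothetical, simplified score_valence(): no punctuation emphasis, no rounding."""
    if not sentiments:
        return {"neg": 0.0, "neu": 0.0, "pos": 0.0, "compound": 0.0}
    compound = normalize(sum(sentiments))           # uses only the total
    pos_sum, neg_sum, neu_count = sift(sentiments)  # uses each valence separately
    total = pos_sum + abs(neg_sum) + neu_count
    return {"neg": abs(neg_sum) / total, "neu": neu_count / total,
            "pos": pos_sum / total, "compound": compound}

a = simple_scores([2.0])
b = simple_scores([3.0, -1.0, 0.0])
print(a)
print(b)
```

Both inputs sum to 2.0, so they share the same compound score (2/√19 ≈ 0.459), yet their splits differ (pos=1, neu=0, neg=0 versus pos=4/7, neu=1/7, neg=2/7). That is why compound cannot be recovered from the [pos, neu, neg] vector alone.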

If we take a look at this alpha mathemagic, it seems the output of the normalization is rather unstable (if left unconstrained), depending on the value of alpha:

[Figure: plots of the normalization output for alpha=0, alpha=15, alpha=50000 and alpha=0.001]

It gets funky when alpha is negative:

[Figure: plots of the normalization output for alpha=-10, alpha=-1,000,000 and alpha=-1,000,000,000]
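The behaviour for these alpha values can also be checked numerically. This sketch applies the quoted formula with no clamping (normalize_raw() is a hypothetical name) at a few sample sums. For alpha > 0 the output stays strictly inside (-1, 1); alpha = 0 collapses it to the sign function (undefined at 0); for negative alpha the square root is undefined whenever score² + alpha ≤ 0, and elsewhere the output exceeds 1 in magnitude.

```python
import math
from typing import Optional

def normalize_raw(score: float, alpha: float) -> Optional[float]:
    """Quoted formula with no clamping; None where it is undefined."""
    radicand = score * score + alpha
    if radicand <= 0:
        return None  # sqrt of a negative number, or division by zero
    return score / math.sqrt(radicand)

scores = [-20.0, -2.0, 0.5, 2.0, 20.0, 2000.0]
for alpha in (0, 15, 50_000, 0.001, -10, -1_000_000, -1_000_000_000):
    row = [normalize_raw(s, alpha) for s in scores]
    print(f"alpha={alpha}: " + ", ".join(
        "undefined" if v is None else f"{v:+.4f}" for v in row))
```

With very negative alpha, every realistic sum falls in the undefined region, which matches the "funky" shapes described above.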
