Is there a way of getting the degree of positiveness or negativeness when using Logistic Regression for sentiment analysis


Problem description

I have been following an example about sentiment analysis using Logistic Regression, in which the prediction result only gives 1 or 0 for positive or negative sentiment, respectively.

My challenge is that I want to classify a given user input into one of four classes (very good, good, average, poor), but my prediction result is always 1 or 0.

Below is my code sample so far

from sklearn.feature_extraction.text import CountVectorizer
from vaderSentiment.vaderSentiment import SentimentIntensityAnalyzer
from sklearn.metrics import classification_report
from sklearn.linear_model import LogisticRegression
from sklearn.datasets import load_files
from sklearn.model_selection import GridSearchCV
import numpy as np
import mglearn
import matplotlib.pyplot as plt
# import warnings filter
from warnings import simplefilter
# ignore all future warnings
#simplefilter(action='ignore', category=FutureWarning)

# Get the dataset from http://ai.stanford.edu/~amaas/data/sentiment/

reviews_train = load_files("aclImdb/train/")
text_train, y_train = reviews_train.data, reviews_train.target

print("")
print("Number of documents in train data: {}".format(len(text_train)))
print("")
print("Samples per class (train): {}".format(np.bincount(y_train)))
print("")

reviews_test = load_files("aclImdb/test/")
text_test, y_test = reviews_test.data, reviews_test.target

print("Number of documents in test data: {}".format(len(text_test)))
print("")
print("Samples per class (test): {}".format(np.bincount(y_test)))
print("")


vect = CountVectorizer(stop_words="english", analyzer='word',
                       ngram_range=(1, 1), max_df=1.0, min_df=1,
                       max_features=None)
X_train = vect.fit(text_train).transform(text_train)
X_test = vect.transform(text_test)

print("Vocabulary size: {}".format(len(vect.vocabulary_)))
print("")
print("X_train:\n{}".format(repr(X_train)))
print("X_test: \n{}".format(repr(X_test)))

feature_names = vect.get_feature_names()
print("Number of features: {}".format(len(feature_names)))
print("")

param_grid = {'C': [0.001, 0.01, 0.1, 1, 10]}
grid = GridSearchCV(LogisticRegression(penalty='l1', dual=False, max_iter=110,
                                        solver='liblinear'),
                    param_grid, cv=5)
grid.fit(X_train, y_train)

print("Best cross-validation score: {:.2f}".format(grid.best_score_))
print("Best parameters: ", grid.best_params_)
print("Best estimator: ", grid.best_estimator_)

lr = grid.best_estimator_
lr.predict(X_test)

print("Best Estimator Score: {:.2f}".format(lr.score(X_test, y_test)))
print("")

# creating an empty list for getting overall sentiment
lst = []

# number of elements as input
print("")
n = int(input("Enter number of rounds : "))

# iterating till the range
for i in range(0, n):
    temp = []
    ele = input("\n Please Enter a sentence to get a sentiment Evaluation. \n\n")
    temp.append(ele)

    print("")
    print("Review prediction: {}".format(lr.predict(vect.transform(temp))))
    print("")
    lst.append(ele)  # adding the element

print(lst)
print("")
print("Overall prediction: {}".format(lr.predict(vect.transform(lst))))
print("")
print("")

I want to get some values between 0 and 1, like when you use Vader SentimentIntensityAnalyzer's polarity_scores.

Here is a code sample of what I want to achieve using SentimentIntensityAnalyzer's polarity_scores.

# import SentimentIntensityAnalyzer class 
# from vaderSentiment.vaderSentiment module. 
from vaderSentiment.vaderSentiment import SentimentIntensityAnalyzer 

# function to print sentiments 
# of the sentence.

def sentiment_scores(sentence):

    # Create a SentimentIntensityAnalyzer object.
    sid_obj = SentimentIntensityAnalyzer()

    # The polarity_scores method of the SentimentIntensityAnalyzer
    # object gives a sentiment dictionary,
    # which contains pos, neg, neu, and compound scores.
    sentiment_dict = sid_obj.polarity_scores(sentence)

    print("")
    print("\n Overall sentiment dictionary is : ", sentiment_dict, " \n")
    print("sentence was rated as: ", sentiment_dict['neg']*100, "% Negative \n")
    print("sentence was rated as: ", sentiment_dict['neu']*100, "% Neutral \n")
    print("sentence was rated as: ", sentiment_dict['pos']*100, "% Positive \n")

    print("Sentence Overall Rated As: ", end=" ")

    # decide sentiment as positive, negative or neutral
    if sentiment_dict['compound'] >= 0.5:
        print("Excellent \n")
    elif sentiment_dict['compound'] > 0 and sentiment_dict['compound'] < 0.5:
        print("Very Good \n")
    elif sentiment_dict['compound'] == 0:
        print("Good \n")
    elif sentiment_dict['compound'] <= -0.5:
        print("Average \n")
    elif sentiment_dict['compound'] > -0.5 and sentiment_dict['compound'] < 0:
        print("Poor \n")

# Driver code 
if __name__ == "__main__":

    while True:
        sentence = input("\n Please enter a sentence to get a sentiment evaluation. Enter exit to end program \n")

        if sentence == "exit":
            print("\n Program End...........\n")
            print("")
            break
        else:
            sentiment_scores(sentence)

Recommended answer

You have a couple of options.

1: Label your initial training data with multiple classes according to how negative or positive the example is, instead of just 0 or 1, and perform multi-class classification.
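
A minimal sketch of this first option, assuming a hypothetical four-class target vector y_train_4 (0 = poor, 1 = average, 2 = good, 3 = very good) built for the same X_train produced by the vectorizer above:

from sklearn.linear_model import LogisticRegression

# y_train_4 is a hypothetical 4-class target (0=poor, 1=average, 2=good, 3=very good);
# with solver='liblinear', scikit-learn trains one-vs-rest binary models under the hood
clf = LogisticRegression(C=1.0, solver='liblinear', max_iter=1000)
clf.fit(X_train, y_train_4)

new_review = vect.transform(["This movie was fantastic"])
print(clf.predict(new_review))        # predicted class, e.g. 3
print(clf.predict_proba(new_review))  # one probability per class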

2: As 1 may not be possible, try experimenting with the predict_proba(X), predict_log_proba(X), and decision_function(X) methods and use the results from those to bin your output into the 4 classes according to some hard-coded thresholds. I would recommend using predict_proba, as those numbers are directly interpretable as probabilities, which is one of the main benefits of logistic regression compared to other methods. For example, assuming the 1st (not 0th) column is the "positive" classification:

probs = lr.predict_proba(X_test)
labels = np.repeat("very_good", len(probs))
labels[probs[:, 1] <  0.75] = "good"
labels[probs[:, 1] < 0.5] = "average"
labels[probs[:, 1] < 0.25] = "poor"
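
To plug this into the interactive loop from the question, the same thresholds can be applied to one sentence at a time. This is only a sketch, assuming lr and vect are the fitted estimator and vectorizer from the code above, and it reuses the hard-coded cut-offs:

def rate_sentence(sentence):
    # probability that the review is positive (column 1 of predict_proba)
    p = lr.predict_proba(vect.transform([sentence]))[0, 1]
    if p >= 0.75:
        return "very good"
    elif p >= 0.5:
        return "good"
    elif p >= 0.25:
        return "average"
    else:
        return "poor"

print(rate_sentence("An absolute masterpiece, I loved every minute"))

The 0.25 / 0.5 / 0.75 cut-offs are arbitrary; in practice you would tune them on a validation set.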
