Python NLTK does not calculate sentiment correctly


Question

I have some positive and negative sentences. I want to use Python NLTK, as simply as possible, to train a NaiveBayesClassifier and determine the sentiment of other sentences.

I tried to use the code from this article, but my result is always positive: http://www.sjwhitworth.com/sentiment-analysis-in-python-using-nltk/

I am very new to Python, so I may have made a mistake when I copied the code.

import nltk
import math
import re
import sys
import os
import codecs
reload(sys)
sys.setdefaultencoding('utf-8')

from nltk.corpus import stopwords

__location__ = os.path.realpath(
    os.path.join(os.getcwd(), os.path.dirname(__file__)))

postweet = __location__ + "/postweet.txt"
negtweet = __location__ + "/negtweet.txt"


customstopwords = ['band', 'they', 'them']

#Load positive tweets into a list
p = open(postweet, 'r')
postxt = p.readlines()

#Load negative tweets into a list
n = open(negtweet, 'r')
negtxt = n.readlines()

neglist = []
poslist = []

#Create a list of 'negatives' with the exact length of our negative tweet list.
for i in range(0,len(negtxt)):
    neglist.append('negative')

#Likewise for positive.
for i in range(0,len(postxt)):
    poslist.append('positive')

#Creates a list of tuples, with sentiment tagged.
postagged = zip(postxt, poslist)
negtagged = zip(negtxt, neglist)

#Combines all of the tagged tweets to one large list.
taggedtweets = postagged + negtagged

tweets = []

#Create a list of words in the tweet, within a tuple.
for (word, sentiment) in taggedtweets:
    word_filter = [i.lower() for i in word.split()]
    tweets.append((word_filter, sentiment))

#Pull out all of the words in a list of tagged tweets, formatted in tuples.
def getwords(tweets):
    allwords = []
    for (words, sentiment) in tweets:
        allwords.extend(words)
    return allwords

#Order a list of tweets by their frequency.
def getwordfeatures(listoftweets):
    #Print out wordfreq if you want to have a look at the individual counts of words.
    wordfreq = nltk.FreqDist(listoftweets)
    words = wordfreq.keys()
    return words

#Calls above functions - gives us list of the words in the tweets, ordered by freq.
print getwordfeatures(getwords(tweets))

wordlist = [] 
wordlist = [i for i in wordlist if not i in stopwords.words('english')]
wordlist = [i for i in wordlist if not i in customstopwords]

def feature_extractor(doc):
    docwords = set(doc)
    features = {}
    for i in wordlist:
        features['contains(%s)' % i] = (i in docwords)
    return features

#Creates a training set - classifier learns distribution of true/falses in the input.
training_set = nltk.classify.apply_features(feature_extractor, tweets)
classifier = nltk.NaiveBayesClassifier.train(training_set)

print classifier.show_most_informative_features(n=30)

while True:
    input = raw_input('ads')
    if input == 'exit':
        break
    elif input == 'informfeatures':
        print classifier.show_most_informative_features(n=30)
        continue
    else:
        input = input.lower()
        input = input.split()
        print '\nWe think that the sentiment was ' + classifier.classify(feature_extractor(input)) + ' in that sentence.\n'

p.close()
n.close()

Is this just a coding error, or is something else wrong? When the program starts, it should print the output of classifier.show_most_informative_features(n=30), but the result I get is "Most Informative Features" followed by None.

Could that be a hint?

Thanks

Recommended answer

wordlist is empty, so feature_extractor has no words to build features from. It should be assigned the result of getwordfeatures(getwords(tweets)).
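As a rough illustration of why the empty list matters, here is a minimal Python 3 sketch. It makes several assumptions that are not in the original post: the three toy tweets are invented, collections.Counter stands in for nltk.FreqDist so the snippet runs without NLTK installed, and feature_extractor takes wordlist as an explicit argument instead of reading the global.

```python
from collections import Counter

# Hypothetical toy data standing in for postweet.txt / negtweet.txt.
tweets = [
    (["i", "love", "this", "band"], "positive"),
    (["what", "a", "great", "day"], "positive"),
    (["i", "hate", "this", "song"], "negative"),
]


def getwords(tweets):
    """Flatten all tokens from the tagged tweets into one list."""
    allwords = []
    for words, sentiment in tweets:
        allwords.extend(words)
    return allwords


def getwordfeatures(listofwords):
    """Return the distinct words, most frequent first.

    Counter is used here as a stand-in for nltk.FreqDist (an assumption
    made so the sketch has no third-party dependency).
    """
    return [word for word, count in Counter(listofwords).most_common()]


def feature_extractor(doc, wordlist):
    """Build one boolean 'contains(word)' feature per word in wordlist."""
    docwords = set(doc)
    return {'contains(%s)' % w: (w in docwords) for w in wordlist}


sentence = ["i", "love", "it"]

# With an empty wordlist (as in the question) there are no features at all,
# so the classifier has nothing to distinguish one sentence from another.
empty_features = feature_extractor(sentence, [])

# With the fix from the answer, each known word becomes a feature.
wordlist = getwordfeatures(getwords(tweets))
real_features = feature_extractor(sentence, wordlist)
```

With no features, a classifier can only fall back on how often each label appeared in training, which would be consistent with every input getting the same label.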

The following two lines:

wordlist = [i for i in wordlist if not i in stopwords.words('english')]

wordlist = [i for i in wordlist if not i in customstopwords]

are an "either-or": you can try each one and see which stopword list works better.
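To compare the two filters side by side, something like the following sketch could be used. The english_stopwords set is a tiny invented stand-in for stopwords.words('english') (the real NLTK list is much longer and needs the corpus downloaded), and the sample wordlist is hypothetical as well.

```python
# Hypothetical stand-in for stopwords.words('english'); not the real NLTK list.
english_stopwords = {"i", "a", "this", "they", "them", "what"}

# The custom list from the question, stored as a set for fast membership tests.
customstopwords = {"band", "they", "them"}

# Hypothetical word list, as getwordfeatures(getwords(tweets)) might return it.
wordlist = ["i", "this", "love", "band", "what", "a", "great", "day", "hate", "song"]

# Option 1: drop general English stopwords.
without_english = [w for w in wordlist if w not in english_stopwords]

# Option 2: drop only the domain-specific custom stopwords.
without_custom = [w for w in wordlist if w not in customstopwords]
```

Building each stopword collection once, outside the comprehension, also avoids calling stopwords.words('english') again for every word, which the original line does.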
