Text analysis - Unable to write output of Python program to a csv or xls file

Problem description

Hi, I am trying to do sentiment analysis using a Naive Bayes classifier in Python 2.x. It reads sentiments from a txt file and then gives output as positive or negative based on the sample txt file sentiments. I want the output in the same form as the input: e.g., I have a text file of, let's say, 1000 raw sentiments, and I want the output to show positive or negative against each sentiment. Please help. Below is the code I am using:

import math
import string

def Naive_Bayes_Classifier(positive, negative, total_negative, total_positive, test_string):
    y_values = [0,1]
    prob_values = [None, None]

    for y_value in y_values:
        posterior_prob = 1.0

        for word in test_string.split():
            word = word.lower().translate(None,string.punctuation).strip()
            if y_value == 0:
                if word not in negative:
                    posterior_prob *= 0.0
                else:
                    posterior_prob *= negative[word]
            else:
                if word not in positive:
                    posterior_prob *= 0.0
                else:
                    posterior_prob *= positive[word]

        if y_value == 0:
            prob_values[y_value] = posterior_prob * float(total_negative) / (total_negative + total_positive)
        else:
            prob_values[y_value] = posterior_prob * float(total_positive) / (total_negative + total_positive)

    total_prob_values = 0
    for i in prob_values:
        total_prob_values += i

    for i in range(0,len(prob_values)):
        prob_values[i] = float(prob_values[i]) / total_prob_values

    print prob_values

    if prob_values[0] > prob_values[1]:
        return 0
    else:
        return 1


if __name__ == '__main__':
    sentiment = open(r'C:/Users/documents/sample.txt')

    #Preprocessing of training set
    vocabulary = {}
    positive = {}
    negative = {}
    training_set = []
    TOTAL_WORDS = 0
    total_negative = 0
    total_positive = 0

    for line in sentiment:
        words = line.split()
        y = words[-1].strip()
        y = int(y)

        if y == 0:
            total_negative += 1
        else:
            total_positive += 1

        for word in words:
            word = word.lower().translate(None,string.punctuation).strip()
            if word not in vocabulary and word.isdigit() is False:
                vocabulary[word] = 1
                TOTAL_WORDS += 1
            elif word in vocabulary:
                vocabulary[word] += 1
                TOTAL_WORDS += 1

            #Training
            if y == 0:
                if word not in negative:
                    negative[word] = 1
                else:
                    negative[word] += 1
            else:
                if word not in positive:
                    positive[word] = 1
                else:
                    positive[word] += 1

    for word in vocabulary.keys():
        vocabulary[word] = float(vocabulary[word])/TOTAL_WORDS

    for word in positive.keys():
        positive[word] = float(positive[word])/total_positive

    for word in negative.keys():
        negative[word] = float(negative[word])/total_negative

    test_string = raw_input("Enter the review: \n")

    classifier = Naive_Bayes_Classifier(positive, negative, total_negative, total_positive, test_string)
    if classifier == 0:
        print "Negative review"
    else:
        print "Positive review"

Recommended answer

I've checked the GitHub repo you posted in the comments. I tried to run the project, but I got some errors.

Anyway, I've checked the project structure and the file used to train the Naive Bayes algorithm, and I think the following piece of code can be used to write your result data to an Excel file:

import xlsxwriter

# Classify every line of the test file and collect [sentence, label] pairs.
data_to_be_written = []
with open("test11.txt") as f:
    for line in f:
        line = line.strip()
        classifier = Naive_Bayes_Classifier(positive, negative, total_negative, total_positive, line)
        # The classifier returns 0 for a negative review and 1 for a positive one.
        result = 'Negative' if classifier == 0 else 'Positive'
        data_to_be_written.append([line, result])

# Create a workbook and add a worksheet.
# xlsxwriter produces the .xlsx format, so the file is named accordingly.
workbook = xlsxwriter.Workbook('test.xlsx')
worksheet = workbook.add_worksheet()

# Start from the first cell. Rows and columns are zero indexed.
row = 0
col = 0

# Iterate over the data and write it out row by row.
for item, result in data_to_be_written:
    worksheet.write(row, col, item)
    worksheet.write(row, col + 1, result)
    row += 1

workbook.close()

In short, for each row of the file with the sentences to be tested, I call the classifier and prepare a structure that will be written to the output file.
Then I loop over the structure and write the Excel file.
To do this I have used a Python package called xlsxwriter.
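
Since the question also mentions csv, the same loop can be sketched with the standard-library csv module instead of xlsxwriter. This is only a minimal sketch under some assumptions: the helper name classify_lines_to_csv, the column headers, and the output path are made up for illustration, and the code targets Python 3 (under Python 2 you would open the file with mode 'wb' and drop the newline argument).

```python
import csv


def classify_lines_to_csv(lines, classify, out_path):
    """Classify each non-empty line and write (text, label) rows to a CSV file.

    `classify` is any callable that returns 0 for negative and 1 for positive,
    matching the return values of the classifier in the question.
    """
    rows = []
    for line in lines:
        text = line.strip()
        if not text:
            continue  # skip blank lines
        label = 'Negative' if classify(text) == 0 else 'Positive'
        rows.append([text, label])

    # newline='' lets the csv module control line endings (Python 3).
    with open(out_path, 'w', newline='') as f:
        writer = csv.writer(f)
        writer.writerow(['review', 'sentiment'])  # hypothetical header names
        writer.writerows(rows)
    return rows
```

To use it with the classifier from the question, you could pass the open test file as `lines` and a small wrapper such as `lambda s: Naive_Bayes_Classifier(positive, negative, total_negative, total_positive, s)` as `classify`. The csv writer quotes fields that contain commas, so sentences with punctuation should stay in a single column.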

As I told you before, I had some problems running the project, so this code is not tested. It should work, but if you run into trouble, let me know.

Regards
