naiveBayes and predict function not working in R


Problem description

I am doing sentiment analysis on Twitter comments (in Kazakh) using the R script below. The training set has 3000 comments (1500 happy, 1500 sad) and the test set has 1000 comments (happy and sad mixed). Everything runs without errors, but at the end every predicted value comes out as happy, which cannot be right.

I have checked every function and all of them work up to the naiveBayes call. I checked the classifier values and they are correct. I think either naiveBayes or predict is messing things up.

When I used only one happy comment and the 1500 sad (negative) comments as the training set with this code, the predicted results were all happy, when I think they should have been mostly sad.

classifier = naiveBayes(mat[1500:3000,], as.factor(sentiment_all[1500:3000]))

However, when I used only the sad (negative) comments for the training set, the predicted results were all sad.

classifier = naiveBayes(mat[1501:3000,], as.factor(sentiment_all[1501:3000]))

I have spent hours on this and I am completely lost as to where the problem is. Please help me solve this issue.

Here is the script:

setwd("Path")
happy = readLines("Path")
sad = readLines("Path")
happy_test = readLines("Path")
sad_test = readLines("Path")

tweet = c(happy, sad)
tweet_test= c(happy_test, sad_test)
tweet_all = c(tweet, tweet_test)
sentiment = c(rep("happy", length(happy) ), 
              rep("sad", length(sad)))
sentiment_test = c(rep("happy", length(happy_test) ), 
                   rep("sad", length(sad_test)))
sentiment_all = as.factor(c(sentiment, sentiment_test))

library(RTextTools)
library(e1071)

# naive bayes
# name the weighting argument explicitly, so the function is not
# positionally matched to create_matrix's minDocFreq parameter
mat = create_matrix(tweet_all, language="kazakh", 
                    removeStopwords=FALSE, removeNumbers=TRUE, 
                    stemWords=FALSE, weighting=tm::weightTfIdf)

mat = as.matrix(mat)

classifier = naiveBayes(mat[1:3000,], as.factor(sentiment_all[1:3000]))
predicted = predict(classifier, mat[3001:4000,]); predicted
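Before training, it is worth confirming that the slice of rows actually handed to naiveBayes contains both classes in roughly equal numbers. A self-contained sketch (synthetic labels standing in for the script's sentiment_all, laid out the same way: 1500 happy, 1500 sad, then 1000 test rows):

```r
# Synthetic labels mirroring the script's layout.
sentiment_demo <- factor(c(rep("happy", 1500), rep("sad", 1500),
                           rep("happy", 500),  rep("sad", 500)))

# The balanced training slice used in the script:
table(sentiment_demo[1:3000])       # 1500 happy, 1500 sad

# The degenerate slices tried in the question:
table(sentiment_demo[1500:3000])    # 1 happy, 1500 sad
table(sentiment_demo[1501:3000])    # 0 happy, 1500 sad
```

If the happy count in the training slice is 0 or 1, the fitted model can only reflect that imbalance.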

Recommended answer

Your issue is quite basic: you are setting the problem up wrong. Ideally you want a 50-50 split of positives and negatives in your training data, because of how the Naive Bayes classifier works: it is trying to minimize entropy.
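The effect of an unbalanced training set can be sketched with Bayes' rule in base R (the likelihood numbers below are made up for illustration): a skewed class prior can override the evidence, even when the likelihoods favour the other class.

```r
# Evidence from the words mildly favours "happy".
lik <- c(happy = 0.6, sad = 0.4)

# Class priors under a balanced vs. a heavily imbalanced training set.
balanced   <- c(happy = 0.5,  sad = 0.5)
imbalanced <- c(happy = 0.01, sad = 0.99)   # e.g. 15 happy vs 1485 sad

post_bal <- balanced   * lik / sum(balanced   * lik)
post_imb <- imbalanced * lik / sum(imbalanced * lik)

names(which.max(post_bal))   # "happy" with a 50-50 split
names(which.max(post_imb))   # "sad" once the prior dominates
```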

I am guessing that in your case, where you have only one positive comment, the classifier was able to minimize entropy very easily across multiple predictors.

Where you use no positive comments at all, you are basically saying that the only possible outcome is "sad", and that is exactly what your model predicts.
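In the all-sad case this becomes extreme: the prior for "happy" is zero, so its posterior is zero for every document, no matter what the words say. A hand-computed sketch with made-up likelihoods:

```r
priors <- c(happy = 0, sad = 1)      # no happy examples in training
lik    <- c(happy = 0.9, sad = 0.1)  # even a very happy-looking tweet

posterior <- priors * lik / sum(priors * lik)
posterior                            # happy = 0, sad = 1
```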

As for your main issue, try a different data set. Where are you getting your tweets from, and are they sufficiently diverse?
