Best Algorithmic Approach to Sentiment Analysis


Question

My requirement is to take in news articles and determine whether they are positive or negative about a subject. I am taking the approach outlined below, but I keep reading that NLP may be of use here. Everything I have read points to NLP being used to distinguish opinion from fact, which I don't think would matter much in my case. I'm wondering two things:

1) Why wouldn't my algorithm work, and/or how can I improve it? (I know sarcasm would probably be a pitfall, but again I don't see that occurring much in the type of news we will be getting.)

2) How would NLP help, and why should I use it?

My algorithmic approach (I have dictionaries of positive, negative, and negation words):

1) Count the number of positive and negative words in the article.

2) If a negation word is found within 2 or 3 words of a positive or negative word (e.g. "NOT the best"), negate the score.

3) Multiply the scores by weights that have been manually assigned to each word (1.0 to start).

4) Add up the totals for positive and negative to get the sentiment score.
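For concreteness, the four steps above could be sketched roughly as follows. This is only an illustration under assumptions: the tiny lexicons, the `sentiment_score` name, and the choice to look for negation words only in the few tokens *before* a sentiment word are all hypothetical, not part of the original description.

```python
import re

# Hypothetical toy lexicons mapping word -> manually assigned weight (1.0 to start).
POSITIVE = {"good": 1.0, "best": 1.0, "great": 1.0}
NEGATIVE = {"bad": 1.0, "terrible": 1.0, "worst": 1.0}
NEGATIONS = {"not", "never", "no"}
WINDOW = 3  # assumption: scan this many preceding tokens for a negation word


def sentiment_score(text, window=WINDOW):
    """Return a signed score: above 0 leans positive, below 0 leans negative."""
    tokens = re.findall(r"[a-z']+", text.lower())
    score = 0.0
    for i, tok in enumerate(tokens):
        # Steps 1 and 3: find sentiment words and apply their weights.
        if tok in POSITIVE:
            value = POSITIVE[tok]
        elif tok in NEGATIVE:
            value = -NEGATIVE[tok]
        else:
            continue
        # Step 2: flip the sign if a negation word appears shortly before.
        if any(t in NEGATIONS for t in tokens[max(0, i - window):i]):
            value = -value
        # Step 4: accumulate into a single sentiment score.
        score += value
    return score
```

Called as `sentiment_score("This is not the best phone")`, the sketch finds "not" inside the window before "best" and flips that word's contribution to negative.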

Recommended answer

I don't think there's anything particularly wrong with your algorithm; it's a fairly straightforward and practical way to go, but there are a lot of situations where it will make mistakes.

  1. Ambiguous sentiment words - "This product works terribly" vs. "This product is terribly good"

  2. Missed negations - "I would never in a million years say that this product is worth buying"

  3. Quoted/Indirect text - "My dad says this product is terrible, but I disagree"

  4. Comparisons - "This product is about as useful as a hole in the head"

  5. Anything subtle - "This product is ugly, slow and uninspiring, but it's the only thing on the market that does the job"

I'm using product reviews for examples instead of news stories, but you get the idea. In fact, news articles are probably harder because they will often try to show both sides of an argument and tend to use a certain style to convey a point. The final example is quite common in opinion pieces, for example.

As for NLP helping you with any of this: word sense disambiguation (or even just part-of-speech tagging) may help with (1), syntactic parsing might help with the long-range dependencies in (2), and some kind of chunking might help with (3). It's all research-level work though; there's nothing that I know of that you can use directly. Issues (4) and (5) are a lot harder; I throw up my hands and give up at this point.

I'd stick with the approach you have and look at the output carefully to see if it is doing what you want. Of course, that then raises the issue of what you understand the definition of "sentiment" to be in the first place...
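One way to look at the output carefully is to hand-label a small sample and list the cases where the scorer disagrees. The sketch below is a minimal illustration, not a prescribed method: `review_outputs`, `toy_scorer`, the tiny word sets, and the hand labels are all hypothetical stand-ins for whatever scorer and labeled articles you actually have.

```python
import re


def review_outputs(scorer, labeled_examples):
    """Return (accuracy, misses); misses holds (text, expected, predicted) tuples."""
    if not labeled_examples:
        return 0.0, []
    misses = []
    for text, expected in labeled_examples:
        score = scorer(text)
        predicted = "positive" if score > 0 else "negative" if score < 0 else "neutral"
        if predicted != expected:
            misses.append((text, expected, predicted))
    accuracy = 1 - len(misses) / len(labeled_examples)
    return accuracy, misses


def toy_scorer(text):
    """Hypothetical stand-in scorer: plain word counting with no negation handling."""
    words = re.findall(r"[a-z]+", text.lower())
    positive = sum(w in {"good", "worth"} for w in words)
    negative = sum(w in {"terrible", "ugly"} for w in words)
    return positive - negative


# Hand labels here are assumptions made for illustration only.
examples = [
    ("This product is good", "positive"),
    ("My dad says this product is terrible, but I disagree", "positive"),
]
accuracy, misses = review_outputs(toy_scorer, examples)
for text, expected, predicted in misses:
    print(f"expected {expected}, got {predicted}: {text}")
```

Reading through the misses, rather than only the accuracy figure, is what shows which of the failure types above actually occur in your articles.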
