How to assign different scores for sentiment analysis in R?
Question
I have a file of Tweets on which I need to perform sentiment analysis. I have come across the process below, which works well; however, I now want to alter the code so that I can assign different scores based on sentiment.
Here is the code:
score.sentiment = function(sentences, pos.words, neg.words, .progress='none')
{
  require(plyr)
  require(stringr)
  # Score each sentence: count of positive words minus count of negative words
  scores = laply(sentences, function(sentence, pos.words, neg.words)
  {
    # Strip punctuation, control characters and digits, then lower-case
    sentence = gsub('[[:punct:]]', '', sentence)
    sentence = gsub('[[:cntrl:]]', '', sentence)
    sentence = gsub('\\d+', '', sentence)
    sentence = tolower(sentence)
    # Split the sentence into individual words
    word.list = str_split(sentence, '\\s+')
    words = unlist(word.list)
    # match() returns the dictionary position of each word, or NA if absent
    pos.matches = match(words, pos.words)
    neg.matches = match(words, neg.words)
    # Convert positions to TRUE/FALSE
    pos.matches = !is.na(pos.matches)
    neg.matches = !is.na(neg.matches)
    score = sum(pos.matches) - sum(neg.matches)
    return(score)
  }, pos.words, neg.words, .progress=.progress)
  scores.df = data.frame(scores=scores, text=sentences)
  return(scores.df)
}
What I am now looking to do is to have FOUR dictionaries:
super.words, pos.words, neg.words, terrible.words.
I want to assign a different score to each of these dictionaries: super.words = +2, pos.words = +1, neg.words = -1, terrible.words = -2.
I know that pos.matches = !is.na(pos.matches)
and neg.matches = !is.na(neg.matches)
produce TRUE/FALSE values, which sum() counts as 1/0; however, I want to find out how to assign these specific scores so that EACH tweet gets a score.
At the moment, I am just focusing on the standard two dictionaries, pos and neg. I have assigned scores to them in these two data frames:
posDF<-data.frame(words=pos, value=1, stringsAsFactors=F)
negDF<-data.frame(words=neg, value=-1, stringsAsFactors=F)
and tried to run the above algorithm with these; however, nothing works.
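One way the data-frame idea could be made to work is to stack posDF and negDF into a single lookup table and use match() to pull each word's value. The sketch below is only an illustration: the short pos and neg vectors are hypothetical stand-ins for the real word lists, and score_words is an invented helper name, not part of the original script.

```r
# Hypothetical mini-lexicons; the real pos/neg vectors come from the word lists in use.
pos <- c("good", "happy", "strong")
neg <- c("evil", "nothing")

posDF <- data.frame(words = pos, value = 1, stringsAsFactors = FALSE)
negDF <- data.frame(words = neg, value = -1, stringsAsFactors = FALSE)

# Stack the dictionaries into one lookup table of word -> value.
lexicon <- rbind(posDF, negDF)

# score_words() is an illustrative helper, not part of the original script.
score_words <- function(words, lexicon) {
  idx <- match(words, lexicon$words)       # row of each word, NA if absent
  sum(lexicon$value[idx], na.rm = TRUE)    # unmatched words contribute 0
}

# Words from an already cleaned and split tweet.
words <- c("labours", "nhs", "plans", "are", "all", "good", "news", "very", "happy")
score_words(words, lexicon)
```

With this layout, adding the super and terrible dictionaries would only mean binding two more data frames with values 2 and -2 into the lookup table.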
I came across this page (http://stackoverflow.com/questions/28072370/how-to-extract-individual-words-from-sentence-and-match-them-with-words-from-pos?rq=1) and another page where someone has written several 'for' loops; however, the end result only provides an overall score of -1, 0 or 1.
Ultimately, I am looking for a result similar to this:
table(analysis$score)
-5 -4 -3 -2 -1 0 1 2 3 4 5 6 19
3 8 49 164 603 2790 ..................etc
However, so far, whenever I get a result that doesn't involve having to "debug" the code, I get this:
< table of extent 0 >
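For context, R prints that message when table() is given NULL or a zero-length vector. One common way to end up there, though not necessarily the cause in this case, is referring to a column that does not exist in the data frame. The toy data frame below is a made-up stand-in for the scoring output:

```r
# Toy data frame standing in for the output of score.sentiment (hypothetical values).
analysis <- data.frame(scores = c(1, 0, -1, 1), stringsAsFactors = FALSE)

# A column name that does not exist returns NULL rather than an error...
missing_col <- analysis$sentiment
is.null(missing_col)

# ...and table() on NULL has nothing to count, so the result is empty.
empty <- table(missing_col)
length(empty)

# The real column tabulates as expected.
table(analysis$scores)
```

Checking names(analysis) and length(analysis$scores) before calling table() is a quick way to rule this out.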
Here are some sample Tweets I am using:
# Inner double quotes are escaped so the fourth string parses correctly
tweets<-data.frame(words=c(
  "@UKLabour @KarlTurnerMP #LabourManifesto Speaking as a carer, labours NHS plans are all good news, very happy. Making my day this!",
  "#LabourManifesto eggs and sweet things are looking evil",
  "@UKLabour @KarlTurnerMP Half way through the #LabourManifesto, this will definitely improve every-bodies lives if implemented fully.",
  "There is nothing \"long term\" about fossil fuels. #fracking #labourmanifesto https://twitter.com/stevetopple/status/587576796599595012",
  "Fair play Ed, very strong speech! Finally had the chance to watch it. #LabourManifesto wanna see the other manifestos nowwww"),
  stringsAsFactors=FALSE)
Any help is greatly appreciated!
So, essentially, I am wondering if there is a way to change this section of the original script:
pos.matches=match(words,pos.words)
neg.matches=match(words,neg.words)
pos.matches = !is.na(pos.matches)
neg.matches = !is.na(neg.matches)
so that I can assign my own specific scores (pos.words = +1, neg.words = -1), or whether I would have to incorporate various if statements and for loops?
Recommended answer
If you are just looking to use custom scores in generating the total score, you could just change this line:
score=sum(pos.matches)-sum(neg.matches)
to be something like:
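# super.pos.matches and terrible.matches are assumed to be built like pos.matches, via match() and !is.na()
score = sum(super.pos.matches)*2 + sum(pos.matches) + sum(neg.matches)*(-1) + sum(terrible.matches)*(-2)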
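Put together, a weighted version of the scoring function might look like the sketch below. It is an illustration under stated assumptions rather than the original author's code: it uses base R (vapply, strsplit, %in%) instead of plyr and stringr so that it runs on its own, the function name score.sentiment.weighted is invented here, and the four short word lists and sample tweets are hypothetical.

```r
# Sketch: weighted variant of score.sentiment using base R only.
# The function name and the word lists below are illustrative assumptions.
score.sentiment.weighted <- function(sentences, super.words, pos.words,
                                     neg.words, terrible.words) {
  scores <- vapply(sentences, function(sentence) {
    # Same clean-up as the original script.
    sentence <- gsub("[[:punct:]]", "", sentence)
    sentence <- gsub("[[:cntrl:]]", "", sentence)
    sentence <- gsub("\\d+", "", sentence)
    words <- unlist(strsplit(tolower(sentence), "\\s+"))

    # %in% gives TRUE/FALSE per word; sum() counts the TRUEs,
    # and each count is multiplied by its dictionary's weight.
    2 * sum(words %in% super.words) +
      sum(words %in% pos.words) -
      sum(words %in% neg.words) -
      2 * sum(words %in% terrible.words)
  }, numeric(1), USE.NAMES = FALSE)

  data.frame(score = scores, text = sentences, stringsAsFactors = FALSE)
}

# Hypothetical word lists for illustration only.
super.words    <- c("excellent", "fantastic")
pos.words      <- c("good", "happy")
neg.words      <- c("bad", "evil")
terrible.words <- c("awful", "horrendous")

tweets <- c("Good news, very happy and fantastic plans",
            "eggs and sweet things are looking evil",
            "awful and bad")

analysis <- score.sentiment.weighted(tweets, super.words, pos.words,
                                     neg.words, terrible.words)
table(analysis$score)
```

Because each tweet's score is a weighted sum of counts, it is not limited to -1, 0 or 1, so table() over many tweets can produce a spread of values like the one the question is after.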