Sentiment analysis with NLTK python for sentences using sample data or webservice?


Question

I am embarking upon an NLP project for sentiment analysis.

I have successfully installed NLTK for Python (it seems like a great piece of software for this). However, I am having trouble understanding how it can be used to accomplish my task.

Here is my task:

  1. I start with a long stream of data (say, several hundred tweets on the subject of the UK election, pulled from its web service)
  2. I'd like to break this up into sentences (or chunks of information no longer than about 100 characters) (I assume I can do this in Python?)
  3. Then search all the sentences for specific instances within them, e.g. "David Cameron"
  4. Then I'd like to check each sentence for positive/negative sentiment and count them accordingly
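Steps 2 and 3 above can be sketched in plain Python. This is a minimal illustration, not a definitive approach: the tweets and the "David Cameron" query are made-up examples, and the naive regex splitter stands in for NLTK's more robust `sent_tokenize` (which needs the `punkt` model downloaded first):

```python
import re

def split_into_sentences(text):
    # Naive splitter: break after ., ! or ? followed by whitespace.
    # nltk.sent_tokenize would handle abbreviations etc. better.
    return [s.strip() for s in re.split(r'(?<=[.!?])\s+', text) if s.strip()]

def sentences_mentioning(sentences, entity):
    # Case-insensitive substring match for the target entity.
    return [s for s in sentences if entity.lower() in s.lower()]

tweets = [
    "David Cameron spoke today. The crowd was large.",
    "Nothing about the election here.",
]
sentences = [s for t in tweets for s in split_into_sentences(t)]
hits = sentences_mentioning(sentences, "David Cameron")
# hits now holds only the sentences that mention the entity,
# ready to be run through a sentiment classifier (step 4).
```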

NB: I am not really worried too much about accuracy, because my data sets are large, and I'm also not too worried about sarcasm.

Here is the trouble I am having:

  1. All the data sets I can find, e.g. the corpus of movie review data that comes with NLTK, aren't in web-service format. It looks like this data has already had some processing done. As far as I can see, the processing (by Stanford) was done with WEKA. Is it not possible for NLTK to do all this on its own? Here all the data sets have already been organised into positive/negative, e.g. the polarity dataset at http://www.cs.cornell.edu/People/pabo/movie-review-data/. How is this done? (To organise the sentences by sentiment, is it definitely WEKA, or something else?)
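For what it's worth, the Cornell polarity dataset is just plain-text review files sorted into `pos/` and `neg/` directories (the labels were derived from the star ratings in the original reviews, as the answer below explains), so loading it needs nothing beyond the standard library. A minimal sketch, with the directory layout as the only assumption:

```python
import os

def load_polarity_dataset(root):
    # Expects root/pos/*.txt and root/neg/*.txt; the label is simply
    # the directory name. Returns a list of (text, label) pairs.
    data = []
    for label in ("pos", "neg"):
        folder = os.path.join(root, label)
        for name in sorted(os.listdir(folder)):
            path = os.path.join(folder, name)
            with open(path, encoding="utf-8", errors="ignore") as fh:
                data.append((fh.read(), label))
    return data
```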

  2. I am not sure I understand why WEKA and NLTK would be used together. It seems like they do much the same thing. If I'm processing the data with WEKA first to find sentiment, why would I need NLTK? Is it possible to explain why this might be necessary?

  3. I have found a few scripts that get somewhat near this task, but all are using the same pre-processed data. Is it not possible to process this data myself to find sentiment in sentences, rather than using the data samples given in the link?

Any help is much appreciated and will save me much hair!

Cheers, Ke

Answer

The movie review data has already been marked by humans as positive or negative (the person who wrote the review gave the movie a rating, which is used to determine polarity). These gold-standard labels allow you to train a classifier, which you could then use for other movie reviews. You could train a classifier in NLTK with that data, but applying the results to election tweets might be less accurate than randomly guessing positive or negative. Alternatively, you can go through and label a few thousand tweets yourself as positive or negative and use those as your training set.

For a description of using Naive Bayes for sentiment analysis with NLTK, see: http://streamhacker.com/2010/05/10/text-classification-sentiment-analysis-naive-bayes-classifier/

Then in that code, instead of using the movie corpus, use your own data to calculate word counts (in the word_feats method).
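Putting that suggestion together, here is a minimal sketch, assuming NLTK is installed and substituting a few invented hand-labelled tweets for the movie corpus (the classifier itself needs no corpus download). The `word_feats` function follows the bag-of-words style used in the linked tutorial:

```python
import nltk

def word_feats(words):
    # Bag-of-words features: each token maps to True, so the
    # classifier only sees presence/absence of words.
    return {word: True for word in words}

# Hypothetical hand-labelled tweets standing in for the movie corpus.
train = [
    (word_feats("great speech by cameron today".split()), "pos"),
    (word_feats("loving the new policies brilliant".split()), "pos"),
    (word_feats("awful debate total disaster".split()), "neg"),
    (word_feats("terrible result very disappointed".split()), "neg"),
]

classifier = nltk.NaiveBayesClassifier.train(train)
label = classifier.classify(word_feats("what a brilliant speech".split()))
```

With a real training set of a few thousand labelled tweets, the same two calls (`train` and `classify`) apply unchanged; only the data fed to `word_feats` differs.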

