Twitter sentiment analysis techniques

Published: 2020/5/18 1:17:26 | Tags: python, nltk

Question

I'm doing a project on Twitter sentiment analysis, but there are some things I'm pondering.

Since tweets are extremely short (at most 140 characters), which text analysis techniques apply best? For example, does stemming work as well as it does on, say, long articles?

What about n-grams? Does the shortness of a tweet make it better or worse suited to them?

Would k-nearest neighbors be more accurate than part-of-speech tagging?

Will my custom Twitter dataset become irrelevant or stale as time goes by? Since Twitter and the information on it change so fast, that is also a major concern for me.

Thanks a lot for your time.

PS: Do you have any good Twitter sentiment dataset in mind? It would be great if it were updated regularly.

Recommended answer

I did some classwork analyzing celebrities' tweets and comparing their similarities.

The biggest thing, as you figured, is the length of a tweet. At 140 characters, a lot of words are shortened or written in unusual "txt speech". So even a well-known stemmer such as Porter is going to give some odd results. It was best to keep almost everything and only normalize after computing word counts, vectors, etc.
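To illustrate "keep almost everything and normalize later", here is a minimal stdlib-only sketch; it is not the answerer's code. It assumes a simple regex tokenizer (the pattern `TOKEN_RE` and the helper names are hypothetical) that lowercases the text but keeps URLs, @mentions, #hashtags and txt-speech tokens such as "gr8" intact instead of stemming them. Raw counts are built first and only then scaled; L2 normalization is one assumed reading of "normalize after word counts, vectors".

```python
import re
from collections import Counter

# Hypothetical tokenizer: URLs first, then optional @/# prefix + word chars
# (with an optional apostrophe part such as "don't"). No stemming is applied,
# so txt-speech like "gr8" or "2nite" survives unchanged.
TOKEN_RE = re.compile(r"https?://\S+|[@#]?\w+(?:'\w+)?")


def tokenize(tweet: str) -> list[str]:
    """Lowercase and split a tweet without stemming or dropping tokens."""
    return TOKEN_RE.findall(tweet.lower())


def count_vector(tweet: str) -> Counter:
    """Raw term counts; any normalization is deferred to a later step."""
    return Counter(tokenize(tweet))


def l2_normalize(counts: Counter) -> dict[str, float]:
    """Scale an already-built count vector to unit Euclidean length."""
    norm = sum(v * v for v in counts.values()) ** 0.5
    return {term: v / norm for term, v in counts.items()} if norm else {}
```

Because the counts are kept raw, a stemmer (for example NLTK's Porter implementation) can still be tried afterwards as an optional step and compared against the unstemmed features.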

For drawing inferences from the words, n-grams and following links are big factors in inference quality. The space and time requirements limited me to 4-grams at most, but even simple 2-grams gave a large improvement.
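A minimal sketch of n-gram feature extraction over an already tokenized tweet, in plain Python; the helper names `ngrams` and `ngram_features` are hypothetical. `max_n` caps the order, matching the trade-off above (4-grams as the practical limit, 2-grams already helpful). For a tweet of L tokens, order n contributes L - n + 1 windows, so every extra order adds features and memory.

```python
from collections import Counter


def ngrams(tokens: list[str], n: int) -> list[tuple[str, ...]]:
    """All contiguous n-token windows; empty if the tweet is shorter than n."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def ngram_features(tokens: list[str], max_n: int = 2) -> Counter:
    """Count every 1..max_n-gram, joined with spaces so they work as dict keys."""
    feats: Counter = Counter()
    for n in range(1, max_n + 1):
        feats.update(" ".join(gram) for gram in ngrams(tokens, n))
    return feats
```

For example, `ngram_features(["not", "good", "at", "all"], max_n=2)` contains the bigram "not good", which keeps a local negation that unigrams alone lose; that is one plausible reason even 2-grams help on short texts.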

If you noticed, earlier I said "almost everything". In my case, following only popular celebrities' tweets, I ran into the problem that a lot of their tweets were links or shout-outs to their events, sponsors, etc. So a big part of the work was removing the large number of duplicate spam tweets.
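One simple way to strip repeated promotional tweets is sketched below. It assumes that the copies differ only in case, whitespace, the shortened link, or the @mention; the regexes and helper names are hypothetical and not taken from the answer. It only catches tweets that become identical after this normalization; reworded near-duplicates would need a fuzzier comparison.

```python
import re

URL_RE = re.compile(r"https?://\S+")  # shortened links differ per copy
MENTION_RE = re.compile(r"@\w+")      # shout-outs to different accounts


def dedup_key(tweet: str) -> str:
    """Normalize a tweet so promotional copies map to the same key."""
    text = URL_RE.sub("<url>", tweet.lower())
    text = MENTION_RE.sub("<user>", text)
    return " ".join(text.split())  # collapse runs of whitespace


def remove_duplicates(tweets: list[str]) -> list[str]:
    """Keep the first tweet for each normalized key, preserving order."""
    seen: set[str] = set()
    kept: list[str] = []
    for tweet in tweets:
        key = dedup_key(tweet)
        if key not in seen:
            seen.add(key)
            kept.append(tweet)
    return kept
```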

For methods to extract accurate sentiment, or whatever measures you're looking for, I would first try naive Bayes-based methods. They are simple and relatively accurate, which makes them a good baseline. K-means will do fairly well, but remember that it does not take variances and covariances into account; nonetheless, it is another baseline to try.
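A minimal multinomial naive Bayes baseline with additive (Laplace) smoothing, written in plain Python so the arithmetic is visible; the class and parameter names are hypothetical. In practice NLTK's `NaiveBayesClassifier` or scikit-learn's `MultinomialNB` could fill the same role. Each class score is log P(class) plus the sum of log P(token | class) over the tweet's tokens; tokens never seen in training are skipped.

```python
import math
from collections import Counter, defaultdict


class NaiveBayes:
    """Multinomial naive Bayes over token lists (hypothetical minimal baseline)."""

    def __init__(self, alpha: float = 1.0):
        self.alpha = alpha  # additive smoothing constant (1.0 = Laplace)

    def fit(self, docs: list[list[str]], labels: list[str]) -> "NaiveBayes":
        self.class_counts = Counter(labels)        # documents per class
        self.word_counts = defaultdict(Counter)    # token counts per class
        self.vocab: set[str] = set()
        for tokens, label in zip(docs, labels):
            self.word_counts[label].update(tokens)
            self.vocab.update(tokens)
        # total number of tokens seen in each class
        self.totals = {c: sum(self.word_counts[c].values()) for c in self.class_counts}
        return self

    def predict(self, tokens: list[str]) -> str:
        n_docs = sum(self.class_counts.values())
        v = len(self.vocab)
        best_label, best_score = None, -math.inf
        for label, n_label in self.class_counts.items():
            score = math.log(n_label / n_docs)  # log prior
            for t in tokens:
                if t not in self.vocab:
                    continue  # ignore tokens never seen in training
                p = (self.word_counts[label][t] + self.alpha) / (self.totals[label] + self.alpha * v)
                score += math.log(p)  # smoothed log likelihood
            if score > best_score:
                best_label, best_score = label, score
        return best_label
```

Usage: `NaiveBayes().fit(docs, labels).predict(tokens)`, where `docs` are token lists such as those produced by a tweet tokenizer. Working in log space avoids underflow when many small probabilities are multiplied.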

Hope this provides some insight.