Is it possible to guess a user's mood based on the structure of text?

Question

I assume a natural language processor would need to be used to parse the text itself, but what suggestions do you have for an algorithm to detect a user's mood based on text that they have written? I doubt it would be very accurate, but I'm still interested nonetheless.

I am by no means an expert on linguistics or natural language processing, so I apologize if this question is too general or stupid.

Answer

This is the basis of an area of natural language processing called sentiment analysis. Although your question is general, it's certainly not stupid - this sort of research is done by Amazon on the text in product reviews, for example.

If you are serious about this, then a simple version could be achieved by -

  1. Acquire a corpus of positive/negative sentiment. If this were a professional project you might take some time and manually annotate a corpus yourself, but if you are in a hurry or just want to experiment with this at first, then I'd suggest looking at the sentiment polarity corpus from Bo Pang and Lillian Lee's research. The issue with using that corpus is that it is not tailored to your domain (specifically, the corpus uses movie reviews), but it should still be applicable.

  2. Split your dataset into sentences labelled either positive or negative. For the sentiment polarity corpus you could split each review into its component sentences and then apply the overall sentiment polarity tag (positive or negative) to all of those sentences. Split this corpus into two parts - 90% should be for training, 10% should be for test. If you're using Weka then it can handle the splitting of the corpus for you.
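As a rough illustration of the 90/10 split (using a handful of hypothetical labelled sentences standing in for the real Pang and Lee corpus, and a shuffle so the split isn't biased by ordering):

```python
import random

# Hypothetical labelled sentences standing in for the real corpus.
corpus = [("a wonderful film", "pos"), ("utterly boring plot", "neg"),
          ("great acting throughout", "pos"), ("a waste of two hours", "neg"),
          ("loved every minute", "pos"), ("terrible dialogue", "neg"),
          ("a joy to watch", "pos"), ("painfully slow", "neg"),
          ("brilliant direction", "pos"), ("dull and lifeless", "neg")]

random.seed(0)            # fixed seed so the split is reproducible
random.shuffle(corpus)    # shuffle before splitting

split = int(len(corpus) * 0.9)   # 90% training, 10% held-out test
train, test = corpus[:split], corpus[split:]

print(len(train), len(test))  # 9 1
```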

  3. Apply a machine learning algorithm (such as SVM, Naive Bayes, or Maximum Entropy) to the training corpus at a word level. This model is called a bag-of-words model, which just represents the sentence as the words it is composed of. This is the same model that many spam filters run on. For a nice introduction to machine learning algorithms there is an application called Weka that implements a range of these algorithms and gives you a GUI to play with them. You can then test the performance of the machine-learned model from the errors made when attempting to classify your test corpus with it.
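To make the bag-of-words idea concrete, here is a minimal Naive Bayes sketch in plain Python (toy training data, add-one smoothing; a real system would use Weka or a similar library rather than this hand-rolled version):

```python
import math
from collections import Counter, defaultdict

def train_nb(examples):
    """Multinomial Naive Bayes over bag-of-words counts.
    examples: list of (token_list, label) pairs."""
    word_counts = defaultdict(Counter)   # label -> per-word counts
    label_counts = Counter()             # label -> number of examples
    vocab = set()
    for tokens, label in examples:
        label_counts[label] += 1
        word_counts[label].update(tokens)
        vocab.update(tokens)
    return word_counts, label_counts, vocab

def classify_nb(tokens, model):
    """Return the label maximizing log P(label) + sum log P(word|label)."""
    word_counts, label_counts, vocab = model
    total = sum(label_counts.values())
    best, best_score = None, float("-inf")
    for label in label_counts:
        score = math.log(label_counts[label] / total)          # log prior
        denom = sum(word_counts[label].values()) + len(vocab)  # add-one smoothing
        for w in tokens:
            score += math.log((word_counts[label][w] + 1) / denom)
        if score > best_score:
            best, best_score = label, score
    return best

# Toy training sentences, already tokenized.
train = [("great wonderful fun".split(), "pos"),
         ("boring awful waste".split(), "neg"),
         ("wonderful acting great plot".split(), "pos"),
         ("awful dull boring mess".split(), "neg")]
model = train_nb(train)
print(classify_nb("great fun plot".split(), model))   # pos
```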

  4. Apply this machine-learned model to your user posts. For each user post, separate the post into sentences and then classify them using your machine-learned model.
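The per-post loop might be sketched like this - with a naive punctuation-based sentence split and a hypothetical word-list scorer standing in for the trained model (the lexicons and the `classify` helper are illustrative, not part of any library):

```python
import re

POSITIVE = {"love", "great", "happy"}      # hypothetical seed lexicons
NEGATIVE = {"hate", "awful", "angry"}

def classify(sentence):
    """Placeholder for the machine-learned model: count lexicon hits."""
    words = set(re.findall(r"[a-z']+", sentence.lower()))
    score = len(words & POSITIVE) - len(words & NEGATIVE)
    return "positive" if score >= 0 else "negative"

post = "I love this new feature. The old UI was awful and I hate it."
# Naive sentence split on ., !, ? - fine for a sketch, but see the
# note on sentence boundary detection below.
sentences = [s.strip() for s in re.split(r"[.!?]+", post) if s.strip()]
labels = [classify(s) for s in sentences]
print(list(zip(sentences, labels)))
```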

So yes, if you are serious about this then it is achievable - even without past experience in computational linguistics. It would be a fair amount of work, but even with word-based models good results can be achieved.

If you need more help feel free to contact me - I'm always happy to help others interested in NLP =]


A few side notes -

  1. Merely splitting a segment of text into sentences is a field of NLP in itself - called sentence boundary detection. There are a number of tools, OSS or free, available to do this, but for your task a simple split on whitespace and punctuation should be fine.
  2. SVMlight is another machine learner to consider, and in fact their inductive SVM does a similar task to what we're looking at - trying to classify which Reuters articles are about "corporate acquisitions" with 1000 positive and 1000 negative examples.
  3. Turning the sentences into features to classify over may take some work. In this model each word is a feature - this requires tokenizing the sentence, which means separating words and punctuation from each other. Another tip is to lowercase all the separate word tokens so that "I HATE you" and "I hate YOU" both end up being considered the same. With more data you could also try to include whether capitalization helps in classifying whether someone is angry, but I believe words should be sufficient, at least for an initial effort.
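A tokenizer along those lines - lowercasing and splitting off punctuation as separate tokens - is a one-liner with a regular expression (a minimal sketch, not a production tokenizer):

```python
import re

def tokenize(sentence):
    """Lowercase, then emit word tokens and single punctuation tokens."""
    return re.findall(r"[a-z']+|[^\sa-z']", sentence.lower())

print(tokenize("I HATE you!"))  # ['i', 'hate', 'you', '!']
print(tokenize("I hate YOU!"))  # same tokens, so the two are treated alike
```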


Edit

I just discovered LingPipe, which in fact has a tutorial on sentiment analysis using the Bo Pang and Lillian Lee sentiment polarity corpus I was talking about. If you use Java, that may be an excellent tool to use, and even if not, it goes through all of the steps I discussed above.
