Sentence Classification (Categorization)

Question

I have been reading about text classification and have found several Java tools for it, but I am still wondering: is text classification the same as sentence classification?

Is there any tool that focuses on sentence classification?

Recommended answer

There's no formal difference between "text classification" and "sentence classification". After all, a sentence is a type of text. But generally, when people talk about text classification, IMHO they mean larger units of text such as an essay, a review, or a speech. Classifying a politician's speech as Democrat or Republican is a lot easier than classifying a tweet. When you have a lot of text per instance, you don't need to squeeze each training instance for all the information it can give you, and you can get pretty good performance out of a bag-of-words naive Bayes model.
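To make the bag-of-words naive Bayes idea concrete, here is a minimal sketch in plain Java. It is not taken from any particular library: the class name `BagOfWordsNaiveBayes`, the crude tokenizer, the add-one (Laplace) smoothing, and the toy training sentences are all assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Illustrative multinomial naive Bayes over bag-of-words counts.
// All names here are hypothetical; this is a sketch, not a library API.
class BagOfWordsNaiveBayes {
    private final Map<String, Integer> docCounts = new HashMap<>();
    private final Map<String, Map<String, Integer>> wordCounts = new HashMap<>();
    private final Map<String, Integer> totalWords = new HashMap<>();
    private final Set<String> vocabulary = new HashSet<>();
    private int totalDocs = 0;

    // Lowercase and split on anything that is not a letter or digit.
    static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        for (String t : text.toLowerCase().split("[^a-z0-9]+")) {
            if (!t.isEmpty()) {
                tokens.add(t);
            }
        }
        return tokens;
    }

    // Record one labeled training instance; word order is discarded.
    void train(String label, String text) {
        totalDocs++;
        docCounts.merge(label, 1, Integer::sum);
        Map<String, Integer> counts =
            wordCounts.computeIfAbsent(label, k -> new HashMap<>());
        for (String token : tokenize(text)) {
            counts.merge(token, 1, Integer::sum);
            totalWords.merge(label, 1, Integer::sum);
            vocabulary.add(token);
        }
    }

    // Return the label with the highest log posterior, or null if untrained.
    String classify(String text) {
        List<String> tokens = tokenize(text);
        String best = null;
        double bestScore = Double.NEGATIVE_INFINITY;
        for (String label : docCounts.keySet()) {
            // Log prior: fraction of training instances with this label.
            double score = Math.log(docCounts.get(label) / (double) totalDocs);
            Map<String, Integer> counts = wordCounts.get(label);
            int total = totalWords.getOrDefault(label, 0);
            for (String token : tokens) {
                int c = counts.getOrDefault(token, 0);
                // Add-one smoothing so unseen words do not zero out the score.
                score += Math.log((c + 1.0) / (total + vocabulary.size()));
            }
            if (score > bestScore) {
                bestScore = score;
                best = label;
            }
        }
        return best;
    }

    public static void main(String[] args) {
        BagOfWordsNaiveBayes nb = new BagOfWordsNaiveBayes();
        nb.train("sports", "the team won the match");
        nb.train("sports", "a great goal in the final game");
        nb.train("tech", "the new phone has a fast processor");
        nb.train("tech", "software update fixes the bug");
        System.out.println(nb.classify("the team scored a goal"));
    }
}
```

With long documents, each instance contributes many word counts, which is why such a simple model can do well. A single sentence contributes only a handful of tokens, so the same model has much less evidence to work with.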

Basically, you might not get the performance numbers you need if you throw off-the-shelf Weka classifiers at a corpus of sentences. You might have to augment the data in each sentence with POS tags, parse trees, word ordering, n-grams, etc. Also gather any related metadata, such as creation time, creation location, and attributes of the sentence's author. Obviously, all of this depends on what exactly you are trying to classify: the features that will work for you need to be intuitively meaningful for the problem at hand.
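As a rough sketch of what augmenting a sentence can look like, the plain-Java snippet below turns a sentence into unigram and bigram features (bigrams keep some word-order information) and appends metadata as extra features. The class `SentenceFeatures`, the feature-name prefixes (`w=`, `bi=`, `meta:`), and the `hour` metadata key are hypothetical. POS tags and parse trees would need an external NLP library and are not shown here.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Illustrative feature extraction for short texts; names are hypothetical.
class SentenceFeatures {

    // Lowercase and split on anything that is not a letter or digit.
    static List<String> tokenize(String sentence) {
        List<String> tokens = new ArrayList<>();
        for (String t : sentence.toLowerCase().split("[^a-z0-9]+")) {
            if (!t.isEmpty()) {
                tokens.add(t);
            }
        }
        return tokens;
    }

    // Contiguous word n-grams, joined with "_" (e.g. "not_good").
    static List<String> ngrams(List<String> tokens, int n) {
        List<String> result = new ArrayList<>();
        for (int i = 0; i + n <= tokens.size(); i++) {
            result.add(String.join("_", tokens.subList(i, i + n)));
        }
        return result;
    }

    // Sparse feature map: unigram counts, bigram counts, and one
    // indicator feature per metadata key/value pair.
    static Map<String, Integer> extract(String sentence,
                                        Map<String, String> metadata) {
        Map<String, Integer> features = new LinkedHashMap<>();
        List<String> tokens = tokenize(sentence);
        for (String g : ngrams(tokens, 1)) {
            features.merge("w=" + g, 1, Integer::sum);
        }
        for (String g : ngrams(tokens, 2)) {
            features.merge("bi=" + g, 1, Integer::sum);
        }
        for (Map.Entry<String, String> e : metadata.entrySet()) {
            features.put("meta:" + e.getKey() + "=" + e.getValue(), 1);
        }
        return features;
    }

    public static void main(String[] args) {
        Map<String, String> meta = Collections.singletonMap("hour", "23");
        System.out.println(extract("Not good at all", meta));
    }
}
```

A feature map like this could then be converted into whatever instance format your classifier expects. Which features to keep still depends on the problem: a bigram such as `not_good` matters for sentiment, while creation time may matter more for something like spam detection.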
