Text classification: extract tags from text


Question

I have a Lucene index with a lot of text data. Each item has a description, and I want to extract the most common words from that description and generate tags that classify the item. Is there a Lucene.Net library for doing this, or any other library for text classification?

Recommended answer

No. Lucene.Net provides search, indexing, text normalization, and "find more like this" functionality, but not text classification.

What to suggest depends on your requirements, so a more detailed description may be needed. In general, though, the easiest way is to try an external service. The external services listed here all have a REST API, which is very easy to interact with from C#.

External services:

  • Open Calais
  • uClassify
  • Google Prediction API
  • Text Classify
  • Alchemy API

There are also good Java SDKs such as Mahout. As I remember, you can interact with Mahout as a service as well, so integrating with it is not a problem at all.

I had a similar "auto tagging" task in C#, and I used Open Calais for it. It is free for up to 50,000 transactions per day, which was enough for me. uClassify also has good pricing; for example, the "Indie" license costs $99 per year.

But maybe external services and Mahout are not the way for you. Then take a look at the DBpedia project and RDF. And lastly, you can at least use some implementation of the Naive Bayes algorithm. It is easy, and everything will be under your control.
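To make the Naive Bayes suggestion concrete, here is a minimal sketch of a multinomial Naive Bayes tagger with add-one smoothing. It is written in Python for brevity, and the same logic should port to C# without much trouble. The class name `NaiveBayesTagger`, the `tokenize` helper, and the tiny training set are all assumptions made up for illustration; they are not part of Lucene.Net or of any library mentioned above.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())


class NaiveBayesTagger:
    """Hypothetical multinomial Naive Bayes with add-one (Laplace) smoothing."""

    def __init__(self):
        self.tag_doc_counts = Counter()          # training descriptions per tag
        self.word_counts = defaultdict(Counter)  # word frequencies per tag
        self.vocabulary = set()                  # every word seen in training

    def train(self, text, tag):
        """Record one labelled description."""
        words = tokenize(text)
        self.tag_doc_counts[tag] += 1
        self.word_counts[tag].update(words)
        self.vocabulary.update(words)

    def scores(self, text):
        """Return a log-probability score for every known tag."""
        words = [w for w in tokenize(text) if w in self.vocabulary]
        total_docs = sum(self.tag_doc_counts.values())
        vocab_size = len(self.vocabulary)
        result = {}
        for tag, doc_count in self.tag_doc_counts.items():
            total_words = sum(self.word_counts[tag].values())
            # Start from the log prior of the tag.
            score = math.log(doc_count / total_docs)
            for word in words:
                count = self.word_counts[tag][word]
                # Smoothed log likelihood of the word given the tag.
                score += math.log((count + 1) / (total_words + vocab_size))
            result[tag] = score
        return result

    def predict(self, text):
        """Return the tag with the highest score."""
        scores = self.scores(text)
        return max(scores, key=scores.get)


# Made-up training data, for illustration only.
tagger = NaiveBayesTagger()
tagger.train("fast laptop with long battery life", "electronics")
tagger.train("wireless headphones with noise cancelling", "electronics")
tagger.train("cotton shirt with short sleeves", "clothing")
tagger.train("warm wool jacket for winter", "clothing")

print(tagger.predict("laptop battery"))
print(tagger.predict("wool shirt"))
```

In practice the labelled descriptions would come from items you have already tagged by hand, and you would probably want to drop stop words such as "with" and "for" before training. Words that never appeared in training are ignored at prediction time, so they do not skew the scores toward any tag.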
