Large-scale naïve Bayes classifier with top-k output
Question
I need a large-scale naïve Bayes library that can handle millions of training examples and more than 100k binary features. It must be an online implementation (updatable after training). I also need top-k output, that is, multiple classifications for a single instance. Accuracy is not very important.
The purpose is an automatic text categorization application.
Any suggestions for a good library are much appreciated.
The library should preferably be in Java.
Recommended answer
If a learning algorithm other than naïve Bayes is also acceptable, check out Vowpal Wabbit (C++), which has a reputation as one of the best scalable tools for text classification (online stochastic gradient descent, plus LDA). I'm not sure whether it supports top-k output.
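If naïve Bayes itself is preferred, the requirements in the question (online updates, sparse binary features, top-k output) can be illustrated with plain count tables. The following is only a minimal sketch in Java, not an existing library: the class name `OnlineNaiveBayes` and its methods are hypothetical, it scores only the features that are present in an instance (a multinomial-style simplification of Bernoulli naïve Bayes), and it uses add-one smoothing. A real system at the scale described would likely need feature hashing or more compact count storage.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical sketch: online naive Bayes over sparse binary features with top-k output. */
class OnlineNaiveBayes {
    private final int numFeatures;                                   // vocabulary size, used for smoothing
    private final Map<String, Integer> docCount = new HashMap<>();   // examples seen per label
    private final Map<String, Long> featureTotal = new HashMap<>();  // active features seen per label
    private final Map<String, Map<Integer, Integer>> featureCount = new HashMap<>();
    private long totalDocs = 0;

    OnlineNaiveBayes(int numFeatures) {
        this.numFeatures = numFeatures;
    }

    /** Online update: fold one labelled example (given as active feature ids) into the counts. */
    void update(String label, int[] activeFeatures) {
        totalDocs++;
        docCount.merge(label, 1, Integer::sum);
        featureTotal.merge(label, (long) activeFeatures.length, Long::sum);
        Map<Integer, Integer> counts = featureCount.computeIfAbsent(label, l -> new HashMap<>());
        for (int f : activeFeatures) {
            counts.merge(f, 1, Integer::sum);
        }
    }

    /** Returns up to k labels ordered by descending log-score for the given active features. */
    List<String> topK(int[] activeFeatures, int k) {
        Map<String, Double> scores = new HashMap<>();
        for (String label : docCount.keySet()) {
            // Log prior of the label.
            double score = Math.log((double) docCount.get(label) / totalDocs);
            Map<Integer, Integer> counts = featureCount.get(label);
            double denom = featureTotal.get(label) + numFeatures;
            // Add-one smoothed log-likelihood of each present feature.
            for (int f : activeFeatures) {
                score += Math.log((counts.getOrDefault(f, 0) + 1.0) / denom);
            }
            scores.put(label, score);
        }
        List<String> labels = new ArrayList<>(scores.keySet());
        labels.sort(Comparator.comparingDouble((String l) -> scores.get(l)).reversed());
        return new ArrayList<>(labels.subList(0, Math.min(k, labels.size())));
    }
}
```

Each call to `update` costs time proportional to the number of active features, so the model can keep learning after initial training. `topK` scores every known label, which is acceptable for a moderate number of categories; with very many labels, a bounded priority queue would avoid the full sort.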