Discrete and Continuous Classifier on Sparse Data

Question

I'm trying to classify an example, which contains discrete and continuous features. Also, the example represents sparse data, so even though the system may have been trained on 100 features, the example may only have 12.

What would be the best classifier algorithm to use to accomplish this? I've been looking at Bayes, Maxent, Decision Tree, and KNN, but I'm not sure any fit the bill exactly. The biggest sticking point I've found is that most implementations don't support sparse data sets and both discrete and continuous features. Can anyone recommend an algorithm and implementation (preferably in Python) that fits these criteria?

Libraries I've looked at so far include:

  1. Orange (Mostly academic. Implementations not terribly efficient or practical.)
  2. NLTK (Also academic, although has a good Maxent implementation, but doesn't handle continuous features.)
  3. Weka (Still researching this. Seems to support a broad range of algorithms, but has poor documentation, so it's unclear what each implementation supports.)

Recommended answer

Weka (Java) satisfies all your requirements:

  • a large number of classification/regression algorithms
  • support for discrete/continuous (called nominal/numeric in Weka) attributes
  • handles sparse data through the sparse variant of its ARFF file format
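
To make the sparse ARFF point concrete, here is a minimal Python sketch that renders mixed nominal/numeric examples as sparse ARFF text that Weka should be able to load. The function name `to_sparse_arff`, the relation name, and the attribute names are hypothetical and chosen only for illustration; the sketch also assumes names and labels contain no spaces or special characters, so it does no quoting.

```python
def to_sparse_arff(relation, attributes, rows):
    """Render rows as sparse ARFF text (illustrative helper, not part of Weka).

    attributes: list of (name, spec) pairs, where spec is the string
                "numeric" or a list of nominal labels.
    rows:       list of dicts mapping attribute name -> value; each dict
                holds only the features that are present in that example.
    """
    lines = ["@relation " + relation, ""]
    index = {}
    for i, (name, spec) in enumerate(attributes):
        index[name] = i  # sparse ARFF refers to attributes by zero-based index
        if spec == "numeric":
            lines.append("@attribute " + name + " numeric")
        else:
            lines.append("@attribute " + name + " {" + ",".join(spec) + "}")
    lines += ["", "@data"]
    for row in rows:
        # Only present values are written, as "index value" pairs
        # in ascending index order.
        pairs = sorted((index[name], value) for name, value in row.items())
        lines.append("{" + ", ".join(f"{i} {v}" for i, v in pairs) + "}")
    return "\n".join(lines) + "\n"


# Hypothetical schema: two continuous and two discrete attributes.
attributes = [
    ("width", "numeric"),
    ("height", "numeric"),
    ("color", ["none", "red", "blue"]),
    ("label", ["neg", "pos"]),
]
rows = [
    {"height": 2.5, "color": "red", "label": "pos"},
    {"width": 1.0},
]
text = to_sparse_arff("example", attributes, rows)
print(text)
```

One caveat worth checking against the ARFF documentation: in the sparse format an omitted entry is read as 0 (for a nominal attribute, the first declared label), not as missing. That is why the sketch declares a placeholder label `none` first for `color`. A value that is truly unknown has to be written explicitly as `?`.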

Check out this Pentaho wiki for a list of links to documentation, guides, video tutorials, etc.
