Which machine learning classifier to choose, in general?


Problem description

Suppose I'm working on some classification problem. (Fraud detection and comment spam are two problems I'm working on right now, but I'm curious about any classification task in general.)

How do I know which classifier I should use?

  1. Decision tree
  2. SVM
  3. Bayesian
  4. Neural network
  5. K-nearest neighbors
  6. Q-learning
  7. Genetic algorithm
  8. Markov decision processes
  9. Convolutional neural networks
  10. Linear regression or logistic regression
  11. Boosting, bagging, ensembling
  12. Random hill climbing or simulated annealing
  13. ...

In which cases is one of these the "natural" first choice, and what are the principles for choosing that one?

Examples of the type of answers I'm looking for (from Manning et al.'s Introduction to Information Retrieval book):

a. If your data is labeled, but you only have a limited amount, you should use a classifier with high bias (for example, Naive Bayes).

I'm guessing this is because a higher-bias classifier will have lower variance, which is good because of the small amount of data.

b. If you have a ton of data, then the classifier doesn't really matter so much, so you should probably just choose a classifier with good scalability.

  1. What are other guidelines? Even answers like "if you'll have to explain your model to some upper management person, then maybe you should use a decision tree, since the decision rules are fairly transparent" are good. I care less about implementation/library issues, though.

  2. Also, for a somewhat separate question, besides standard Bayesian classifiers, are there 'standard state-of-the-art' methods for comment spam detection (as opposed to email spam)?

Solution

First, identify your problem: the right approach depends on the kind of data you have and on the task you want to accomplish.

If you are predicting a category:

  • You have labeled data
    • Use a classification approach and its algorithms
  • You don't have labeled data
    • Use a clustering approach

If you are predicting a quantity:

  • Use a regression approach

Otherwise:

  • Consider a dimensionality reduction approach

Each of these approaches contains several algorithms. Which one to pick within an approach depends largely on the size of the dataset.
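The decision flow above can be written down as a small function. This is only a sketch of the first-level choice described in this answer; the function name `choose_approach` and the string values for `goal` are made up for illustration and do not come from scikit-learn or any other library.

```python
from typing import Optional


def choose_approach(goal: str, has_labels: Optional[bool] = None) -> str:
    """Map the decision flow from the answer to a family of methods.

    goal:       "category", "quantity", or anything else (hypothetical values).
    has_labels: only consulted when goal == "category".
    """
    if goal == "category":
        if has_labels is None:
            raise ValueError("has_labels is required when predicting a category")
        # Labeled data -> supervised classification; no labels -> clustering.
        return "classification" if has_labels else "clustering"
    if goal == "quantity":
        return "regression"
    # Neither a category nor a quantity is being predicted.
    return "dimensionality reduction"


if __name__ == "__main__":
    # Fraud detection with known fraud / not-fraud labels.
    print(choose_approach("category", has_labels=True))
    # Grouping comments with no labels available.
    print(choose_approach("category", has_labels=False))
    # Predicting a numeric amount.
    print(choose_approach("quantity"))
```

Both problems mentioned in the question (fraud detection and comment spam) fall under the first branch when labeled examples are available, so they are classification problems in this scheme.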

Source: http://scikit-learn.org/stable/tutorial/machine_learning_map/
