Which machine learning classifier to choose, in general?

Question

Suppose I'm working on some classification problem. (Fraud detection and comment spam are two problems I'm working on right now, but I'm curious about any classification task in general.)

How do I know which classifier I should use?

  1. Decision tree
  2. SVM
  3. Bayesian
  4. Neural network
  5. K-nearest neighbors
  6. Q-learning
  7. Genetic algorithm
  8. Markov decision processes
  9. Convolutional neural networks
  10. Linear regression or logistic regression
  11. Boosting, bagging, ensembling
  12. Random hill climbing or simulated annealing
  13. ...

In which cases is one of these the "natural" first choice, and what are the principles for choosing that one?

Examples of the type of answers I'm looking for (from Manning et al.'s Introduction to Information Retrieval book):

a. If your data is labeled, but you only have a limited amount, you should use a classifier with high bias (for example, Naive Bayes).

I'm guessing this is because a higher-bias classifier will have lower variance, which is good because of the small amount of data.

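b. If you have a ton of data, then the classifier doesn't really matter so much, so you should probably just choose a classifier with good scalability.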

What are other guidelines? Even answers like "if you'll have to explain your model to some upper management person, then maybe you should use a decision tree, since the decision rules are fairly transparent" are good. I care less about implementation/library issues, though.
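To make the "transparent decision rules" point concrete, here is a minimal sketch of a decision stump, i.e. a decision tree of depth one, in plain Python. It is not taken from any library. The helper names `fit_stump` and `describe`, the feature names, and the six toy transactions are all made up for illustration.

```python
def fit_stump(rows, labels):
    """Find the single 'feature <= threshold' rule with the fewest training errors."""
    best = None  # (errors, feature index, threshold, label if true, label if false)
    classes = sorted(set(labels))
    for j in range(len(rows[0])):
        for threshold in sorted({row[j] for row in rows}):
            for left in classes:
                for right in classes:
                    # Count training rows the candidate rule gets wrong.
                    errors = sum(
                        (left if row[j] <= threshold else right) != y
                        for row, y in zip(rows, labels)
                    )
                    if best is None or errors < best[0]:
                        best = (errors, j, threshold, left, right)
    return best


def describe(stump, feature_names):
    """Render the learned rule as one sentence a non-specialist can read."""
    _, j, threshold, left, right = stump
    return f"if {feature_names[j]} <= {threshold}: {left}, else: {right}"


# Hypothetical toy data: (transaction amount, account age in days).
feature_names = ["amount", "account_age_days"]
rows = [(20, 400), (35, 250), (50, 900), (900, 3), (1200, 10), (750, 1)]
labels = ["legit", "legit", "legit", "fraud", "fraud", "fraud"]

stump = fit_stump(rows, labels)
print(describe(stump, feature_names))
```

On this toy data the learned rule is a single threshold on `amount`, which can be read aloud as is. A full decision tree is a nested set of such rules, so it stays explainable as long as it stays shallow.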

Also, for a somewhat separate question, besides standard Bayesian classifiers, are there 'standard state-of-the-art' methods for comment spam detection (as opposed to email spam)?
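For reference, the "standard Bayesian classifier" baseline mentioned here can be sketched in a few lines of plain Python. This is a minimal multinomial Naive Bayes with add-one smoothing. The function names `train_naive_bayes` and `predict` and the six toy comments are invented for illustration, and whitespace tokenization is a simplifying assumption, so treat it as a sketch and not a production spam filter.

```python
import math
from collections import Counter, defaultdict


def train_naive_bayes(docs):
    """docs: list of (text, label) pairs. Returns the counts the classifier needs."""
    label_counts = Counter()
    word_counts = defaultdict(Counter)
    vocab = set()
    for text, label in docs:
        label_counts[label] += 1
        for word in text.lower().split():
            word_counts[label][word] += 1
            vocab.add(word)
    return label_counts, word_counts, vocab


def predict(model, text):
    """Return the label with the highest log posterior for the given text."""
    label_counts, word_counts, vocab = model
    total_docs = sum(label_counts.values())
    best_label, best_score = None, -math.inf
    for label, n_docs in label_counts.items():
        score = math.log(n_docs / total_docs)  # log prior
        total_words = sum(word_counts[label].values())
        for word in text.lower().split():
            if word not in vocab:
                continue  # ignore words never seen in training
            # Add-one (Laplace) smoothing avoids zero probabilities.
            count = word_counts[label][word]
            score += math.log((count + 1) / (total_words + len(vocab)))
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Hypothetical toy training set of blog comments.
toy_comments = [
    ("buy cheap pills now", "spam"),
    ("cheap watches click here", "spam"),
    ("click now for free pills", "spam"),
    ("great post thanks for sharing", "ham"),
    ("I disagree with your second point", "ham"),
    ("thanks this fixed my problem", "ham"),
]
model = train_naive_bayes(toy_comments)
print(predict(model, "cheap pills click here"))
print(predict(model, "thanks for the great post"))
```

The model has very few parameters (one count per word per class), which is why it is usually described as high-bias and low-variance and why it can still behave sensibly with only a handful of labeled examples.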

Recommended answer

First of all, you need to identify your problem. It depends upon what kind of data you have and what your desired task is.

If you are predicting a category:
