Randomness in Artificial Intelligence & Machine Learning

Question

This question came to mind while I was working on two projects in AI and ML. What if I'm building a model (e.g. a classification neural network, k-NN, etc.) and the model uses some function that involves randomness? If I don't fix the seed, I get different accuracy results every time I run the algorithm on the same training data. However, if I do fix it, some other setting might give better results.

Is averaging a set of accuracies enough to say that the accuracy of this model is xx%?

I'm not sure if this is the right place to ask such a question or open such a discussion.

Recommended answer

There are models that are naturally dependent on randomness (e.g. random forests), and models that only use randomness as part of exploring the search space (e.g. initialisation of the weights of a neural network) but actually have a well-defined, deterministic objective function.

For the first case, you will want to use multiple seeds and report the average accuracy, the standard deviation, and the minimum you obtained. It is often good to have a way to reproduce this, so just use multiple fixed seeds.
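As a rough illustration of the "multiple fixed seeds" idea, here is a minimal sketch using only the Python standard library. Everything in it is an assumption made for the example: the toy 1-D data, the bootstrap-trained decision stump standing in for a randomness-dependent model, and the names `make_data`, `fit_stump`, `train_and_score` and `SEEDS`. The held-out set is only used for reporting, not for choosing anything.

```python
import random
import statistics

def make_data(n, rng):
    """Toy 1-D data (assumed): label is 1 when x > 0.5, with 15% label noise."""
    data = []
    for _ in range(n):
        x = rng.random()
        y = int(x > 0.5)
        if rng.random() < 0.15:
            y = 1 - y
        data.append((x, y))
    return data

def fit_stump(sample):
    """Return the threshold (taken from the sample) with the fewest errors."""
    best_t, best_err = 0.5, len(sample) + 1
    for t, _ in sample:
        err = sum(int(x > t) != y for x, y in sample)
        if err < best_err:
            best_t, best_err = t, err
    return best_t

def accuracy(threshold, data):
    return sum(int(x > threshold) == y for x, y in data) / len(data)

def train_and_score(seed, train, heldout):
    """The only randomness is the bootstrap sample, controlled by `seed`."""
    rng = random.Random(seed)
    boot = [rng.choice(train) for _ in train]
    return accuracy(fit_stump(boot), heldout)

# The data is generated once with its own fixed seed.
data_rng = random.Random(0)
train = make_data(200, data_rng)
heldout = make_data(200, data_rng)

# Several fixed seeds: the spread is visible and the numbers are reproducible.
SEEDS = [11, 22, 33, 44, 55]
scores = [train_and_score(s, train, heldout) for s in SEEDS]
summary = {
    "mean": statistics.mean(scores),
    "std": statistics.stdev(scores),
    "min": min(scores),
}
print(summary)
```

Because the seeds are written down, anyone rerunning the script gets the same list of scores, which is what makes the reported mean, standard deviation and minimum checkable.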

For the second case, you can always tell, using only the training data, which run is best (although it might not be the one that gives the best test accuracy). So if you have the time, it is good to do, say, 10 runs and then evaluate the one with the best training error (or validation error; just never use the test set for this decision). You can go a level up, repeat that whole multi-run procedure several times, and get a standard deviation too. However, if that deviation turns out to be significant, it probably means you weren't trying enough initialisations or you are not using the right model for your data.
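A small sketch of the "10 runs, keep the best by training error" procedure follows. To stay self-contained it swaps the neural network for 1-D k-means, which has the same property: a deterministic objective with randomness only in the initialisation. The data, the helper names `kmeans_1d` and `objective`, and the choice of seeds are all assumptions for illustration.

```python
import random

def kmeans_1d(points, k, seed, iters=20):
    """Lloyd's algorithm in 1-D; only the initial centres depend on the seed."""
    rng = random.Random(seed)
    centres = rng.sample(points, k)
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda j: (p - centres[j]) ** 2)
            clusters[nearest].append(p)
        # Keep the old centre if a cluster ends up empty.
        centres = [sum(c) / len(c) if c else centres[j]
                   for j, c in enumerate(clusters)]
    return centres

def objective(points, centres):
    """Deterministic objective: mean squared distance to the nearest centre."""
    return sum(min((p - c) ** 2 for c in centres) for p in points) / len(points)

# Assumed toy data: three Gaussian clusters, split into training and validation.
data_rng = random.Random(0)
points = [data_rng.gauss(mu, 0.3) for mu in (0.0, 3.0, 6.0) for _ in range(60)]
data_rng.shuffle(points)
train, valid = points[:120], points[120:]

# Ten runs that differ only in their initialisation seed.
SEEDS = range(10)
runs = [(objective(train, kmeans_1d(train, 3, s)), s) for s in SEEDS]

# Select on the training objective; no test set is involved in this decision.
best_loss, best_seed = min(runs)
best_centres = kmeans_1d(train, 3, best_seed)
valid_loss = objective(valid, best_centres)
print(best_seed, best_loss, valid_loss)
```

Wrapping the last block in an outer loop over different seed sets would give the "level up" standard deviation mentioned above; if that spread is large, ten initialisations were probably not enough.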
