Correctness of logistic regression in Vowpal Wabbit?


Question

I have started using Vowpal Wabbit for logistic regression; however, I am unable to reproduce the results it gives. Perhaps there is some undocumented "magic" it does, but has anyone been able to replicate / verify / check the calculations for logistic regression?

For example, with the simple data below, we aim to model the way age predicts label. It is obvious there is a strong relationship: as age increases, the probability of observing 1 increases.

As a simple unit test, I used the 12 rows of data below:

age label
20  0
25  0
30  0
35  0
40  0
50  0
60  1
65  0
70  1
75  1
77  1
80  1

Now, performing a logistic regression on this dataset, using R, SPSS, or even by hand, produces a model which looks like L = 0.2294*age - 14.08. So if I substitute the age and apply the logit transform prob = 1/(1+EXP(-L)), I obtain predicted probabilities ranging from 0.0001 for the first row to 0.9864 for the last row, as reasonably expected.
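As a quick sanity check on the quoted batch fit, the logit transform can be evaluated directly. A minimal sketch, using only the coefficients 0.2294 and -14.08 quoted above:

```python
import math

def predict(age, w=0.2294, b=-14.08):
    """Logit transform: prob = 1 / (1 + exp(-L)) with L = w*age + b."""
    return 1.0 / (1.0 + math.exp(-(w * age + b)))

print(round(predict(20), 4))  # first row: probability near 0.0001
print(round(predict(80), 4))  # last row: probability near 0.986
```

The small discrepancy in the last digit comes from the coefficients themselves being rounded.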

If I plug the same data into Vowpal Wabbit,

-1 'P1 |f age:20
-1 'P2 |f age:25
-1 'P3 |f age:30
-1 'P4 |f age:35
-1 'P5 |f age:40
-1 'P6 |f age:50
1 'P7 |f age:60
-1 'P8 |f age:65
1 'P9 |f age:70
1 'P10 |f age:75
1 'P11 |f age:77
1 'P12 |f age:80

and then run

vw -d data.txt -f demo_model.vw --loss_function logistic --invert_hash aaa

(a command line consistent with How to perform logistic regression using vowpal wabbit on very imbalanced dataset), I obtain a model L = -0.00094*age - 0.03857, which is very different.

The predicted values obtained using -r or -p further confirm this. The resulting probabilities end up nearly all the same, for example 0.4857 for age=20, and 0.4716 for age=80, which is extremely off.

I have noticed this inconsistency with larger datasets too. In what sense is Vowpal Wabbit carrying out the logistic regression differently, and how are the results to be interpreted?

Answer

This is a common misunderstanding of vowpal wabbit.

One cannot compare batch learning with online learning.

vowpal wabbit is not a batch learner. It is an online learner. Online learners learn by looking at examples one at a time and slightly adjusting the weights of the model as they go.

There are advantages and disadvantages to online learning. The downside is that convergence to the final model is slow/gradual. The learner doesn't do a "perfect" job at extracting information from each example, because the process is iterative. Convergence on a final result is deliberately restrained/slow. This can make online learners appear weak on tiny data-sets like the above.
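To make the one-example-at-a-time idea concrete, here is a minimal sketch of a plain SGD update for logistic loss. This illustrates online updating in general; it is not vw's actual rule (vw's updates are adaptive, normalized and importance-aware):

```python
import math

def sgd_logistic_update(w, b, x, y, lr=0.01):
    """One online step for a single example (y in {0, 1}): predict with the
    current weights, then nudge them by the logistic-loss gradient."""
    p = 1.0 / (1.0 + math.exp(-(w * x + b)))  # current predicted probability
    err = y - p                               # signed residual
    return w + lr * err * x, b + lr * err     # small adjustment, then move on

# One positive example (age 60, label 1) nudges the age weight slightly upward:
w, b = sgd_logistic_update(0.0, 0.0, x=60, y=1)
print(w, b)  # 0.3 0.005
```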

There are several advantages, though:

  • Online learners don't need to load the full data into memory (they work by examining one example at a time and adjusting the model based on the real-time observed per-example loss), so they can scale easily to billions of examples. A 2011 paper by 4 Yahoo! researchers describes how vowpal wabbit was used to learn from a tera (10^12) feature data-set in 1 hour on 1k nodes. Users regularly use vw to learn from billion-example data-sets on their desktops and laptops.
  • Online learning is adaptive and can track changes in conditions over time, so it can learn from non-stationary data, e.g. learning against an adaptive adversary.
  • Learning introspection: one can observe loss convergence rates while training and identify specific issues, and even gain significant insights from specific data-set examples or features.
  • Online learners can learn in an incremental fashion, so users can intermix labeled and unlabeled examples to keep learning while predicting at the same time.
  • The estimated error, even during training, is always "out-of-sample", which is a good estimate of the test error. There's no need to split the data into train and test subsets or perform N-way cross-validation. The next (yet unseen) example is always used as a hold-out. This is a tremendous advantage over batch methods from the operational aspect. It greatly simplifies the typical machine-learning process. In addition, as long as you don't run multiple passes over the data, it serves as a great over-fitting avoidance mechanism.

Online learners are very sensitive to example order. The worst possible order for an online learner is when classes are clustered together (all, or almost all, -1s appear first, followed by all 1s) like the example above does. So the first thing to do to get better results from an online learner like vowpal wabbit, is to uniformly shuffle the 1s and -1s (or simply order by time, as the examples typically appear in real-life).
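For the 12 examples above, a uniform shuffle takes only a couple of lines (a sketch; on a Unix shell, `shuf data.txt > shuffled.txt` achieves the same):

```python
import random

# The 12 examples from the question, in vw format (labels clustered: bad order)
lines = [
    "-1 'P1 |f age:20", "-1 'P2 |f age:25", "-1 'P3 |f age:30",
    "-1 'P4 |f age:35", "-1 'P5 |f age:40", "-1 'P6 |f age:50",
    "1 'P7 |f age:60",  "-1 'P8 |f age:65", "1 'P9 |f age:70",
    "1 'P10 |f age:75", "1 'P11 |f age:77", "1 'P12 |f age:80",
]

random.seed(1)         # fixed seed so the shuffle is reproducible
random.shuffle(lines)  # uniform in-place shuffle, breaking up the label clusters
print("\n".join(lines))
```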


Q: Is there any way to produce a reasonable model in the sense that it gives reasonable predictions on small data when using an online learner?

A: Yes, there is!

You can emulate what a batch learner does more closely, by taking two simple steps:

  • Uniformly shuffle the 1 and -1 examples.
  • Run multiple passes over the data to give the learner a chance to converge.
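The two steps can be emulated with plain SGD on the question's data. This is a sketch under assumptions (ages scaled by 1/100 to keep plain SGD numerically tame, a constant learning rate), not vw's implementation, but it shows shuffle plus many passes driving the predictions apart:

```python
import math
import random

# The question's 12 (age, label) pairs
data = [(20, 0), (25, 0), (30, 0), (35, 0), (40, 0), (50, 0),
        (60, 1), (65, 0), (70, 1), (75, 1), (77, 1), (80, 1)]

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

random.seed(0)
w, b, lr = 0.0, 0.0, 0.5
for _ in range(5000):        # step 2: many passes, so the learner can converge
    random.shuffle(data)     # step 1: uniform shuffle, every pass
    for age, y in data:
        p = sigmoid(w * age / 100.0 + b)   # predict, then adjust slightly
        w += lr * (y - p) * age / 100.0
        b += lr * (y - p)

p20 = sigmoid(w * 0.20 + b)  # youngest: probability should end up near 0
p80 = sigmoid(w * 0.80 + b)  # oldest: probability should end up near 1
print(p20, p80)
```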

Caveat: if you run multiple passes until error goes to 0, there's a danger of over-fitting. The online learner has perfectly learned your examples, but it may not generalize well to unseen data.

The second issue here is that the predictions vw gives are not logistic-function transformed (this is unfortunate). They are akin to standard deviations from the middle point (truncated at [-50, 50]). You need to pipe the predictions via utl/logistic (in the source tree) to get signed probabilities. Note that these signed probabilities are in the range [-1, +1] rather than [0, 1]. You may use logistic -0 instead of logistic to map them to a [0, 1] range.
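Assuming the formulas are the standard sigmoid and its rescaling to a signed range (as the description above implies), the two mappings look like:

```python
import math

def logistic_signed(x):
    """utl/logistic default: raw vw score -> signed probability in [-1, +1]."""
    return 2.0 / (1.0 + math.exp(-x)) - 1.0

def logistic_01(x):
    """`logistic -0`: the plain sigmoid, mapping raw scores to [0, 1]."""
    return 1.0 / (1.0 + math.exp(-x))

print(logistic_signed(0.0))  # 0.0 -- a raw score of 0 means maximal uncertainty
print(logistic_01(0.0))      # 0.5
```

The signed variant is just the [0, 1] sigmoid stretched and shifted: 2*p - 1.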

So given the above, here's a recipe that should give you more expected results:

# Train:
vw train.vw -c --passes 1000 -f model.vw --loss_function logistic --holdout_off

# Predict on train set (just as a sanity check) using the just generated model:
vw -t -i model.vw train.vw -p /dev/stdout | logistic | sort -tP -n -k 2

Giving this more expected result on your data-set:

-0.95674145247658 P1
-0.930208359811439 P2
-0.888329575506748 P3
-0.823617739247262 P4
-0.726830630992614 P5
-0.405323815830325 P6
0.0618902961794472 P7
0.298575998150221 P8
0.503468453150847 P9
0.663996516371277 P10
0.715480084449868 P11
0.780212725426778 P12

You could make the results more/less polarized (closer to 1 on the older ages and closer to -1 on the younger) by increasing/decreasing the number of passes. You may also be interested in the following options for training:

--max_prediction <arg>     sets the max prediction to <arg>
--min_prediction <arg>     sets the min prediction to <arg>
-l <arg>                   set learning rate to <arg>

For example, by increasing the learning rate from the default 0.5 to a large number (e.g. 10), you can force vw to converge much faster when training on small data-sets, thus requiring fewer passes to get there.

Update

As of mid 2014, vw no longer requires the external logistic utility to map predictions back to the [0, 1] range. A new --link logistic option maps predictions to the logistic function [0, 1] range. Similarly, --link glf1 maps predictions to a generalized logistic function [-1, 1] range.

