What would the output of the skip-gram model look like?


Problem Description

To my understanding, the output of the skip-gram model must be compared with many training labels (depending on the window size).

My question is: does the final output of the skip-gram model look like the description in this picture?

P.S. The most similar question I could find: [1] What does the multiple outputs in skip-gram mean?
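For reference, here is a minimal NumPy sketch of what a plain softmax skip-gram step produces; the vocabulary size, dimensions, and word indices are made-up toy values, not taken from any particular implementation. It illustrates that the model emits a single probability distribution over the vocabulary for the center word, and that this one distribution is then scored against each context word in the window.

```python
import numpy as np

vocab_size, embed_dim = 10, 4
rng = np.random.default_rng(0)

W_in = rng.normal(scale=0.1, size=(vocab_size, embed_dim))   # projection layer / "input vectors"
W_out = rng.normal(scale=0.1, size=(embed_dim, vocab_size))  # output weights

center = 3                    # index of the center word
context = [1, 2, 4, 5]        # indices of the context words (window of 2 on each side)

h = W_in[center]              # projection: simply the center word's vector
scores = h @ W_out            # one raw score per vocabulary word
probs = np.exp(scores - scores.max())
probs /= probs.sum()          # softmax: the model's single output distribution

# The "multiple outputs" are this same distribution compared against each context label:
loss = -sum(np.log(probs[c]) for c in context)
print(probs.shape, loss)      # (10,) and a summed cross-entropy value
```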

Solution

It's hard to say what "should" happen in degenerate/toy/artificial cases, especially given how much randomness is used in the actual initialization and training.

Both the model's internal weights and the 'projection layer' (aka 'input vectors' or just 'word vectors') are changed by backpropagation. So you can't say what the internal weights should be without also considering the initialization of, and updates to, the projection weights. And nothing is meaningful with only two training examples, as opposed to "many, many more examples than could be approximated by the model's state".

If you think you've constructed a tiny case that's informative when run, I'd suggest trying it against actual implementations to see what happens.
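As one way to do that, here is a small sketch using Gensim's Word2Vec in skip-gram mode; the tiny corpus and the parameter values are assumptions for illustration (and the names assume Gensim 4.x, where the parameter is vector_size rather than the older size), not part of the original answer.

```python
from gensim.models import Word2Vec

# A hypothetical tiny corpus; any tokenized sentences would do.
sentences = [["the", "quick", "brown", "fox"],
             ["the", "lazy", "dog", "sleeps"]]

# sg=1 selects the skip-gram architecture (sg=0 would be CBOW).
model = Word2Vec(sentences, sg=1, vector_size=8, window=2,
                 min_count=1, epochs=50, seed=1)

print(model.wv["fox"])               # learned input/projection vector for "fox"
print(model.wv.most_similar("fox"))  # nearest words by cosine similarity
```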

But beware: tiny models and training sets are likely to be weird, or to allow for multiple/overfit/idiosyncratic end-states, in ways that don't reveal much about how the algorithm behaves when used in its intended fashion – on large, varied amounts of training data.

