What would the output of the skip-gram model look like?
Question
To my understanding, the output of the skip-gram model must be compared with many training labels (depending on the window size).
My question is: Does the final output of the skip-gram model look like the description in this picture?
P.S. The most similar question I could find: [1] What do the multiple outputs in skip-gram mean?
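To make the question concrete, here is a minimal numpy sketch of the skip-gram forward pass (the toy vocabulary, dimensions, and names like `W_in`/`W_out` are my own illustration, not taken from the original diagram). The key point: the network produces one softmax distribution over the whole vocabulary, and that same distribution is compared against every context label inside the window.

```python
import numpy as np

rng = np.random.default_rng(0)
vocab = ["the", "quick", "brown", "fox", "jumps"]
V, D = len(vocab), 8                         # vocabulary size, embedding dim

W_in = rng.normal(scale=0.1, size=(V, D))    # projection layer ("input vectors")
W_out = rng.normal(scale=0.1, size=(D, V))   # internal weights ("output vectors")

def skipgram_output(center_idx):
    """Forward pass: a single softmax distribution over the vocabulary."""
    h = W_in[center_idx]                     # look up the center word's vector
    scores = h @ W_out                       # one score per vocabulary word
    e = np.exp(scores - scores.max())
    return e / e.sum()                       # shape (V,), sums to 1

# Center word "brown" with window size 2: the SAME output distribution
# is compared against each of the four context labels.
probs = skipgram_output(vocab.index("brown"))
context = [vocab.index(w) for w in ("the", "quick", "fox", "jumps")]
loss = -sum(np.log(probs[c]) for c in context)   # summed cross-entropy
```

So the "multiple outputs" in the usual diagrams are not separate vectors; training simply evaluates the one distribution at several context positions.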
Answer

It's hard to say what "should" happen in degenerate/toy/artificial cases, especially given how much randomness is used in the actual initialization and training.
Both the model's internal weights and the 'projection layer' (aka 'input vectors' or just 'word vectors') are changed by backpropagation. So it can't be answered what the internal weights should be without also considering the initialization and the updates to the projection weights. And nothing is meaningful with only two training examples, as opposed to "many, many more examples than could be approximated by the model's state".
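That point can be shown with one hedged gradient step (toy sizes, full softmax, illustrative names): after a single update for a (center, context) pair, both the internal weight matrix and the center word's projection row have moved, and the pair's probability rises.

```python
import numpy as np

rng = np.random.default_rng(1)
V, D, lr = 5, 8, 0.1
W_in = rng.normal(scale=0.1, size=(V, D))    # projection layer / word vectors
W_out = rng.normal(scale=0.1, size=(D, V))   # internal weights

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

center, context = 2, 3                       # one training pair
p_before = softmax(W_in[center] @ W_out)[context]

# Full-softmax cross-entropy gradient: dL/dscores = p - one_hot(context)
p = softmax(W_in[center] @ W_out)
d_scores = p.copy()
d_scores[context] -= 1.0
grad_h = W_out @ d_scores                    # flows back into the projection row
grad_W_out = np.outer(W_in[center], d_scores)

W_out -= lr * grad_W_out                     # the internal weights change...
W_in[center] -= lr * grad_h                  # ...and so does this word's vector

p_after = softmax(W_in[center] @ W_out)[context]
```

Because both matrices move on every step, the end state depends jointly on the random initialization of each and on the order of updates.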
If you think you've constructed a tiny case that's informative when run, I'd suggest trying it against actual implementations to see what happens.
But beware: tiny models & training sets are likely to be weird, or allow for multiple/overfit/idiosyncratic end-states, in ways that don't reveal much about how the algorithm behaves when used in its intended fashion – on large varied amounts of training data.
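That warning is easy to reproduce with a throwaway loop (corpus and hyperparameters invented here): trained on just two disjoint pairs, the model drives both pair probabilities toward 1. It memorizes the pairs, which says little about behavior on large, varied data.

```python
import numpy as np

rng = np.random.default_rng(2)
V, D, lr = 4, 4, 0.5
W_in = rng.normal(scale=0.1, size=(V, D))
W_out = rng.normal(scale=0.1, size=(D, V))

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

pairs = [(0, 1), (2, 3)]                 # the entire "training set"
for _ in range(2000):
    for c, ctx in pairs:
        p = softmax(W_in[c] @ W_out)
        d = p.copy()
        d[ctx] -= 1.0
        grad_h = W_out @ d               # compute before touching W_out
        W_out -= lr * np.outer(W_in[c], d)
        W_in[c] -= lr * grad_h

final = [softmax(W_in[c] @ W_out)[ctx] for c, ctx in pairs]
```

The two probabilities climb toward 1: a "perfectly trained" model whose end state is an overfit artifact of the toy setup, not a meaningful word geometry.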