My Doc2Vec code, after many loops of training, isn't giving good results. What might be wrong?
Problem description
I'm training a Doc2Vec model using the below code, where tagged_data is a list of TaggedDocument instances I set up before:
max_epochs = 40

model = Doc2Vec(alpha=0.025,
                min_alpha=0.001)

model.build_vocab(tagged_data)

for epoch in range(max_epochs):
    print('iteration {0}'.format(epoch))
    model.train(tagged_data,
                total_examples=model.corpus_count,
                epochs=model.iter)
    # decrease the learning rate
    model.alpha -= 0.001
    # fix the learning rate, no decay
    model.min_alpha = model.alpha

model.save("d2v.model")
print("Model Saved")
When I later check the model results, they're not good. What might have gone wrong?
Recommended answer
Do not call .train() multiple times in your own loop that tries to do alpha arithmetic.
It's unnecessary, and it's error-prone.
Specifically, in the above code, decrementing the original 0.025 alpha by 0.001 forty times results in a final alpha of 0.025 - 40*0.001 = -0.015, which would also have been negative for many of the training epochs. But a negative alpha learning-rate is nonsensical: it essentially asks the model to nudge its predictions a little bit in the wrong direction, rather than a little bit in the right direction, on every bulk training update. (Further, since model.iter is by default 5, the above code actually performs 40 * 5 training passes – 200 – which probably isn't the conscious intent. But that will just confuse readers of the code & slow training, not totally sabotage results like the alpha mishandling does.)
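The arithmetic above can be checked without gensim at all; this tiny sketch just replays the loop's alpha bookkeeping:

```python
# Replay the question's alpha bookkeeping (no gensim needed): 40 decrements
# of 0.001 drive the starting 0.025 rate well below zero.
alpha = 0.025
for epoch in range(40):
    alpha -= 0.001
print(round(alpha, 3))   # -0.015

# And with model.iter defaulting to 5, each .train() call in that loop
# runs 5 passes, so the loop performs 40 * 5 = 200 passes in total.
total_passes = 40 * 5
print(total_passes)      # 200
```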
There are other variants of error that are common here, as well. If the alpha were instead decremented by 0.0001, the 40 decrements would only reduce the final alpha to 0.021 – whereas the proper practice for this style of SGD (Stochastic Gradient Descent) with linear learning-rate decay is for the value to end very close to 0.000. If users start tinkering with max_epochs – it is, after all, a parameter pulled out on top! – but don't also adjust the decrement every time, they are likely to far-undershoot or far-overshoot 0.000.
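Both failure modes are plain arithmetic, so they are easy to verify directly:

```python
# Undershoot: a 0.0001 decrement over 40 steps barely dents the starting
# rate, leaving it far from the ~0.000 a linear-decay schedule should reach.
final_alpha = 0.025 - 40 * 0.0001
print(round(final_alpha, 4))          # 0.021

# Overshoot: doubling max_epochs to 80 without touching the original 0.001
# decrement drives the rate deep into negative territory.
print(round(0.025 - 80 * 0.001, 3))   # -0.055
```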
So don't use this pattern.
Unfortunately, many bad online examples have copied this anti-pattern from each other, and make serious errors in their own epochs and alpha handling. Please don't copy their error, and please let their authors know they're misleading people wherever this problem appears.
The above code can be improved with the much-simpler replacement:
max_epochs = 40
model = Doc2Vec() # of course, if non-default parameters needed, use them here
# but most users won't need to change alpha/min_alpha at all
model.build_vocab(tagged_data)
model.train(tagged_data, total_examples=model.corpus_count, epochs=max_epochs)
model.save("d2v.model")
Here, the .train() method will do exactly the requested number of epochs, smoothly reducing the internal effective alpha from its default starting value to near-zero. (It's rare to need to change the starting alpha, but even if you wanted to, just setting a new non-default value at initial model-creation is enough.)
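To see what "smoothly reducing" means, here is a rough sketch of the per-epoch starting rates a single .train(..., epochs=40) call produces internally – this assumes gensim's default alpha=0.025 and min_alpha=0.0001 and approximates the schedule as a simple linear interpolation, which is close to (though not byte-for-byte identical with) gensim's per-example decay:

```python
# Approximate the linear alpha schedule one .train(..., epochs=40) call
# applies: each epoch starts a little lower, ending near min_alpha.
start_alpha, min_alpha, epochs = 0.025, 0.0001, 40

schedule = [start_alpha - (start_alpha - min_alpha) * e / epochs
            for e in range(epochs)]
print(round(schedule[0], 4))    # 0.025: first epoch uses the full rate
print(round(schedule[-1], 4))   # already close to min_alpha, never negative
```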