Merging pretrained models in Word2Vec?


Problem Description

I have downloaded the Google News pretrained vector file, which was trained on about 100 billion words. On top of that, I am also training on my own 3 GB of data, producing another pretrained vector file. Both have 300 feature dimensions and are more than 1 GB in size.

How do I merge these two huge pretrained vector sets? Or how do I train a new model and update the vectors on top of another one? I see that the C-based word2vec does not support batch training.

I am looking to compute word analogies from these two models. I believe that vectors learned from these two sources will produce pretty good results.
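For context, this is the kind of analogy query I have in mind; a minimal sketch using gensim's KeyedVectors, with the filename being the usual Google News distribution name:

```python
# Load one pretrained file and run a single analogy query with gensim.
from gensim.models import KeyedVectors

kv = KeyedVectors.load_word2vec_format('GoogleNews-vectors-negative300.bin', binary=True)

# king - man + woman ~ queen
print(kv.most_similar(positive=['king', 'woman'], negative=['man'], topn=5))
```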

Recommended Answer

There's no straightforward way to merge the end results of separate training sessions.

Even for the exact same data, slight randomization from initial seeding or thread-scheduling jitter will result in different end states, making vectors only fully comparable within the same session.

This is because every session finds a useful configuration of vectors... but there are many equally useful configurations, rather than a single best one.

For example, whatever final state you reach has many rotations/reflections that are exactly as good on the training prediction task, or perform exactly as well on some other task (like analogy solving). But most of these possible alternatives will not have coordinates that can be mixed and matched for useful comparisons against each other.
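A quick numpy sketch of this effect, with toy random vectors standing in for one training run's output: a random orthogonal rotation leaves every within-set cosine similarity untouched, yet the rotated vectors no longer line up coordinate-by-coordinate with the originals.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy stand-in: 1000 "word vectors" with 300 dimensions from one training run.
vecs = rng.normal(size=(1000, 300))

# A random orthogonal matrix, i.e. a rotation/reflection of the whole space.
q, _ = np.linalg.qr(rng.normal(size=(300, 300)))
rotated = vecs @ q

def cosine(a, b):
    return a @ b / (np.linalg.norm(a) * np.linalg.norm(b))

# Relative geometry inside each set is identical...
print(cosine(vecs[0], vecs[1]), cosine(rotated[0], rotated[1]))  # same value twice

# ...but the "same word" across the two sets no longer matches itself.
print(cosine(vecs[0], rotated[0]))  # essentially arbitrary
```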

Preloading your model with vectors from prior training runs might improve the results after more training with new data, but I'm not aware of any rigorous testing of this possibility. The effect likely depends on your specific goals, your parameter choices, and how similar the new and old data are, or how representative they are of the eventual text against which the vectors will be used.

For example, if the Google News corpus is unlike your own training data, or unlike the text you'll be using the word vectors to understand, using it as a starting point might just slow or bias your training. On the other hand, if you train on your new data long enough, eventually any influence of the preloaded values could be diluted to nothing. (If you really wanted a 'blended' result, you might have to simultaneously train on the new data with an interleaved goal of nudging the vectors back towards the prior dataset's values.)
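One way to sketch the "preload, then keep training" idea in gensim, assuming gensim 3.x where Word2Vec.intersect_word2vec_format() is available (newer releases may organize this differently); the filenames and the tiny corpus here are placeholders:

```python
from gensim.models import Word2Vec

# Toy stand-in for your own tokenized 3 GB corpus.
sentences = [["merge", "pretrained", "word", "vectors"],
             ["train", "word2vec", "on", "new", "data"]]

model = Word2Vec(size=300, min_count=1)   # 300 dims to match the pretrained file
model.build_vocab(sentences)              # vocabulary comes from the new data only

# Copy pretrained vectors for words the new vocabulary shares with Google News.
# lockf=1.0 lets the preloaded vectors keep updating during training;
# lockf=0.0 would freeze them at their Google News values.
model.intersect_word2vec_format('GoogleNews-vectors-negative300.bin',
                                binary=True, lockf=1.0)

model.train(sentences, total_examples=model.corpus_count, epochs=model.epochs)
```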

Ways to combine the results from independent sessions might make a good research project. Maybe the method used in the word2vec language-translation projects (learning a projection between vocabulary spaces) could also 'translate' between the different coordinates of different runs. Maybe locking some vectors in place, or training on the dual goals of 'predict the new text' and 'stay close to the old vectors', would give meaningfully improved combined results.
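A rough sketch of that projection idea, in the spirit of the translation-matrix approach: learn a least-squares linear map between the two coordinate systems from words both vocabularies share. Gensim 3.x attribute names, the filenames, and the choice of plain least squares are assumptions here.

```python
import numpy as np
from gensim.models import KeyedVectors

old = KeyedVectors.load_word2vec_format('GoogleNews-vectors-negative300.bin', binary=True)
new = KeyedVectors.load_word2vec_format('my_corpus_vectors.bin', binary=True)  # your own run

# Anchor words present in both vocabularies (gensim 3.x uses .index2word;
# gensim 4.x renamed it to .index_to_key).
shared = [w for w in new.index2word if w in old][:20000]

X = np.vstack([new[w] for w in shared])   # coordinates in your model's space
Y = np.vstack([old[w] for w in shared])   # coordinates in the Google News space

# Least-squares linear map W so that X @ W approximates Y.
W, *_ = np.linalg.lstsq(X, Y, rcond=None)

def to_old_space(word):
    """Project a word from the new model into the Google News coordinate system."""
    return new[word] @ W

# Example: find Google News neighbours of a word as your model understands it.
print(old.similar_by_vector(to_old_space(shared[0]), topn=5))
```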
