Can you train a BERT model from scratch with task specific architecture?

Question

BERT pre-training of the base model is done with a language-modeling approach, where we mask a certain percentage of the tokens in a sentence and make the model learn to predict those missing masked tokens. Then, I think, in order to do downstream tasks, we add a newly initialized layer on top and we fine-tune the model.

However, suppose we have a gigantic dataset for sentence classification. Theoretically, can we initialize the BERT base architecture from scratch, train both the additional downstream task-specific layer and the base model weights from scratch with this sentence classification dataset only, and still achieve a good result?

Thanks.

Answer

BERT can be viewed as a language encoder, which is trained on a humongous amount of data to learn the language well. As we know, the original BERT model was trained on the entire English Wikipedia and the BooksCorpus, which together sum to 3,300M words. BERT-base has 109M model parameters. So, if you think you have large enough data to train BERT, then the answer to your question is yes.
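For concreteness, here is a minimal sketch of what training from scratch with a task-specific head could look like, assuming the Hugging Face transformers library; the label count and hyperparameters are illustrative choices, not values from the question.

```python
from transformers import BertConfig, BertForSequenceClassification

# Randomly initialized BERT-base encoder plus a sentence-classification head.
# No pretrained weights are loaded, so every parameter is trained from scratch
# on the classification dataset alone. (num_labels=2 is an illustrative choice.)
config = BertConfig(
    vocab_size=30522,        # standard BERT-base WordPiece vocabulary size
    hidden_size=768,
    num_hidden_layers=12,
    num_attention_heads=12,
    intermediate_size=3072,
    num_labels=2,
)
model = BertForSequenceClassification(config)  # random init, roughly BERT-base size
```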

However, when you said "still achieve a good result", I assume you are comparing against the original BERT model. In that case, the answer lies in the size of the training data.

I am wondering why you prefer to train BERT from scratch instead of fine-tuning it. Is it because you are afraid of a domain adaptation issue? If not, pre-trained BERT is perhaps a better starting point.
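By contrast, the usual fine-tuning setup starts from a released checkpoint, so only the classification head is newly initialized. Again a sketch assuming Hugging Face transformers, with the checkpoint name and label count as illustrative assumptions:

```python
from transformers import BertForSequenceClassification

# Load pretrained encoder weights; only the classification head is randomly
# initialized, and the whole model is then fine-tuned on the downstream task.
model = BertForSequenceClassification.from_pretrained(
    "bert-base-uncased",
    num_labels=2,
)
```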

Please note, if you want to train BERT from scratch, you may consider a smaller architecture. You may find the following papers useful.
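As a rough illustration of what a smaller architecture could mean, one can shrink the configuration before training from scratch; the specific sizes below are arbitrary examples for the sketch, not recommendations from the answer.

```python
from transformers import BertConfig, BertForSequenceClassification

# A much smaller BERT-style encoder (fewer layers, narrower hidden size),
# which needs far less data and compute to train from scratch.
small_config = BertConfig(
    hidden_size=256,
    num_hidden_layers=4,
    num_attention_heads=4,   # hidden_size must be divisible by the head count
    intermediate_size=1024,
    num_labels=2,
)
small_model = BertForSequenceClassification(small_config)
```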
