How to predownload a transformers model
Question
I want to perform a text-generation task in a Flask app and host it on a web server. However, when the app downloads the GPT model, the Elastic Beanstalk-managed EC2 instance crashes because the download takes too much time and memory.
from transformers.tokenization_openai import OpenAIGPTTokenizer
from transformers.modeling_tf_openai import TFOpenAIGPTLMHeadModel
model = TFOpenAIGPTLMHeadModel.from_pretrained("openai-gpt")
tokenizer = OpenAIGPTTokenizer.from_pretrained("openai-gpt")
These are the lines causing the issue; the GPT model is approximately 445 MB, and I am using the transformers library. Instead of downloading the model at this line, I was wondering if I could pickle the model and then bundle it as part of the repository. Is that possible with this library? Otherwise, how can I preload the model to avoid the issues I am having?
Answer
Approach 1:
Download the model from these links:

pytorch-model: https://s3.amazonaws.com/models.huggingface.co/bert/openai-gpt-pytorch_model.bin

tensorflow-model: https://s3.amazonaws.com/models.huggingface.co/bert/openai-gpt-tf_model.h5

config-file: https://s3.amazonaws.com/models.huggingface.co/bert/openai-gpt-config.json
Source: https://huggingface.co/transformers/_modules/transformers/configuration_openai.html#OpenAIGPTConfig
You can manually download the model (in your case the TensorFlow model .h5 and the config.json file) and put it in a folder (say, model) in the repository. (If needed, you can compress the model, then decompress it once it is on the EC2 instance.)
Then, in your web server, you can load the model directly from that path instead of downloading it (the model folder containing the .h5 and config.json files):
model = TFOpenAIGPTLMHeadModel.from_pretrained("model")
# model folder contains .h5 and config.json
tokenizer = OpenAIGPTTokenizer.from_pretrained("openai-gpt")
# this is a light download
Approach 2:
Instead of downloading from the links above, you can download the model on your local machine the conventional way.
from transformers.tokenization_openai import OpenAIGPTTokenizer
from transformers.modeling_tf_openai import TFOpenAIGPTLMHeadModel
model = TFOpenAIGPTLMHeadModel.from_pretrained("openai-gpt")
tokenizer = OpenAIGPTTokenizer.from_pretrained("openai-gpt")
This downloads the model. You can now save the weights to a folder using the save_pretrained function.
model.save_pretrained('/content/')  # saves the model in the content folder
Now, the content folder should contain a .h5 file and a config.json file.
Just upload them to the repository and load the model from there.
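As a quick sanity check before calling from_pretrained on the bundled folder, you can verify that the expected files actually made it into the repository. This is a minimal sketch; the folder name model and the helper check_model_folder are assumptions for illustration, and the filenames match what save_pretrained writes for a TensorFlow model:

```python
import os

def check_model_folder(folder="model"):
    """Raise if the folder is missing the files a local TF load needs:
    the TensorFlow weights (tf_model.h5) and the config (config.json)."""
    required = ["tf_model.h5", "config.json"]
    missing = [name for name in required
               if not os.path.isfile(os.path.join(folder, name))]
    if missing:
        raise FileNotFoundError(f"{folder} is missing: {', '.join(missing)}")
    return True
```

Running this at app startup turns a confusing mid-request load failure into an immediate, readable error.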