How to load a layer from a checkpoint


Question

I have this config:

network = {"source_embed_raw": {"class": "linear", ...}}

I want to load the parameters for the layer source_embed_raw from an existing checkpoint. In that checkpoint, the parameter has a different name (output/rec/target_embed_raw/W).

I understand that I can load parameters with preload_from_files, but I am not sure how exactly to do that in my case: the layer names differ, so simply adding a prefix does not do the job.

Recommended answer

This is currently not possible with preload_from_files in this way. I currently see these options:

  1. We could extend the logic of preload_from_files (and CustomCheckpointLoader) to allow for something like that, e.g. some generic mapping of variable or layer names.

  2. Or you could rename your layer from source_embed_raw to e.g. old_model__target_embed_raw and then use preload_from_files with the prefix option. If you do not want to rename it, you could still add a layer like old_model__target_embed_raw and then use parameter sharing in source_embed_raw.

     If the parameter in the checkpoint is actually called something like output/rec/target_embed_raw/..., you could create a SubnetworkLayer named old_model__output, inside it another SubnetworkLayer named rec, and inside that a layer named target_embed_raw.

  3. You could write a script that simply loads the existing checkpoint and stores it as a new checkpoint with renamed variable names (this is also totally independent of RETURNN).

  4. LinearLayer (and most other layers) allows you to specify exactly how the parameters are initialized (forward_weights_init and bias_init). The parameter initialization is quite flexible. For example, there is something like load_txt_file_initializer which can be used. Currently there is no such function to load parameters directly from an existing checkpoint, but we could add that. Or you could simply implement the logic inside your config (it would only be something like 5 lines of code).

  5. Instead of using preload_from_files, you could also use SubnetworkLayer and the load_on_init option. The logic is then similar to option 2.
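Option 2 with the nested SubnetworkLayer structure could look roughly like the config fragment below. It is only a sketch under several assumptions: the preload_from_files keys (filename, prefix, init_for_train) and the layer options are written as I recall them from RETURNN and should be checked against the installed version; the checkpoint path, EMBED_DIM, the "data" inputs and the idea of keeping source_embed_raw as a plain copy layer are illustrative and not part of the original answer.

```python
# Sketch of a RETURNN config fragment for option 2 (illustrative, not verified
# against a specific RETURNN version).

# Hypothetical checkpoint path and embedding size; adapt both to the old model.
OLD_CHECKPOINT = "/path/to/old_model/network.100"
EMBED_DIM = 512

# Assumed behaviour: variables whose name starts with the prefix are looked up
# in the old checkpoint with the prefix stripped, e.g.
#   old_model__output/rec/target_embed_raw/W -> output/rec/target_embed_raw/W
preload_from_files = {
    "old_model": {
        "filename": OLD_CHECKPOINT,
        "prefix": "old_model__",
        "init_for_train": True,  # assumed option: use the file only to initialise training
    },
}

network = {
    # The nesting mirrors the old variable scope output/rec/target_embed_raw.
    "old_model__output": {
        "class": "subnetwork",
        "from": "data",
        "subnetwork": {
            "rec": {
                "class": "subnetwork",
                "from": "data",
                "subnetwork": {
                    "target_embed_raw": {
                        "class": "linear",
                        "activation": None,
                        "with_bias": False,  # assumed: the checkpoint only has W
                        "n_out": EMBED_DIM,
                        "from": "data",
                    },
                    "output": {"class": "copy", "from": "target_embed_raw"},
                },
            },
            "output": {"class": "copy", "from": "rec"},
        },
    },
    # Illustrative alias so that other layers can keep using the old layer name.
    "source_embed_raw": {"class": "copy", "from": "old_model__output"},
}
```

With this layout the embedding variable in the new network is named old_model__output/rec/target_embed_raw/W, so stripping the prefix yields exactly the name stored in the old checkpoint.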
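The renaming script from option 3 might be sketched as follows. This is not the author's script: the helper names rename_variable and rewrite_checkpoint are hypothetical, and the TensorFlow part assumes a TF 1-style checkpoint that is rewritten through the tf.compat.v1 API. The values are embedded in the graph as constants, so this approach is only suitable for models well below the 2 GB graph limit.

```python
def rename_variable(name, renames):
    """Return the new name for a variable.

    `renames` maps an old name-scope prefix to its replacement. Only whole
    path components are matched, so "a/b" does not match "a/bc/W".
    """
    for old, new in renames.items():
        if name == old or name.startswith(old + "/"):
            return new + name[len(old):]
    return name


def rewrite_checkpoint(src_path, dst_path, renames):
    """Copy every variable from src_path to dst_path under its renamed name."""
    # Imported lazily so that rename_variable can be used without TensorFlow.
    import tensorflow as tf

    reader = tf.train.load_checkpoint(src_path)
    with tf.Graph().as_default(), tf.compat.v1.Session() as session:
        new_vars = [
            tf.compat.v1.Variable(
                reader.get_tensor(name), name=rename_variable(name, renames)
            )
            for name in sorted(reader.get_variable_to_shape_map())
        ]
        session.run(tf.compat.v1.global_variables_initializer())
        tf.compat.v1.train.Saver(new_vars).save(session, dst_path)


if __name__ == "__main__":
    renames = {"output/rec/target_embed_raw": "source_embed_raw"}
    print(rename_variable("output/rec/target_embed_raw/W", renames))
    # To rewrite a real checkpoint (paths are placeholders):
    # rewrite_checkpoint("/path/to/old/network.100", "/path/to/new/network.100", renames)
```

The rewritten checkpoint then stores the embedding under source_embed_raw/W, which matches the layer name in the config from the question.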
