How to initialize the model with certain weights?


Question

I am using the "stateful_clients" example from the tensorflow-federated examples, and I want to initialize the model with my pretrained weights. I call model.load_weights(init_weight), but it does not seem to take effect: the validation accuracy in the first round is still low. How can I solve the problem?

def tff_model_fn():
    """Constructs a fully initialized model for use in federated averaging."""
    keras_model = get_five_layers_cnn([28, 28, 1])
    keras_model.load_weights(init_weight)
    loss = tf.keras.losses.SparseCategoricalCrossentropy()
    return stateful_fedavg_tf.KerasModelWrapper(keras_model,
                                                test_data.element_spec, loss)

Recommended answer

A quick primer on state and model weights in TFF

TFF takes a distinct perspective on state in machine learning, generally a consequence of its desire to be purely functional.

Usually in machine learning, a model is conceptually a function which takes data and produces a prediction. However, this notion is a little overloaded at times; does 'model' refer to a trained model (fitting the specification above), or an architecture which is parameterized by its parameters, and therefore needs to accept these parameters as an argument to be considered truly a 'function'? A conception somewhat in the middle is that of a 'stateful function', which I think tends to be what people intend to refer to when they use the term 'model'.

TFF standardizes on the latter understanding. For TFF, a 'model' is a function which accepts parameters along with data as an argument, producing a prediction. This is generally to avoid the notion of a stateful function, which is disallowed by a purely functional perspective (f(x) == f(x) should always be true, so f cannot have any state which affects its output).
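To make the distinction concrete, here is a minimal pure-Python sketch. It does not use TFF at all, and `StatefulModel` and `predict` are hypothetical names chosen for illustration. The stateful version hides its weights inside the object, so the same call can return different results over time; the functional version takes the weights as an explicit argument, which is the shape described above.

```python
from typing import List


class StatefulModel:
    """A 'stateful function': the weights live inside the object."""

    def __init__(self, weights: List[float]):
        self.weights = weights

    def __call__(self, x: List[float]) -> float:
        return sum(w * xi for w, xi in zip(self.weights, x))


def predict(weights: List[float], x: List[float]) -> float:
    """A pure function: the output depends only on the arguments."""
    return sum(w * xi for w, xi in zip(weights, x))


x = [1.0, 2.0]

model = StatefulModel([0.0, 0.0])
before = model(x)            # uses the initial hidden weights
model.weights = [3.0, 4.0]   # mutate the hidden state
after = model(x)             # same input, different output

pure_a = predict([3.0, 4.0], x)
pure_b = predict([3.0, 4.0], x)  # same arguments, same output every time
```

In the functional form, "loading pretrained weights" is no longer a side effect on the model; it simply means passing a different `weights` value into the computation.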

I'm not super familiar with this portion of the TFF codebase; in particular I'm a little surprised at the behavior of the keras model wrapper, as usually TFF wants to serialize all logic into TFF-defined data structures as soon as possible (at least, this is how I think about it). Glancing at the code, it looks to me like it could work--but there have been exciting interactions between TFF and Keras in the past.

Briefly, here is how this path should be working:

  1. The model function you define above is invoked while building the initialize computation, in a graph context; the logic to load weights (or assignment of the weights themselves, baked into the graph as a constant) would hopefully be serialized into the graph that TFF generates to represent initialize.
  2. Upon calling iterative_process.initialize, you would find your desired weights populated in the appropriate attributes of the returned data structure. This would serve as your initial starting point for your iterative process, and you would be off to the races.

What I am suspicious of in the above is step 1. TFF will silently invoke your model_fn in a TensorFlow graph context, resulting in non-program-order semantics; if there is no control dependency between the assignment and the return value of your function (there isn't one in the code above, and in fact it is not obvious how to force one), the assignment may be skipped at initialize time. The state returned from initialize would then not have your specified weights.
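The failure mode suspected here can be illustrated with a toy, pure-Python stand-in for deferred graph execution. This is not TensorFlow or TFF code: `ToyGraph` and both `model_fn_*` functions are hypothetical, and whether the real Keras wrapper behaves this way is exactly the unverified suspicion above. The sketch only shows why an assignment that nothing depends on can be silently dropped when only the returned value is later executed.

```python
class ToyGraph:
    """Toy dataflow graph: ops are recorded as closures, not run eagerly."""

    def __init__(self):
        self.variables = {}

    def variable(self, name, initial):
        self.variables[name] = initial
        return name

    def assign(self, name, value):
        # Returns a deferred op; the variable changes only if the op is run.
        def op():
            self.variables[name] = value
        return op

    def read(self, name, deps=()):
        # Returns a deferred read that first runs its explicit dependencies.
        def op():
            for dep in deps:
                dep()
            return self.variables[name]
        return op


def model_fn_without_dependency(graph):
    w = graph.variable("w", 0.0)
    graph.assign(w, 5.0)        # op is created, but nothing depends on it
    return graph.read(w)


def model_fn_with_dependency(graph):
    w = graph.variable("w", 0.0)
    load = graph.assign(w, 5.0)
    return graph.read(w, deps=[load])  # the read forces the assignment first


# Only the returned op is executed, as a graph runtime would do.
skipped = model_fn_without_dependency(ToyGraph())()
loaded = model_fn_with_dependency(ToyGraph())()
```

In the first function the "load" is written before the return in program order, yet it never runs, which matches the symptom of a first round that starts from untrained weights.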

If this suspicion is true, the appropriate solution is to run the weight-loading logic directly in Python. TFF provides some utilities to help with this kind of thing, like tff.learning.state_with_new_model_weights. This would be used like:

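state = iterative_process.initialize()
# Load the weights eagerly in Python, outside any TFF computation.
keras_model = get_five_layers_cnn([28, 28, 1])
keras_model.load_weights(init_weight)
# Keyword names below match older TFF releases; check your installed version.
state_with_loaded_weights = tff.learning.state_with_new_model_weights(
    state,
    trainable_weights=[v.numpy() for v in keras_model.trainable_weights],
    non_trainable_weights=[v.numpy() for v in keras_model.non_trainable_weights])
# continue on using state_with_loaded_weights in the iterative process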
