Create a custom federated data set in TensorFlow Federated

Question

I'd like to adapt the recurrent autoencoder from this blog post to work in a federated environment.

I've modified the model slightly to conform with the example shown in the TFF image classification tutorial.

import tensorflow as tf
import tensorflow_federated as tff

def create_compiled_keras_model():
  # Sequence autoencoder: encode 10 timesteps of 2 features, then decode them.
  model = tf.keras.models.Sequential([
      tf.keras.layers.LSTM(2, input_shape=(10, 2), name='Encoder'),
      tf.keras.layers.RepeatVector(10, name='Latent'),
      tf.keras.layers.LSTM(2, return_sequences=True, name='Decoder'),
  ])

  model.compile(loss='mse', optimizer='adam')
  return model

model = create_compiled_keras_model()

# gen is defined further down; it must be defined before this line runs.
sample_batch = gen(1)
timesteps, input_dim = 10, 2

def model_fn():
  keras_model = create_compiled_keras_model()
  return tff.learning.from_compiled_keras_model(keras_model, sample_batch)

The gen function is defined as follows:

import random

import numpy as np

def gen(batch_size):
    seq_length = 10

    batch_x = []
    batch_y = []

    for _ in range(batch_size):
        rand = random.random() * 2 * np.pi

        sig1 = np.sin(np.linspace(0.0 * np.pi + rand, 3.0 * np.pi + rand, seq_length * 2))
        sig2 = np.cos(np.linspace(0.0 * np.pi + rand, 3.0 * np.pi + rand, seq_length * 2))

        x1 = sig1[:seq_length]
        y1 = sig1[seq_length:]
        x2 = sig2[:seq_length]
        y2 = sig2[seq_length:]

        x_ = np.array([x1, x2])
        y_ = np.array([y1, y2])
        x_, y_ = x_.T, y_.T

        batch_x.append(x_)
        batch_y.append(y_)

    batch_x = np.array(batch_x)
    batch_y = np.array(batch_y)

    # Autoencoder: the input doubles as the target, so batch_y is unused.
    return batch_x, batch_x

So far I've been unable to find any documentation which does not use sample data from the TFF repository.

How can I modify this to create a federated data set and begin training?

Recommended answer

At a very high level, to use an arbitrary dataset with TFF, the following steps are needed:

  1. Partition the dataset into per client subsets (how to do so is a much larger question)
  2. Create a tf.data.Dataset per client subset
  3. Pass a list of all (or a subset of) the Dataset objects to the federated optimization.
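
Step 1 can be sketched with NumPy alone. The snippet below is a minimal illustration, assuming the examples live in a single array and that a uniform random split across clients is acceptable. The helper name partition_by_client and the client ID format are hypothetical and not part of TFF; real federated data is usually partitioned by a natural key such as user or device.

```python
import numpy as np


def partition_by_client(examples, num_clients, seed=0):
    """Randomly splits `examples` along axis 0 into `num_clients` subsets.

    Hypothetical helper for illustration; returns {client_id: ndarray}.
    """
    rng = np.random.default_rng(seed)
    shuffled = rng.permutation(len(examples))
    # array_split tolerates sizes that do not divide evenly.
    shards = np.array_split(shuffled, num_clients)
    return {f"client_{i}": examples[idx] for i, idx in enumerate(shards)}


# 100 sequences shaped like the question's data: (timesteps=10, features=2).
all_sequences = np.random.rand(100, 10, 2).astype(np.float32)
client_arrays = partition_by_client(all_sequences, num_clients=4)
```

Each value in client_arrays can then be wrapped with tf.data.Dataset.from_tensor_slices for step 2.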

What is happening in the tutorial

The Federated Learning for Image Classification tutorial uses tff.learning.build_federated_averaging_process to build up a federated optimization using the FedAvg algorithm.

In that notebook, the following code is executing one round of federated optimization, where the client datasets are passed to the process' .next method:

   state, metrics = iterative_process.next(state, federated_train_data)

Here federated_train_data is a Python list of tf.data.Dataset, one per client participating in the round.
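
For the question's autoencoder, such a list could be assembled roughly as follows. This is a sketch, not the tutorial's code: the per-client arrays are random placeholders standing in for real client data, and make_client_dataset, the client count, and the batch size are assumptions chosen for illustration.

```python
import numpy as np
import tensorflow as tf

NUM_CLIENTS = 3  # assumed number of simulated clients
BATCH_SIZE = 8   # assumed batch size


def make_client_dataset(sequences):
    """Wraps one client's sequences in a batched tf.data.Dataset.

    The autoencoder reconstructs its input, so features and labels are
    the same tensor.
    """
    ds = tf.data.Dataset.from_tensor_slices((sequences, sequences))
    return ds.shuffle(buffer_size=len(sequences)).batch(BATCH_SIZE)


# Placeholder data: 20 sequences of shape (10, 2) per client.
client_arrays = [
    np.random.rand(20, 10, 2).astype(np.float32) for _ in range(NUM_CLIENTS)
]

# One dataset per participating client, in a plain Python list.
federated_train_data = [make_client_dataset(a) for a in client_arrays]
```

The element structure and dtypes of these datasets need to match the sample batch used when the TFF model is built.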

The canned datasets provided by TFF (under tff.simulation.datasets) are implemented using the tff.simulation.ClientData interface, which manages the client → dataset mapping and tf.data.Dataset creation.

If you're planning to re-use a dataset, implementing it as a tff.simulation.ClientData may make future use easier.
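
To make the client → dataset mapping concrete, here is a small stand-in written without TFF. It is an assumption-laden sketch: InMemoryClientData is a hypothetical class that only mirrors two members of tff.simulation.ClientData (client_ids and create_tf_dataset_for_client). It does not subclass the real interface, which has further members that differ between TFF versions.

```python
import numpy as np
import tensorflow as tf


class InMemoryClientData:
    """Hypothetical stand-in that maps client IDs to tf.data.Datasets.

    Not a tff.simulation.ClientData subclass; it only borrows two names.
    """

    def __init__(self, arrays_by_client):
        self._arrays = dict(arrays_by_client)

    @property
    def client_ids(self):
        return sorted(self._arrays)

    def create_tf_dataset_for_client(self, client_id):
        sequences = self._arrays[client_id]
        # Autoencoder: input and target are the same sequences.
        return tf.data.Dataset.from_tensor_slices((sequences, sequences))


client_data = InMemoryClientData({
    "a": np.zeros((5, 10, 2), np.float32),
    "b": np.ones((7, 10, 2), np.float32),
})

# Choose which clients take part in a round, then build their datasets.
round_datasets = [
    client_data.create_tf_dataset_for_client(cid).batch(4)
    for cid in client_data.client_ids
]
```

Keeping the mapping behind one object means the per-round client selection only has to deal with client IDs, not with how each dataset is constructed.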
