How can I use TensorBoard with AWS SageMaker TensorFlow?

Published: 2021/11/27 10:54:41 | Tags: amazon-web-services tensorflow2.0 tensorboard amazon-sagemaker


Question

I have started a SageMaker training job:

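import sagemaker
from sagemaker.tensorflow import TensorFlow

# Assumes this runs on a SageMaker notebook instance with an execution role.
role = sagemaker.get_execution_role()

# train_instance_count / train_instance_type / distributions are SageMaker Python SDK v1 names
# (v2 renames them to instance_count / instance_type / distribution).
mytraining = TensorFlow(entry_point='model.py',
                        role=role,
                        train_instance_count=1,
                        train_instance_type='ml.p2.xlarge',
                        framework_version='2.0.0',
                        py_version='py3',
                        distributions={'parameter_server': {'enabled': False}})

training_data_uri = 's3://path/to/my/data'
mytraining.fit(training_data_uri, run_tensorboard_locally=True)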

Using run_tensorboard_locally=True gives me:

Tensorboard is not supported with script mode. You can run the following command: tensorboard --logdir None --host localhost --port 6006 This can be run from anywhere with access to the S3 URI used as the logdir.
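The suggested command contains `--logdir None`, which suggests that no log location was available to fill in when the message was printed, so running it as shown cannot work. A minimal sketch of a hypothetical helper, `tensorboard_command`, that builds the same command but refuses a missing log directory (the S3 URI is a placeholder):

```python
import shlex


def tensorboard_command(logdir, host="localhost", port=6006):
    """Build the TensorBoard command suggested by the SDK message.

    Hypothetical helper: raises instead of producing "--logdir None"
    when no log location is known.
    """
    if not logdir:
        raise ValueError("logdir is not set; configure an S3 location for the TensorBoard logs first")
    # shlex.quote leaves plain S3 URIs untouched and quotes anything shell-unsafe
    parts = ["tensorboard", "--logdir", shlex.quote(str(logdir)),
             "--host", shlex.quote(host), "--port", str(port)]
    return " ".join(parts)


print(tensorboard_command("s3://bucket-name/tensorboard_log_folder/"))
```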

It seems I can't use it in script mode, but can I access the TensorBoard logs in S3? And if so, where in S3 are the logs?

# model.py (training entry point)
import argparse
import json
import os


def _parse_args():
    parser = argparse.ArgumentParser()

    # Data, model, and output directories
    # model_dir is always passed in from SageMaker. By default this is an S3 path under the default bucket.
    parser.add_argument('--model_dir', type=str)
    parser.add_argument('--sm-model-dir', type=str, default=os.environ.get('SM_MODEL_DIR'))
    parser.add_argument('--train', type=str, default=os.environ.get('SM_CHANNEL_TRAINING'))
    parser.add_argument('--hosts', type=json.loads, default=json.loads(os.environ.get('SM_HOSTS', '[]')))
    parser.add_argument('--current-host', type=str, default=os.environ.get('SM_CURRENT_HOST'))

    return parser.parse_known_args()


if __name__ == "__main__":
    args, unknown = _parse_args()

    # load_training_data, load_testing_data and model are user-defined helpers (not shown in the question)
    train_data, train_labels = load_training_data(args.train)
    eval_data, eval_labels = load_testing_data(args.train)

    mymodel = model(train_data, train_labels, eval_data, eval_labels)

    # Only the first host saves the model
    if args.current_host == args.hosts[0]:
        mymodel.save(os.path.join(args.sm_model_dir, '000000002/model.h5'))
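SageMaker passes the job configuration to the script through `SM_*` environment variables, which `_parse_args` reads as defaults. The sketch below (hypothetical `parse_sagemaker_args`, with made-up environment values) takes the environment as a dictionary so the parsing and the first-host check can be tried outside a container. It parses `--hosts` with `json.loads`, because `type=list` would split a command-line string into single characters:

```python
import argparse
import json


def parse_sagemaker_args(argv, env):
    """Same arguments as _parse_args, but defaults come from the `env` mapping."""
    parser = argparse.ArgumentParser()
    parser.add_argument("--model_dir", type=str)
    parser.add_argument("--sm-model-dir", type=str, default=env.get("SM_MODEL_DIR"))
    parser.add_argument("--train", type=str, default=env.get("SM_CHANNEL_TRAINING"))
    # SM_HOSTS holds a JSON list such as '["algo-1"]'
    parser.add_argument("--hosts", type=json.loads, default=json.loads(env.get("SM_HOSTS", "[]")))
    parser.add_argument("--current-host", type=str, default=env.get("SM_CURRENT_HOST"))
    return parser.parse_known_args(argv)


# Example values only, standing in for what a container would provide
fake_env = {
    "SM_MODEL_DIR": "/opt/ml/model",
    "SM_CHANNEL_TRAINING": "/opt/ml/input/data/training",
    "SM_HOSTS": '["algo-1", "algo-2"]',
    "SM_CURRENT_HOST": "algo-1",
}
args, unknown = parse_sagemaker_args(["--model_dir", "s3://bucket/prefix/model"], fake_env)
is_leader = args.current_host == args.hosts[0]  # only this host should save the model
print(args.sm_model_dir, args.hosts, is_leader, unknown)
```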

A similar question is here: stack

EDIT: I tried this new configuration, but it doesn't work:

from sagemaker.debugger import TensorBoardOutputConfig

tensorboard_output_config = TensorBoardOutputConfig(s3_output_path='s3://PATH/to/my/bucket')

mytraining = TensorFlow(entry_point='model.py',
                        role=role,
                        train_instance_count=1,
                        train_instance_type='ml.p2.xlarge',
                        framework_version='2.0.0',
                        py_version='py3',
                        distributions={'parameter_server': {'enabled': False}},
                        tensorboard_output_config=tensorboard_output_config)

I added the following callback to my model.py script; it is the same callback I use without SageMaker. As log_dir I set the default directory where TensorBoardOutputConfig writes the data (docs), but it doesn't work. I also tried it without the callback.

import tensorflow as tf

# batch_size, write_grads, embeddings_layer_names and embeddings_data are TF 1.x
# arguments; TF 2.x ignores them with a warning, so they are left out here.
tensorboardCallback = tf.keras.callbacks.TensorBoard(
    log_dir='/opt/ml/output/tensorboard',
    histogram_freq=0,
    write_graph=True,
    write_images=False,
    embeddings_freq=0,
    embeddings_metadata=None,
    update_freq='batch')
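A Keras callback only produces event files if it is passed to `fit` and writes under the directory that actually gets uploaded. A minimal stdlib sketch, assuming `/opt/ml/output/tensorboard` (the default the question uses) is that directory; the helper name `tensorboard_log_dir` and the optional per-run subdirectory are hypothetical:

```python
import os
import tempfile

# Directory the question uses as log_dir; assumed to be the local path that
# SageMaker syncs to TensorBoardOutputConfig's s3_output_path.
DEFAULT_TB_DIR = "/opt/ml/output/tensorboard"


def tensorboard_log_dir(base=DEFAULT_TB_DIR, run_name=None):
    """Return (and create) the directory a TensorBoard callback should write to.

    An optional run_name adds a subdirectory, which TensorBoard shows as a separate run.
    """
    path = os.path.join(base, run_name) if run_name else base
    os.makedirs(path, exist_ok=True)
    return path


# Inside model.py this would be used roughly as (TensorFlow not imported here):
#   cb = tf.keras.callbacks.TensorBoard(log_dir=tensorboard_log_dir(run_name="run-1"))
#   keras_model.fit(x, y, callbacks=[cb])
# Outside a SageMaker container, try it against a temporary directory instead:
with tempfile.TemporaryDirectory() as tmp:
    print(tensorboard_log_dir(base=tmp, run_name="run-1"))
```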

Recommended Answer

It's difficult to pin down the exact root cause in your case, but the following steps worked for me. I started TensorBoard manually inside the notebook instance.

I followed the guide on SageMaker debugging to configure the S3 output path for the TensorBoard logs:

import sagemaker
from sagemaker.debugger import TensorBoardOutputConfig
from sagemaker.tensorflow import TensorFlow

tensorboard_output_config = TensorBoardOutputConfig(
    s3_output_path='s3://bucket-name/tensorboard_log_folder/'
)

# model_dir, output_dir, train_instance_type, hyperparameters and inputs are
# defined elsewhere in the notebook (not shown). Parameter names are SDK v1.
estimator = TensorFlow(entry_point='train.py',
                       source_dir='./',
                       model_dir=model_dir,
                       output_path=output_dir,
                       train_instance_type=train_instance_type,
                       train_instance_count=1,
                       hyperparameters=hyperparameters,
                       role=sagemaker.get_execution_role(),
                       base_job_name='Testing-TrainingJob',
                       framework_version='2.2',
                       py_version='py37',
                       script_mode=True,
                       tensorboard_output_config=tensorboard_output_config)

estimator.fit(inputs)
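TensorBoardOutputConfig takes the S3 location as a plain `s3://bucket/prefix` string, and the same bucket/prefix pair is what you would browse in the S3 console or pass to `tensorboard --logdir`. A small stdlib sketch (hypothetical helper `split_s3_uri`) that sanity-checks the URI before a billed training job is launched:

```python
from urllib.parse import urlparse


def split_s3_uri(uri):
    """Split an s3:// URI into (bucket, key prefix), rejecting anything else."""
    parsed = urlparse(uri)
    if parsed.scheme != "s3" or not parsed.netloc:
        raise ValueError(f"not an S3 URI: {uri!r}")
    # The path part starts with "/", which is not part of the S3 key prefix
    return parsed.netloc, parsed.path.lstrip("/")


bucket, prefix = split_s3_uri("s3://bucket-name/tensorboard_log_folder/")
print(bucket, prefix)
```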

Start TensorBoard from a terminal on the notebook instance, pointing it at the S3 location configured above:

$ tensorboard --logdir 's3://bucket-name/tensorboard_log_folder/'

Open the dashboard through the notebook instance URL with /proxy/6006/ appended. Replace the notebook instance details in the following URL with your own:

https://myinstance.notebook.us-east-1.sagemaker.aws/proxy/6006/
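The URL follows a fixed pattern: instance name, region, then `/proxy/<port>/`. A tiny sketch that rebuilds it from those parts, modeled on the example URL above; `notebook_proxy_url` is a hypothetical helper and `myinstance` / `us-east-1` are placeholders:

```python
def notebook_proxy_url(instance_name, region, port=6006):
    """Build the proxy URL for a port on a SageMaker notebook instance,
    following the pattern of the example URL in the answer."""
    return f"https://{instance_name}.notebook.{region}.sagemaker.aws/proxy/{port}/"


print(notebook_proxy_url("myinstance", "us-east-1"))
```

Pass a different `port` if TensorBoard was started with `--port`.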
