如何使用 Tensorboard 在同一图上绘制不同的汇总指标? [英] How to plot different summary metrics on the same plot with Tensorboard?
问题描述
我希望能够绘制每批次的训练损失和平均验证损失在同一图上的验证集张量板.当我的验证集太大而无法放入内存时,我遇到了这个问题,因此需要批处理和使用 tf.metrics
更新操作.
I would like to be able plot the training loss per batch and the average validation loss for the validation set on the same plot in Tensorboard. I ran into this issue when my validation set was too large to fit into memory so required batching and the use of tf.metrics
update ops.
此问题适用于您希望在 Tensorboard 中的同一图表上显示的任何 Tensorflow 指标.
This question could apply to any Tensorflow metrics you wanted to appear on the same graph in Tensorboard.
我可以
- 分别绘制这两个图(参见此处)
- 在与 training-loss-per-training-batch 相同的图表上绘制 validation-loss-per-validation-batchset 可以是单个批次,我可以重复使用下面的训练摘要操作
train_summ
)
- plot these two graphs separately (see here)
- plot the validation-loss-per-validation-batch on the same graph as the training-loss-per-training-batch (this was OK when the validation set could be a single batch and I could reuse the training summary op
train_summ
below)
在下面的示例代码中,我的问题源于这样一个事实,即我的验证摘要 tf.summary.scalar
与 name=loss
被重命名为 loss_1
并因此移动到 Tensorboard 中的单独图形.据我所知,Tensorboard 采用 同名" 并将它们绘制在同一个图形上,而不管它们在哪个文件夹中.这是令人沮丧的 train_summ
(名称=loss) 只写入 train
文件夹,valid_summ
(name=loss) 只写入 valid
文件夹 - 但仍然是重命名为 loss_1
.
In the example code below, my issue stems from the fact that my validation summary tf.summary.scalar
with name=loss
gets renamed to loss_1
and thus is moved to a separate graph in Tensorboard. From what I can work out Tensorboard takes "same name" and plots them on the same graph, regardless of what folder they are in. This is frustrating as train_summ
(name=loss) is only ever written to the train
folder and valid_summ
(name=loss) is only ever written to the valid
folder - but is still renamed to loss_1
.
示例代码:
# View graphs with (Linux): $ tensorboard --logdir=/tmp/my_tf_model
import tensorflow as tf
import numpy as np
import os
import tempfile
def train_data_gen():
yield np.random.normal(size=[3]), np.array([0.5, 0.5, 0.5])
def valid_data_gen():
yield np.random.normal(size=[3]), np.array([0.8, 0.8, 0.8])
batch_size = 25
n_training_batches = 4
n_valid_batches = 2
n_epochs = 5
summary_loc = os.path.join(tempfile.gettempdir(), 'my_tf_model')
print("Summaries written to" + summary_loc)
# Dummy data
train_data = tf.data.Dataset.from_generator(train_data_gen, (tf.float32, tf.float32)).repeat().batch(batch_size)
valid_data = tf.data.Dataset.from_generator(valid_data_gen, (tf.float32, tf.float32)).repeat().batch(batch_size)
handle = tf.placeholder(tf.string, shape=[])
iterator = tf.data.Iterator.from_string_handle(handle,
train_data.output_types, train_data.output_shapes)
batch_x, batch_y = iterator.get_next()
train_iter = train_data.make_initializable_iterator()
valid_iter = valid_data.make_initializable_iterator()
# Some ops on the data
loss = tf.losses.mean_squared_error(batch_x, batch_y)
valid_loss, valid_loss_update = tf.metrics.mean(loss)
# Write to summaries
train_summ = tf.summary.scalar('loss', loss)
valid_summ = tf.summary.scalar('loss', valid_loss) # <- will be renamed to "loss_1"
with tf.Session() as sess:
sess.run(tf.global_variables_initializer())
train_handle, valid_handle = sess.run([train_iter.string_handle(), valid_iter.string_handle()])
sess.run([train_iter.initializer, valid_iter.initializer])
# Summary writers
writer_train = tf.summary.FileWriter(os.path.join(summary_loc, 'train'), sess.graph)
writer_valid = tf.summary.FileWriter(os.path.join(summary_loc, 'valid'), sess.graph)
global_step = 0 # implicit as no actual training
for i in range(n_epochs):
# "Training"
for j in range(n_training_batches):
global_step += 1
summ = sess.run(train_summ, feed_dict={handle: train_handle})
writer_train.add_summary(summary=summ, global_step=global_step)
# "Validation"
sess.run(tf.local_variables_initializer())
for j in range(n_valid_batches):
_, batch_summ = sess.run([valid_loss_update, train_summ], feed_dict={handle: valid_handle})
# The following will plot the batch loss for the validation set on the loss plot with the training data:
# writer_valid.add_summary(summary=batch_summ, global_step=global_step + j + 1)
summ = sess.run(valid_summ)
writer_valid.add_summary(summary=summ, global_step=global_step) # <- I want this on the training loss graph
我尝试了什么
- 按照 这个问题 和 这个问题(想想我在那个问题的评论中提到了什么)
- 使用
tf.summary.merge
将我所有的训练和验证/测试指标合并到整体摘要操作中;做有用的簿记,但没有在同一张图上绘制我想要的 - 使用
tf.summary.scalar
family
属性(loss
仍重命名为loss_1
) - (完整的 hack 解决方案) 对 training 数据使用
valid_loss, valid_loss_update = tf.metrics.mean(loss)
然后运行 tf.local_variables_initializer()
每个训练批次.这确实为您提供了相同的摘要操作,从而将事情放在同一个图表上,但肯定不是您有意这样做的方式?它也不能推广到其他指标. - Separate
tf.summary.FileWriter
objects (one for training, one for validation), as recommended by this issue and this question (think what I'm after is alluded to in the comment of that question) - The use of
tf.summary.merge
to merge all my training and validation/test metrics into overall summary ops; does useful book-keeping but doesn't plot what I want on the same graph - Use of the
tf.summary.scalar
family
attribute (loss
still gets renamed toloss_1
) - (Complete hack solution) Use
valid_loss, valid_loss_update = tf.metrics.mean(loss)
on the training data and then runtf.local_variables_initializer()
every training batch. This does give you the same summary op and thus puts things on the same graph but is surely not how you're meant to do this? It also doesn't generalise to other metrics. - TensorFlow 1.9.0
- 张量板 1.9.0
- Python 3.5.2
What I have tried
推荐答案
Tensorboard custom_scalar
plugin 是解决这个问题的方法.
The Tensorboard custom_scalar
plugin is the way to solve this problem.
这里是同样的例子,使用 custom_scalar
在同一图上绘制两个损失(每个训练批次 + 所有验证批次的平均值):
Here's the same example again with a custom_scalar
to plot the two losses (per training batch + averaged over all validation batches) on the same plot:
# View graphs with (Linux): $ tensorboard --logdir=/tmp/my_tf_model
import os
import tempfile
import tensorflow as tf
import numpy as np
from tensorboard import summary as summary_lib
from tensorboard.plugins.custom_scalar import layout_pb2
def train_data_gen():
yield np.random.normal(size=[3]), np.array([0.5, 0.5, 0.5])
def valid_data_gen():
yield np.random.normal(size=[3]), np.array([0.8, 0.8, 0.8])
batch_size = 25
n_training_batches = 4
n_valid_batches = 2
n_epochs = 5
summary_loc = os.path.join(tempfile.gettempdir(), 'my_tf_model')
print("Summaries written to " + summary_loc)
# Dummy data
train_data = tf.data.Dataset.from_generator(
train_data_gen, (tf.float32, tf.float32)).repeat().batch(batch_size)
valid_data = tf.data.Dataset.from_generator(
valid_data_gen, (tf.float32, tf.float32)).repeat().batch(batch_size)
handle = tf.placeholder(tf.string, shape=[])
iterator = tf.data.Iterator.from_string_handle(handle, train_data.output_types,
train_data.output_shapes)
batch_x, batch_y = iterator.get_next()
train_iter = train_data.make_initializable_iterator()
valid_iter = valid_data.make_initializable_iterator()
# Some ops on the data
loss = tf.losses.mean_squared_error(batch_x, batch_y)
valid_loss, valid_loss_update = tf.metrics.mean(loss)
with tf.name_scope('loss'):
train_summ = summary_lib.scalar('training', loss)
valid_summ = summary_lib.scalar('valid', valid_loss)
with tf.Session() as sess:
sess.run(tf.global_variables_initializer())
train_handle, valid_handle = sess.run([train_iter.string_handle(), valid_iter.string_handle()])
sess.run([train_iter.initializer, valid_iter.initializer])
writer_train = tf.summary.FileWriter(os.path.join(summary_loc, 'train'), sess.graph)
writer_valid = tf.summary.FileWriter(os.path.join(summary_loc, 'valid'), sess.graph)
layout_summary = summary_lib.custom_scalar_pb(
layout_pb2.Layout(category=[
layout_pb2.Category(
title='losses',
chart=[
layout_pb2.Chart(
title='losses',
multiline=layout_pb2.MultilineChartContent(tag=[
'loss/training', 'loss/valid'
]))
])
]))
writer_train.add_summary(layout_summary)
global_step = 0
for i in range(n_epochs):
for j in range(n_training_batches): # "Training"
global_step += 1
summ = sess.run(train_summ, feed_dict={handle: train_handle})
writer_train.add_summary(summary=summ, global_step=global_step)
sess.run(tf.local_variables_initializer())
for j in range(n_valid_batches): # "Validation"
_, batch_summ = sess.run([valid_loss_update, train_summ], feed_dict={handle: valid_handle})
summ = sess.run(valid_summ)
writer_valid.add_summary(summary=summ, global_step=global_step)
这篇关于如何使用 Tensorboard 在同一图上绘制不同的汇总指标?的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!