How to use Keras TensorBoard callback for grid search
Question
I'm using the Keras TensorBoard callback. I would like to run a grid search and visualize the results of each individual model in TensorBoard. The problem is that the results of all the different runs are merged together, and the loss plot is a mess like this:
How can I rename each run to have something similar to this:
Here is the code of the grid search:
df = pd.read_csv('data/prepared_example.csv')
df = time_series.create_index(df, datetime_index='DATE', other_index_list=['ITEM', 'AREA'])
target = ['D']
attributes = ['S', 'C', 'D-10', 'D-9', 'D-8', 'D-7', 'D-6', 'D-5', 'D-4',
              'D-3', 'D-2', 'D-1']
input_dim = len(attributes)
output_dim = len(target)
x = df[attributes]
y = df[target]
param_grid = {'epochs': [10, 20, 50],
              'batch_size': [10],
              'neurons': [[10, 10, 10]],
              'dropout': [[0.0, 0.0], [0.2, 0.2]],
              'lr': [0.1]}
estimator = KerasRegressor(build_fn=create_3_layers_model,
                           input_dim=input_dim, output_dim=output_dim)
tbCallBack = TensorBoard(log_dir='./Graph', histogram_freq=0, write_graph=True, write_images=False)
grid = GridSearchCV(estimator=estimator, param_grid=param_grid, n_jobs=-1, scoring=bug_fix_score,
                    cv=3, verbose=0, fit_params={'callbacks': [tbCallBack]})
grid_result = grid.fit(x.as_matrix(), y.as_matrix())
Recommended answer
I don't think there is any way to pass a "per-run" parameter to GridSearchCV. Maybe the easiest approach would be to subclass KerasRegressor to do what you want.
import os

# Import paths as in Keras 2.x, matching the question's code
from keras.callbacks import TensorBoard
from keras.wrappers.scikit_learn import KerasRegressor


class KerasRegressorTB(KerasRegressor):

    def __init__(self, *args, **kwargs):
        super(KerasRegressorTB, self).__init__(*args, **kwargs)

    def fit(self, x, y, log_dir=None, **kwargs):
        cbs = None
        if log_dir is not None:
            # Build one log directory name per parameter configuration
            params = self.get_params()
            conf = ",".join("{}={}".format(k, params[k])
                            for k in sorted(params))
            conf_dir = os.path.join(log_dir, conf)
            cbs = [TensorBoard(log_dir=conf_dir, histogram_freq=0,
                               write_graph=True, write_images=False)]
        return super(KerasRegressorTB, self).fit(x, y, callbacks=cbs, **kwargs)
You would use it like:
# ...
estimator = KerasRegressorTB(build_fn=create_3_layers_model,
                             input_dim=input_dim, output_dim=output_dim)
# ...
# 'log_dir' is forwarded to KerasRegressorTB.fit on every fit
grid = GridSearchCV(estimator=estimator, param_grid=param_grid,
                    n_jobs=1, scoring=bug_fix_score,
                    cv=2, verbose=0, fit_params={'log_dir': './Graph'})
# Note: as_matrix() was removed in pandas 1.0; use x.values / y.values there
grid_result = grid.fit(x.as_matrix(), y.as_matrix())
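One practical detail of the naming scheme above: `get_params()` also returns `build_fn`, so the function object's repr ends up in the directory name. A minimal sketch of a helper that skips such entries follows; the name `run_name` and the choice to drop callables and strip spaces are my own assumptions, not part of the original answer.

```python
def run_name(params):
    """Build a TensorBoard run name from a parameter dict.

    Hypothetical helper: callables (e.g. build_fn) are skipped so that
    function reprs do not end up in the directory name, and spaces are
    stripped from values such as lists.
    """
    parts = []
    for key in sorted(params):
        value = params[key]
        if callable(value):
            continue
        parts.append("{}={}".format(key, str(value).replace(" ", "")))
    return ",".join(parts)


example = {"epochs": 10, "lr": 0.1, "dropout": [0.2, 0.2], "build_fn": print}
print(run_name(example))  # dropout=[0.2,0.2],epochs=10,lr=0.1
```

Inside `fit`, the `conf` string could then be computed as `run_name(self.get_params())`.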
Update:
Since GridSearchCV runs the same model (i.e. the same configuration of parameters) more than once due to cross-validation, the previous code will end up putting multiple traces in each run. Looking at the source (here and here), there doesn't seem to be a way to retrieve the "current split id". At the same time, you shouldn't just check for existing folders and add suffixes as needed, because the jobs may run in parallel (potentially at least, although I'm not sure if that's the case with Keras/TF). You can try something like this:
import itertools
import os

# Import paths as in Keras 2.x, matching the question's code
from keras.callbacks import TensorBoard
from keras.wrappers.scikit_learn import KerasRegressor


class KerasRegressorTB(KerasRegressor):

    def __init__(self, *args, **kwargs):
        super(KerasRegressorTB, self).__init__(*args, **kwargs)

    def fit(self, x, y, log_dir=None, **kwargs):
        cbs = None
        if log_dir is not None:
            # Make sure the base log directory exists
            try:
                os.makedirs(log_dir)
            except OSError:
                pass
            params = self.get_params()
            conf = ",".join("{}={}".format(k, params[k])
                            for k in sorted(params))
            conf_dir_base = os.path.join(log_dir, conf)
            # Find a new directory to place the logs: makedirs fails if the
            # directory already exists, so each fit claims its own index
            for i in itertools.count():
                try:
                    conf_dir = "{}_split-{}".format(conf_dir_base, i)
                    os.makedirs(conf_dir)
                    break
                except OSError:
                    pass
            cbs = [TensorBoard(log_dir=conf_dir, histogram_freq=0,
                               write_graph=True, write_images=False)]
        return super(KerasRegressorTB, self).fit(x, y, callbacks=cbs, **kwargs)
I'm using os calls for Python 2 compatibility, but if you are using Python 3 you may consider the nicer pathlib module for path and directory handling.
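For reference, here is a rough sketch of how the "find a new directory" step might look with pathlib on Python 3. The function name `claim_split_dir` is hypothetical; the idea is the same as above: `Path.mkdir()` raises `FileExistsError` when the directory already exists, so the first index that can be created belongs to the current fit.

```python
import itertools
from pathlib import Path


def claim_split_dir(log_dir, conf):
    """Create and return the first free '<conf>_split-<i>' directory.

    Hypothetical helper, not part of the original answer. It relies on
    mkdir() failing when the directory already exists, so two fits
    should not end up with the same directory.
    """
    base = Path(log_dir)
    # Make sure the base log directory exists
    base.mkdir(parents=True, exist_ok=True)
    for i in itertools.count():
        candidate = base / "{}_split-{}".format(conf, i)
        try:
            candidate.mkdir()
        except FileExistsError:
            continue  # index already taken, try the next one
        return candidate
```

The returned path could be passed to the callback as `TensorBoard(log_dir=str(conf_dir), ...)`.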
Note: I forgot to mention it earlier, but just in case, note that passing write_graph=True will log a graph per run, which, depending on your model, could take up a lot (relatively speaking) of disk space. The same would apply to write_images, although I don't know how much space that feature requires.