TensorFlow 2.x: Cannot save trained model in h5 format (OSError: Unable to create link (name already exists))

Problem description

My model uses pre-processed data to predict whether a customer is a private or non-private customer. The pre-processing uses steps such as feature_column.bucketized_column(…), feature_column.embedding_column(…) and so on. After training, I try to save the model but get the following error:

File "h5py\_objects.pyx", line 54, in h5py._objects.with_phil.wrapper
File "h5py\_objects.pyx", line 55, in h5py._objects.with_phil.wrapper
File "h5py\h5o.pyx", line 202, in h5py.h5o.link
OSError: Unable to create link (name already exists)

I have tried the following to solve my problem:

  • I tried excluding the optimizer when saving, as suggested here: https://github.com/tensorflow/tensorflow/issues/27688.
  • I tried different versions of TensorFlow, such as 2.2 and 2.3.
  • I tried reinstalling h5py, as suggested in the question "RuntimeError: Unable to create link (name already exists) when I append hdf5 file?".

None of this worked!

Here is the relevant code of the model:

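# getPreProcessedDatasets takes four arguments (see its definition below), so bucketSizeGEO is passed as well
(feature_columns, train_ds, val_ds, test_ds) = preprocessing.getPreProcessedDatasets(args.data, args.zip, args.batchSize, bucketSizeGEO)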

feature_layer = tf.keras.layers.DenseFeatures(feature_columns, trainable=False)

model = tf.keras.models.Sequential([
        feature_layer,
        tf.keras.layers.Dense(1, activation=tf.nn.sigmoid)
    ])

model.compile(optimizer='sgd',
        loss='binary_crossentropy',
        metrics=['accuracy'])

paramString = "Arg-e{}-b{}-z{}".format(args.epoch, args.batchSize, bucketSizeGEO)

...

model.fit(train_ds,
              validation_data=val_ds,
              epochs=args.epoch,
              callbacks=[tensorboard_callback])


model.summary()

loss, accuracy = model.evaluate(test_ds)
print("Accuracy", accuracy)

paramString = paramString + "-a{:.4f}".format(accuracy)

outputName = "logReg" + datetime.datetime.now().strftime("%Y%m%d-%H%M%S") + paramString

if args.saveModel:
       filepath = "./saved_models/" + outputName + ".h5"
       model.save(filepath, save_format='h5')

The function called in the preprocessing module:

def getPreProcessedDatasets(filepath, zippath, batch_size, bucketSizeGEO):
    print("start preprocessing...")

    path = filepath
    # np.floating is an abstract type; np.float64 is the concrete dtype it resolves to
    data = pd.read_csv(path, dtype={
        "NAME1": np.str_,
        "NAME2": np.str_,
        "EMAIL1": np.str_,
        "ZIP": np.str_,
        "STREET": np.str_,
        "LONGITUDE": np.float64,
        "LATITUDE": np.float64,
        "RECEIVERTYPE": np.int64})

    feature_columns = []

    data = data.fillna("NaN")

    data = __preProcessName(data)
    data = __preProcessStreet(data)
    
    train, test = train_test_split(data, test_size=0.2, random_state=0)
    train, val = train_test_split(train, test_size=0.2, random_state=0)

    train_ds = __df_to_dataset(train, batch_size=batch_size)
    val_ds = __df_to_dataset(val, shuffle=False, batch_size=batch_size)
    test_ds = __df_to_dataset(test, shuffle=False, batch_size=batch_size)


    __buildFeatureColums(feature_columns, data, zippath, bucketSizeGEO, True)

    print("preprocessing completed")

    return (feature_columns, train_ds, val_ds, test_ds)

The function that calls the individual feature-column builders:

def __buildFeatureColums(feature_columns, data, zippath, bucketSizeGEO, addCrossedFeatures):
    
    feature_columns.append(__getFutureColumnLon(bucketSizeGEO))
    feature_columns.append(__getFutureColumnLat(bucketSizeGEO))
    
    (namew1_one_hot, namew2_one_hot) = __getFutureColumnsName(__getNumberOfWords(data, 'NAME1PRO'))
    feature_columns.append(namew1_one_hot)
    feature_columns.append(namew2_one_hot)
    
    feature_columns.append(__getFutureColumnStreet(__getNumberOfWords(data, 'STREETPRO')))
    
    feature_columns.append(__getFutureColumnZIP(2223, zippath))
    
    if addCrossedFeatures:
        feature_columns.append(__getFutureColumnCrossedNames(100))
        feature_columns.append(__getFutureColumnCrossedZIPStreet(100, 2223, zippath))

Functions related to embeddings:

def __getFutureColumnsName(name_num_words):
    vocabulary_list = np.arange(0, name_num_words + 1, 1).tolist()

    namew1_voc = tf.feature_column.categorical_column_with_vocabulary_list(
        key='NAME1W1', vocabulary_list=vocabulary_list, dtype=tf.dtypes.int64)
    namew2_voc = tf.feature_column.categorical_column_with_vocabulary_list(
        key='NAME1W2', vocabulary_list=vocabulary_list, dtype=tf.dtypes.int64)

    dim = __getNumberOfDimensions(name_num_words)

    namew1_embedding = feature_column.embedding_column(namew1_voc, dimension=dim)
    namew2_embedding = feature_column.embedding_column(namew2_voc, dimension=dim)

    return (namew1_embedding, namew2_embedding)

def __getFutureColumnStreet(street_num_words):
    vocabulary_list = np.arange(0, street_num_words + 1, 1).tolist()

    street_voc = tf.feature_column.categorical_column_with_vocabulary_list(
        key='STREETW', vocabulary_list=vocabulary_list, dtype=tf.dtypes.int64)

    dim = __getNumberOfDimensions(street_num_words)

    street_embedding = feature_column.embedding_column(street_voc, dimension=dim)

    return street_embedding

def __getFutureColumnZIP(zip_num_words, zippath):
    zip_voc = feature_column.categorical_column_with_vocabulary_file(
    key='ZIP', vocabulary_file=zippath, vocabulary_size=zip_num_words,
    default_value=0)

    dim = __getNumberOfDimensions(zip_num_words)

    zip_embedding = feature_column.embedding_column(zip_voc, dimension=dim)

    return zip_embedding

Recommended answer

The error OSError: Unable to create link (name already exists) when saving a model in h5 format is caused by duplicate variable names. Checking with for i, w in enumerate(model.weights): print(i, w.name) showed that the duplicates are the embedding_weights names.
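
The check described above can be automated. The following is a minimal sketch, not part of the original answer: find_duplicate_names is a hypothetical helper that works on a plain list of name strings, so it can be tried without a trained model. The sample names are made up for illustration; with a real Keras model the list would come from [w.name for w in model.weights].

```python
from collections import Counter


def find_duplicate_names(names):
    """Return a dict mapping each name that occurs more than once to its count."""
    counts = Counter(names)
    return {name: n for name, n in counts.items() if n > 1}


# Made-up weight names for illustration (an assumption, not the asker's real output).
# With a real model: sample_names = [w.name for w in model.weights]
sample_names = [
    "embedding_weights:0",
    "embedding_weights:0",
    "dense/kernel:0",
    "dense/bias:0",
]

duplicates = find_duplicate_names(sample_names)
for name, n in duplicates.items():
    print(f"{name} appears {n} times")
```

An empty result means every weight name is unique, in which case this particular cause of the HDF5 error would not apply.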

Normally, when feature columns are built, the distinct key passed into each feature column is used to build a distinct variable name. This worked correctly in TF 2.1 but broke in TF 2.2 and 2.3, and is supposedly fixed in the TF 2.4 nightly builds.
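
Based on the version range above, one possible way to guard the save call is sketched below. This is an assumption-laden illustration rather than a confirmed fix: h5_save_affected and choose_save_format are hypothetical helpers, and falling back to the SavedModel format (save_format='tf') is a workaround I am assuming helps because that format does not write through h5py. It has not been verified against the asker's model.

```python
def h5_save_affected(tf_version):
    """Return True for TF 2.2.x and 2.3.x, the releases reported as broken above."""
    major, minor = (int(part) for part in tf_version.split(".")[:2])
    return (major, minor) in {(2, 2), (2, 3)}


def choose_save_format(tf_version):
    """Pick 'tf' (SavedModel) on affected releases, otherwise 'h5'."""
    return "tf" if h5_save_affected(tf_version) else "h5"


# Intended usage (commented out so the sketch runs without TensorFlow installed):
# import tensorflow as tf
# fmt = choose_save_format(tf.__version__)
# # For 'tf', use a directory path without a .h5 suffix; a .h5 suffix selects HDF5.
# path = "./saved_models/" + outputName + (".h5" if fmt == "h5" else "")
# model.save(path, save_format=fmt)

for version in ["2.1.0", "2.2.0", "2.3.1", "2.4.0"]:
    print(version, choose_save_format(version))
```

The version check only looks at the major and minor numbers, so nightly or release-candidate suffixes in the version string do not affect it.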
