TensorFlow Object Detection API - Transfer Learning on "CenterNet Resnet50 V1 FPN 512x512" model error

Problem description

I am trying to do transfer learning with the TensorFlow Object Detection API, using the CenterNet Resnet50 V1 FPN 512x512 model from the Model Zoo.

I am running TensorFlow in a Docker environment based on tensorflow/tensorflow:2.5.0-gpu-jupyter, with a recent checkout of https://github.com/tensorflow/models.git at commit eb6687ac.

I have set up the directory structure and downloaded the pre-trained model:

mkdir -p /workspace/pre-trained-models/downloads/ && cd /workspace/pre-trained-models/downloads/

wget http://download.tensorflow.org/models/object_detection/tf2/20200711/centernet_resnet50_v1_fpn_512x512_coco17_tpu-8.tar.gz

tar -zxvf centernet_resnet50_v1_fpn_512x512_coco17_tpu-8.tar.gz -C /workspace/pre-trained-models/

mkdir -p /workspace/models/my_centernet_resnet50_v1_fpn

cp /workspace/pre-trained-models/centernet_resnet50_v1_fpn_512x512_coco17_tpu-8/pipeline.config /workspace/models/my_centernet_resnet50_v1_fpn/
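
As a quick sanity check after extraction, the checkpoint path that pipeline.config later points at can be verified. The sketch below assumes the usual TensorFlow 2 checkpoint layout, in which a prefix such as ckpt-0 is not a file itself but is backed by a ckpt-0.index file plus one or more ckpt-0.data-* shards. The helper name checkpoint_prefix_exists is hypothetical and is not part of the Object Detection API.

```python
from pathlib import Path


def checkpoint_prefix_exists(prefix: str) -> bool:
    """Return True if a TF2-style checkpoint appears to exist for this prefix.

    Assumption: the checkpoint is stored as "<prefix>.index" together with at
    least one "<prefix>.data-XXXXX-of-XXXXX" shard in the same directory.
    """
    path = Path(prefix)
    index_file = path.with_name(path.name + ".index")
    if not index_file.is_file():
        return False
    # Any data shard next to the index file is enough for this rough check.
    return any(path.parent.glob(path.name + ".data-*"))


if __name__ == "__main__":
    prefix = (
        "/workspace/pre-trained-models/"
        "centernet_resnet50_v1_fpn_512x512_coco17_tpu-8/checkpoint/ckpt-0"
    )
    print(prefix, "->", checkpoint_prefix_exists(prefix))
```

This only confirms that the files are where the config expects them; it says nothing about whether the checkpoint type is accepted by the model code.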

My pipeline.config is as follows:

Note that I am using use_bfloat16: true, as I believe the RTX 3090 supports it. The same error occurs without this line.

# CenterNet meta-architecture from the "Objects as Points" [1] paper
# with the ResNet-v2-101 backbone. The ResNet backbone has a few differences
# as compared to the one mentioned in the paper, hence the performance is
# slightly worse. This config is TPU compatible.
# [1]: https://arxiv.org/abs/1904.07850
#

model {
  center_net {
    num_classes: 1
    feature_extractor {
      type: "resnet_v1_50_fpn"
    }
    image_resizer {
      keep_aspect_ratio_resizer {
        min_dimension: 512
        max_dimension: 512
        pad_to_max_dimension: true
      }
    }
    object_detection_task {
      task_loss_weight: 1.0
      offset_loss_weight: 1.0
      scale_loss_weight: 0.1
      localization_loss {
        l1_localization_loss {
        }
      }
    }
    object_center_params {
      object_center_loss_weight: 1.0
      min_box_overlap_iou: 0.7
      max_box_predictions: 100
      classification_loss {
        penalty_reduced_logistic_focal_loss {
          alpha: 2.0
          beta: 4.0
        }
      }
    }
  }
}

train_config: {

  batch_size: 32
  num_steps: 250000

  data_augmentation_options {
    random_horizontal_flip {
    }
  }


  optimizer {
    adam_optimizer: {
      epsilon: 1e-7  # Match tf.keras.optimizers.Adam's default.
      learning_rate: {
        cosine_decay_learning_rate {
          learning_rate_base: 1e-3
          total_steps: 250000
          warmup_learning_rate: 2.5e-4
          warmup_steps: 5000
        }
      }
    }
    use_moving_average: false
  }
  max_number_of_boxes: 100
  unpad_groundtruth_tensors: false

  fine_tune_checkpoint_version: V2
  fine_tune_checkpoint: "/workspace/pre-trained-models/centernet_resnet50_v1_fpn_512x512_coco17_tpu-8/checkpoint/ckpt-0"
  fine_tune_checkpoint_type: "detection"
  use_bfloat16: true
}

train_input_reader: {
  label_map_path: "/workspace/image-data/oli-fish/training_data/train.pbtxt"
  tf_record_input_reader {
    input_path: "/workspace/image-data/oli-fish/training_data/train.tfrecord"
  }
}

eval_config: {
  metrics_set: "coco_detection_metrics"
  use_moving_averages: false
  batch_size: 1;
}

eval_input_reader: {
  label_map_path: "/workspace/image-data/oli-fish/test_data/test.pbtxt"
  shuffle: false
  num_epochs: 1
  tf_record_input_reader {
    input_path: "/workspace/image-data/oli-fish/test_data/test.tfrecord"
  }
}
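
Since the error further down concerns fine_tune_checkpoint_type, it can help to confirm which fine-tune settings the copied pipeline.config actually contains. The following is a minimal text-level sketch using a regular expression, not the protobuf parser that the Object Detection API uses internally. The function name fine_tune_settings is hypothetical, and the pattern assumes each field sits on its own line with a value that contains no spaces.

```python
import re

# Matches lines such as:  fine_tune_checkpoint_type: "detection"
# Assumption: one field per line, optional double quotes, no spaces in values.
_FIELD_RE = re.compile(
    r'^\s*(fine_tune_checkpoint\w*)\s*:\s*"?([^"\s#]+)"?', re.MULTILINE
)


def fine_tune_settings(config_text: str) -> dict:
    """Collect the fine_tune_checkpoint* fields found in pipeline.config text."""
    return {name: value for name, value in _FIELD_RE.findall(config_text)}


if __name__ == "__main__":
    sample = """
    train_config: {
      fine_tune_checkpoint_version: V2
      fine_tune_checkpoint: "/workspace/pre-trained-models/centernet_resnet50_v1_fpn_512x512_coco17_tpu-8/checkpoint/ckpt-0"
      fine_tune_checkpoint_type: "detection"
    }
    """
    for name, value in fine_tune_settings(sample).items():
        print(name, "=", value)
```

To inspect the real file, the same function could be called on the text read from /workspace/models/my_centernet_resnet50_v1_fpn/pipeline.config.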

My training data consists of a single class.

I run the training with the following command:

python object_detection/model_main_tf2.py --model_dir=/workspace/models/my_centernet_resnet50_v1_fpn/ --pipeline_config_path=/workspace/models/my_centernet_resnet50_v1_fpn/pipeline.config

I get the following error:

/home/tensorflow/.local/lib/python3.6/site-packages/tensorflow_addons/utils/ensure_tf_install.py:67: UserWarning: Tensorflow Addons supports using Python ops for all Tensorflow versions above or equal to 2.3.0 and strictly below 2.5.0 (nightly versions are not supported).
 The versions of TensorFlow you are currently using is 2.5.0 and is not supported.
Some things might work, some things might not.
If you were to encounter a bug, do not file an issue.
If you want to make sure you're using a tested and supported configuration, either change the TensorFlow version or the TensorFlow Addons's version.
You can find the compatibility matrix in TensorFlow Addon's readme:
https://github.com/tensorflow/addons
  UserWarning,
WARNING:tensorflow:Collective ops is not configured at program startup. Some performance features may not be enabled.
W0517 15:55:17.669740 140455981164352 mirrored_strategy.py:379] Collective ops is not configured at program startup. Some performance features may not be enabled.
INFO:tensorflow:Using MirroredStrategy with devices ('/job:localhost/replica:0/task:0/device:GPU:0',)
I0517 15:55:17.837340 140455981164352 mirrored_strategy.py:369] Using MirroredStrategy with devices ('/job:localhost/replica:0/task:0/device:GPU:0',)
INFO:tensorflow:Maybe overwriting train_steps: None
I0517 15:55:17.839460 140455981164352 config_util.py:552] Maybe overwriting train_steps: None
INFO:tensorflow:Maybe overwriting use_bfloat16: False
I0517 15:55:17.839519 140455981164352 config_util.py:552] Maybe overwriting use_bfloat16: False
INFO:tensorflow:Reduce to /job:localhost/replica:0/task:0/device:CPU:0 then broadcast to ('/job:localhost/replica:0/task:0/device:CPU:0',).
I0517 15:55:17.870279 140455981164352 cross_device_ops.py:621] Reduce to /job:localhost/replica:0/task:0/device:CPU:0 then broadcast to ('/job:localhost/replica:0/task:0/device:CPU:0',).
INFO:tensorflow:Reduce to /job:localhost/replica:0/task:0/device:CPU:0 then broadcast to ('/job:localhost/replica:0/task:0/device:CPU:0',).
I0517 15:55:17.871679 140455981164352 cross_device_ops.py:621] Reduce to /job:localhost/replica:0/task:0/device:CPU:0 then broadcast to ('/job:localhost/replica:0/task:0/device:CPU:0',).
INFO:tensorflow:Reduce to /job:localhost/replica:0/task:0/device:CPU:0 then broadcast to ('/job:localhost/replica:0/task:0/device:CPU:0',).
I0517 15:55:17.873107 140455981164352 cross_device_ops.py:621] Reduce to /job:localhost/replica:0/task:0/device:CPU:0 then broadcast to ('/job:localhost/replica:0/task:0/device:CPU:0',).
INFO:tensorflow:Reduce to /job:localhost/replica:0/task:0/device:CPU:0 then broadcast to ('/job:localhost/replica:0/task:0/device:CPU:0',).
I0517 15:55:17.873559 140455981164352 cross_device_ops.py:621] Reduce to /job:localhost/replica:0/task:0/device:CPU:0 then broadcast to ('/job:localhost/replica:0/task:0/device:CPU:0',).
INFO:tensorflow:Reduce to /job:localhost/replica:0/task:0/device:CPU:0 then broadcast to ('/job:localhost/replica:0/task:0/device:CPU:0',).
I0517 15:55:17.876967 140455981164352 cross_device_ops.py:621] Reduce to /job:localhost/replica:0/task:0/device:CPU:0 then broadcast to ('/job:localhost/replica:0/task:0/device:CPU:0',).
INFO:tensorflow:Reduce to /job:localhost/replica:0/task:0/device:CPU:0 then broadcast to ('/job:localhost/replica:0/task:0/device:CPU:0',).
I0517 15:55:17.878880 140455981164352 cross_device_ops.py:621] Reduce to /job:localhost/replica:0/task:0/device:CPU:0 then broadcast to ('/job:localhost/replica:0/task:0/device:CPU:0',).
INFO:tensorflow:Reduce to /job:localhost/replica:0/task:0/device:CPU:0 then broadcast to ('/job:localhost/replica:0/task:0/device:CPU:0',).
I0517 15:55:17.891481 140455981164352 cross_device_ops.py:621] Reduce to /job:localhost/replica:0/task:0/device:CPU:0 then broadcast to ('/job:localhost/replica:0/task:0/device:CPU:0',).
INFO:tensorflow:Reduce to /job:localhost/replica:0/task:0/device:CPU:0 then broadcast to ('/job:localhost/replica:0/task:0/device:CPU:0',).
I0517 15:55:17.891972 140455981164352 cross_device_ops.py:621] Reduce to /job:localhost/replica:0/task:0/device:CPU:0 then broadcast to ('/job:localhost/replica:0/task:0/device:CPU:0',).
INFO:tensorflow:Reduce to /job:localhost/replica:0/task:0/device:CPU:0 then broadcast to ('/job:localhost/replica:0/task:0/device:CPU:0',).
I0517 15:55:17.892784 140455981164352 cross_device_ops.py:621] Reduce to /job:localhost/replica:0/task:0/device:CPU:0 then broadcast to ('/job:localhost/replica:0/task:0/device:CPU:0',).
INFO:tensorflow:Reduce to /job:localhost/replica:0/task:0/device:CPU:0 then broadcast to ('/job:localhost/replica:0/task:0/device:CPU:0',).
I0517 15:55:17.893235 140455981164352 cross_device_ops.py:621] Reduce to /job:localhost/replica:0/task:0/device:CPU:0 then broadcast to ('/job:localhost/replica:0/task:0/device:CPU:0',).
WARNING:tensorflow:From /home/tensorflow/.local/lib/python3.6/site-packages/object_detection/model_lib_v2.py:546: StrategyBase.experimental_distribute_datasets_from_function (from tensorflow.python.distribute.distribute_lib) is deprecated and will be removed in a future version.
Instructions for updating:
rename to distribute_datasets_from_function
W0517 15:55:19.319097 140455981164352 deprecation.py:336] From /home/tensorflow/.local/lib/python3.6/site-packages/object_detection/model_lib_v2.py:546: StrategyBase.experimental_distribute_datasets_from_function (from tensorflow.python.distribute.distribute_lib) is deprecated and will be removed in a future version.
Instructions for updating:
rename to distribute_datasets_from_function
INFO:tensorflow:Reading unweighted datasets: ['/workspace/image-data/oli-fish/training_data/train.tfrecord']
I0517 15:55:19.320800 140455981164352 dataset_builder.py:163] Reading unweighted datasets: ['/workspace/image-data/oli-fish/training_data/train.tfrecord']
INFO:tensorflow:Reading record datasets for input file: ['/workspace/image-data/oli-fish/training_data/train.tfrecord']
I0517 15:55:19.320896 140455981164352 dataset_builder.py:80] Reading record datasets for input file: ['/workspace/image-data/oli-fish/training_data/train.tfrecord']
INFO:tensorflow:Number of filenames to read: 1
I0517 15:55:19.320939 140455981164352 dataset_builder.py:81] Number of filenames to read: 1
WARNING:tensorflow:num_readers has been reduced to 1 to match input file shards.
W0517 15:55:19.320975 140455981164352 dataset_builder.py:88] num_readers has been reduced to 1 to match input file shards.
WARNING:tensorflow:From /home/tensorflow/.local/lib/python3.6/site-packages/object_detection/builders/dataset_builder.py:105: parallel_interleave (from tensorflow.python.data.experimental.ops.interleave_ops) is deprecated and will be removed in a future version.
Instructions for updating:
Use `tf.data.Dataset.interleave(map_func, cycle_length, block_length, num_parallel_calls=tf.data.AUTOTUNE)` instead. If sloppy execution is desired, use `tf.data.Options.experimental_deterministic`.
W0517 15:55:19.322137 140455981164352 deprecation.py:336] From /home/tensorflow/.local/lib/python3.6/site-packages/object_detection/builders/dataset_builder.py:105: parallel_interleave (from tensorflow.python.data.experimental.ops.interleave_ops) is deprecated and will be removed in a future version.
Instructions for updating:
Use `tf.data.Dataset.interleave(map_func, cycle_length, block_length, num_parallel_calls=tf.data.AUTOTUNE)` instead. If sloppy execution is desired, use `tf.data.Options.experimental_deterministic`.
WARNING:tensorflow:From /home/tensorflow/.local/lib/python3.6/site-packages/object_detection/builders/dataset_builder.py:237: DatasetV1.map_with_legacy_function (from tensorflow.python.data.ops.dataset_ops) is deprecated and will be removed in a future version.
Instructions for updating:
Use `tf.data.Dataset.map()
W0517 15:55:19.335709 140455981164352 deprecation.py:336] From /home/tensorflow/.local/lib/python3.6/site-packages/object_detection/builders/dataset_builder.py:237: DatasetV1.map_with_legacy_function (from tensorflow.python.data.ops.dataset_ops) is deprecated and will be removed in a future version.
Instructions for updating:
Use `tf.data.Dataset.map()
WARNING:tensorflow:From /usr/local/lib/python3.6/dist-packages/tensorflow/python/util/dispatch.py:206: sparse_to_dense (from tensorflow.python.ops.sparse_ops) is deprecated and will be removed in a future version.
Instructions for updating:
Create a `tf.sparse.SparseTensor` and use `tf.sparse.to_dense` instead.
W0517 15:55:24.661983 140455981164352 deprecation.py:336] From /usr/local/lib/python3.6/dist-packages/tensorflow/python/util/dispatch.py:206: sparse_to_dense (from tensorflow.python.ops.sparse_ops) is deprecated and will be removed in a future version.
Instructions for updating:
Create a `tf.sparse.SparseTensor` and use `tf.sparse.to_dense` instead.
WARNING:tensorflow:From /usr/local/lib/python3.6/dist-packages/tensorflow/python/autograph/impl/api.py:464: to_float (from tensorflow.python.ops.math_ops) is deprecated and will be removed in a future version.
Instructions for updating:
Use `tf.cast` instead.
W0517 15:55:26.951461 140455981164352 deprecation.py:336] From /usr/local/lib/python3.6/dist-packages/tensorflow/python/autograph/impl/api.py:464: to_float (from tensorflow.python.ops.math_ops) is deprecated and will be removed in a future version.
Instructions for updating:
Use `tf.cast` instead.
/usr/local/lib/python3.6/dist-packages/tensorflow/python/keras/backend.py:435: UserWarning: `tf.keras.backend.set_learning_phase` is deprecated and will be removed after 2020-10-11. To update it, simply pass a True/False value to the `training` argument of the `__call__` method of your layer or model.
  warnings.warn('`tf.keras.backend.set_learning_phase` is deprecated and '
Traceback (most recent call last):
  File "object_detection/model_main_tf2.py", line 113, in <module>
    tf.compat.v1.app.run()
  File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/platform/app.py", line 40, in run
    _run(main=main, argv=argv, flags_parser=_parse_flags_tolerate_undef)
  File "/usr/local/lib/python3.6/dist-packages/absl/app.py", line 303, in run
    _run_main(main, args)
  File "/usr/local/lib/python3.6/dist-packages/absl/app.py", line 251, in _run_main
    sys.exit(main(argv))
  File "object_detection/model_main_tf2.py", line 110, in main
    record_summaries=FLAGS.record_summaries)
  File "/home/tensorflow/.local/lib/python3.6/site-packages/object_detection/model_lib_v2.py", line 597, in train_loop
    train_input, unpad_groundtruth_tensors)
  File "/home/tensorflow/.local/lib/python3.6/site-packages/object_detection/model_lib_v2.py", line 395, in load_fine_tune_checkpoint
    fine_tune_checkpoint_type=checkpoint_type)
  File "/home/tensorflow/.local/lib/python3.6/site-packages/object_detection/meta_architectures/center_net_meta_arch.py", line 4155, in restore_from_objects
    supported_types))
ValueError: Checkpoint type "detection" not supported for CenterNetResnetV1FpnFeatureExtractor. Supported types are ['classification', 'fine_tune']

According to another question I asked here, setting fine_tune_checkpoint_type to detection should work, but it fails with the error: Checkpoint type "detection" not supported for CenterNetResnetV1FpnFeatureExtractor. What am I doing wrong?
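
One detail visible in the traceback is that model_main_tf2.py is run from the working directory, while model_lib_v2.py and center_net_meta_arch.py are loaded from /home/tensorflow/.local/lib/python3.6/site-packages/object_detection. The sketch below shows one way to check which copy of a package Python would import. The helper name module_origin is hypothetical, and json merely stands in for object_detection so that the snippet runs without the API installed.

```python
import importlib.util
from typing import Optional


def module_origin(name: str) -> Optional[str]:
    """Return the file a top-level module would be imported from, or None.

    None means the module was not found, or it has no single origin file.
    """
    try:
        spec = importlib.util.find_spec(name)
    except (ImportError, ValueError):
        return None
    return spec.origin if spec is not None else None


if __name__ == "__main__":
    # Replace "json" with "object_detection" inside the training environment.
    print(module_origin("json"))
```

If the printed path points into site-packages rather than the fresh checkout, the installed package may be older than the source tree being edited.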

Recommended answer

OK, this now appears to work. I was using an older checkout of https://github.com/tensorflow/models.git while believing it was the latest (a git submodule issue). It appears they fixed this, or at least changed the related code, a few weeks ago.
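
For anyone hitting the same stale-checkout problem, the state of a submodule can be read from git submodule status, whose first character on each line indicates whether the checked-out commit matches the one recorded in the outer repository. The sketch below parses that output. It assumes the models repository is tracked as a submodule of an outer repository, as the answer suggests, and that submodule paths contain no spaces. The function names are hypothetical.

```python
import subprocess
from typing import List, Tuple

# Meaning of the first character of each `git submodule status` line.
_STATE = {
    " ": "in sync",
    "+": "checked-out commit differs from the one recorded in the outer repository",
    "-": "not initialized",
    "U": "merge conflicts",
}


def parse_submodule_status(output: str) -> List[Tuple[str, str, str]]:
    """Parse `git submodule status` output into (path, sha, state) tuples."""
    rows = []
    for line in output.splitlines():
        if not line.strip():
            continue
        flag, fields = line[0], line[1:].split()
        sha, path = fields[0], fields[1]
        rows.append((path, sha, _STATE.get(flag, "unknown")))
    return rows


def read_submodule_status(repo_dir: str) -> List[Tuple[str, str, str]]:
    """Run `git submodule status` in repo_dir (requires git on PATH)."""
    result = subprocess.run(
        ["git", "submodule", "status"],
        cwd=repo_dir,
        stdout=subprocess.PIPE,
        universal_newlines=True,
        check=True,
    )
    return parse_submodule_status(result.stdout)
```

Calling read_submodule_status on the outer repository and looking for any state other than "in sync" is a rough way to spot a submodule that lags behind what was intended.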
