
Solutions to problems encountered while testing Mask R-CNN

xdtech

Posted on 2019-2-9 00:20:08
Code download: https://github.com/matterport/Mask_RCNN


Problem 1:


Line that fails:


model.load_weights(weights_path, by_name=True)
Error output:


Traceback (most recent call last):
  File "/home/qln/workspace/Mask R-CNN/Mask_Rcnn-matterport/samples/cancer/inspect_weights.py", line 76, in <module>
    model.load_weights(weights_path, by_name=True)
  File "/home/qln/workspace/Mask R-CNN/Mask_Rcnn-matterport/mrcnn/model.py", line 2143, in load_weights
    saving.load_weights_from_hdf5_group_by_name(f, layers)
  File "/home/qln/workspace/Mask R-CNN/Mask_Rcnn-matterport/runutil/lib/python3.5/site-packages/keras/engine/saving.py", line 1149, in load_weights_from_hdf5_group_by_name
    str(weight_values.shape) + '.')
ValueError: Layer #389 (named "mrcnn_bbox_fc"), weight <tf.Variable 'mrcnn_bbox_fc/kernel:0' shape=(1024, 8) dtype=float32_ref> has shape (1024, 8), but the saved weight has shape (1024, 324).
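Solution:


If you do not want to test the default 81 COCO classes and only want to test 2 classes, remember to change model.load_weights(COCO_MODEL_PATH, by_name=True) to model.load_weights(COCO_MODEL_PATH, by_name=True, exclude=["mrcnn_class_logits", "mrcnn_bbox_fc", "mrcnn_bbox", "mrcnn_mask"]). The output size of these head layers depends on the number of classes: mrcnn_bbox_fc has 4 outputs per class, so the COCO weights have shape (1024, 324) for 81 classes while a 2-class model expects (1024, 8). Excluded layers are not loaded from the file and keep their fresh initialization.
Reference blog: https://blog.csdn.net/qq_16065939/article/details/84769397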
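The shape mismatch in the error can be checked with a little arithmetic. The sketch below is only an illustration: the layer names come from the call above, while the helper names bbox_fc_kernel_shape and layers_to_exclude are hypothetical and not part of the Mask_RCNN repository.

```python
# Head layers whose shapes depend on the number of classes
# (names taken from the exclude list in the post).
HEAD_LAYERS = ["mrcnn_class_logits", "mrcnn_bbox_fc", "mrcnn_bbox", "mrcnn_mask"]


def bbox_fc_kernel_shape(num_classes, fc_size=1024):
    """Kernel shape of mrcnn_bbox_fc: 4 box deltas per class.

    fc_size=1024 matches the first dimension shown in the error message.
    """
    return (fc_size, num_classes * 4)


def layers_to_exclude(num_classes, pretrained_num_classes=81):
    """Hypothetical helper: skip the head layers when class counts differ."""
    if num_classes == pretrained_num_classes:
        return []
    return list(HEAD_LAYERS)


print(bbox_fc_kernel_shape(81))  # shape stored in the COCO weights file
print(bbox_fc_kernel_shape(2))   # shape the 2-class model expects
print(layers_to_exclude(2))
```

The result of layers_to_exclude(config.NUM_CLASSES) could then be passed as the exclude argument of model.load_weights, assuming config.NUM_CLASSES counts the background class as it does for COCO (80 + 1).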


Problem 2:


Failing script: train_mask.py


Error output:


2019-01-05 20:01:15.518461: W tensorflow/core/framework/op_kernel.cc:1273] OP_REQUIRES failed at conv_ops.cc:446 : Resource exhausted: OOM when allocating tensor with shape[1,512,512,64] and type float on /job:localhost/replica:0/task:0/device:CPU:0 by allocator cpu
Traceback (most recent call last):
  File "/home/qln/workspace/Mask R-CNN/Mask_Rcnn-matterport/samples/cancer/train_mask.py", line 109, in <module>
    model.train(dataset_train, dataset_val, learning_rate=config.LEARNING_RATE, epochs=50, layers="all")
  File "/home/qln/workspace/Mask R-CNN/Mask_Rcnn-matterport/mrcnn/model.py", line 2387, in train
    use_multiprocessing=True, # change to single-threaded
  File "/home/qln/workspace/Mask R-CNN/Mask_Rcnn-matterport/runutil/lib/python3.5/site-packages/keras/legacy/interfaces.py", line 91, in wrapper
    return func(*args, **kwargs)
  File "/home/qln/workspace/Mask R-CNN/Mask_Rcnn-matterport/runutil/lib/python3.5/site-packages/keras/engine/training.py", line 1418, in fit_generator
    initial_epoch=initial_epoch)
  File "/home/qln/workspace/Mask R-CNN/Mask_Rcnn-matterport/runutil/lib/python3.5/site-packages/keras/engine/training_generator.py", line 217, in fit_generator
    class_weight=class_weight)
  File "/home/qln/workspace/Mask R-CNN/Mask_Rcnn-matterport/runutil/lib/python3.5/site-packages/keras/engine/training.py", line 1217, in train_on_batch
    outputs = self.train_function(ins)
  File "/home/qln/workspace/Mask R-CNN/Mask_Rcnn-matterport/runutil/lib/python3.5/site-packages/keras/backend/tensorflow_backend.py", line 2715, in __call__
    return self._call(inputs)
  File "/home/qln/workspace/Mask R-CNN/Mask_Rcnn-matterport/runutil/lib/python3.5/site-packages/keras/backend/tensorflow_backend.py", line 2675, in _call
    fetched = self._callable_fn(*array_vals)
  File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/client/session.py", line 1439, in __call__
    run_metadata_ptr)
  File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/framework/errors_impl.py", line 528, in __exit__
    c_api.TF_GetCode(self.status.status))
tensorflow.python.framework.errors_impl.ResourceExhaustedError: OOM when allocating tensor with shape[1,512,512,64] and type float on /job:localhost/replica:0/task:0/device:CPU:0 by allocator cpu
         [[{{node conv1/convolution}} = Conv2D[T=DT_FLOAT, _class=["loctraining/SGD/gradients/conv1/convolution_grad/Conv2DBackpropFilter"], data_format="NHWC", dilations=[1, 1, 1, 1], padding="VALID", strides=[1, 2, 2, 1], use_cudnn_on_gpu=true, _device="/job:localhost/replica:0/task:0/device:CPU:0"](zero_padding2d_1/Pad, conv1/kernel/read)]]
Hint: If you want to see a list of allocated tensors when OOM happens, add report_tensor_allocations_upon_oom to RunOptions for current allocation info.


Process finished with exit code 130 (interrupted by signal 2: SIGINT)
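Solution:


Do not panic if you hit this. It happens because multiprocessing is enabled for the data generator (use_multiprocessing=True) and the workers are not in sync. Wait patiently for a while and the loss values will start to appear. If you would rather not use multiple workers, you can switch to a single one. To do so, edit this call in Mask RCNN\mrcnn\model.py: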


self.keras_model.fit_generator(
    train_generator,
    initial_epoch=self.epoch,
    epochs=epochs,
    steps_per_epoch=self.config.STEPS_PER_EPOCH,
    callbacks=callbacks,
    validation_data=val_generator,
    validation_steps=self.config.VALIDATION_STEPS,
    max_queue_size=100,
    workers=workers,
    use_multiprocessing=True,  # original setting; change to False for a single worker
)
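If you only want to switch from multiple workers to a single one, set use_multiprocessing=False and also set workers=1. With use_multiprocessing=False, a workers value greater than 1 raises an error!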
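The two settings have to change together, which a small helper can make explicit. This is a minimal sketch: loader_settings is a hypothetical name, and using multiprocessing.cpu_count() for the multi-worker case is an assumption about how workers is chosen, not something stated in the post.

```python
import multiprocessing


def loader_settings(single_worker=True):
    """Return the workers / use_multiprocessing pair for fit_generator.

    Hypothetical helper: keeps the two values consistent so that
    workers is never greater than 1 when multiprocessing is off.
    """
    if single_worker:
        return {"workers": 1, "use_multiprocessing": False}
    # Assumption: one worker process per CPU core.
    return {"workers": multiprocessing.cpu_count(), "use_multiprocessing": True}


settings = loader_settings(single_worker=True)
print(settings)
```

The returned dictionary could be unpacked into the call shown earlier, for example fit_generator(..., **settings), in place of the separate workers and use_multiprocessing arguments.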


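Problem 3:


TensorBoard does not work properly: the page opens, but no data is displayed.


Error output: the TensorBoard page shows "No dashboards are active for the current data set"


Solution: the path given to tensorboard --logdir was wrong. It only needs to point to the logs directory, and the path must not contain spaces (an unquoted space splits the path into two arguments).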


tensorboard --logdir /home/qln/workspace/Mask_R-CNN/cancer/logs
TensorBoard 1.12.0 at http://qln:6006 (Press CTRL+C to quit)
This solved the problem completely.
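Before starting TensorBoard, the log directory can be checked for the two issues described above. The following is a rough sketch, not part of TensorBoard: check_logdir is a hypothetical helper, and it relies on the fact that TensorBoard event files have names starting with events.out.tfevents.

```python
from pathlib import Path


def check_logdir(logdir):
    """Return a list of problems found with a --logdir path (empty if none)."""
    problems = []
    path = Path(logdir)
    if " " in str(path):
        problems.append("path contains a space; quote it or rename the directory")
    if not path.is_dir():
        problems.append("directory does not exist")
    elif not any(path.rglob("events.out.tfevents.*")):
        # Search subdirectories too, since each run usually gets its own folder.
        problems.append("no event files found under this directory")
    return problems


# Path taken from the command in the post; adjust to your own setup.
for problem in check_logdir("/home/qln/workspace/Mask_R-CNN/cancer/logs"):
    print(problem)
```

An empty result means the directory exists, has no spaces in its path, and contains at least one event file, so the dashboards should have data to show.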

