lsh posted on 2019-2-8 22:46:30

Check failed: work_element_count > 0 (0 vs. 0)

I'm trying to train a Mask R-CNN model with Keras on my own dataset, on a p2.xlarge AWS EC2 instance.
When I launch the training, it runs for a few steps:

Epoch 1/1
   2/1000 [..............................] - ETA: 4:27:49 - loss: 5.1578 - rpn_class_loss: 0.0937 - rpn_bbox_loss: 0.6471 - mrcnn_class_loss: 2.6594 - mrcnn_bbox_loss: 1.1266 - mrcnn_mask_loss: 0.6311

and then I get this error message:

2018-05-02 13:44:56.193439: F ./tensorflow/core/util/cuda_launch_config.h:127] Check failed: work_element_count > 0 (0 vs. 0)

My images are relatively small (~100 KB) and my dataset has few images (~150).
The config I am using is as follows:

class CustomConfig(Config):
    """Configuration for training on the custom dataset.
    Derives from the base Config class
    """
    # Give the configuration a recognizable name
    NAME = "blabla"
    # We use a GPU with 12GB memory, which can fit two images.
    # Adjust down if you use a smaller GPU.
    IMAGES_PER_GPU = 2
    # Number of classes (including background)
    NUM_CLASSES = 11

Any suggestions? Thanks!
   

I downgraded my tensorflow-gpu package to 1.7.0 and it worked.
   

This error has nothing to do with CUDA; it is something internal to TensorFlow, and I have edited the question accordingly.

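This problem is really strange. Many sources say it is a version issue, but I have not tested that myself: not every TensorFlow program fails on my machine, so I did not dare change versions casually. I only added the following to my program: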

os.environ["CUDA_DEVICE_ORDER"] = "PCI_BUS_ID"
os.environ["CUDA_VISIBLE_DEVICES"] = "0"

With this setting it works. Strange, isn't it? -_-
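
The two environment variables above are read when CUDA is initialized, so they are normally set at the very top of the script, before TensorFlow or Keras is imported. A minimal sketch, assuming a machine where GPU 0 is the one to use; the helper name `pin_gpu` is hypothetical and not part of any library. The thread does not establish why this avoids the check failure.

```python
import os


def pin_gpu(device_index="0"):
    """Restrict CUDA to one GPU, numbered in PCI bus order.

    `pin_gpu` is a hypothetical helper for illustration only. Call it
    before importing TensorFlow or Keras, because the variables are
    read when CUDA is initialized.
    """
    os.environ["CUDA_DEVICE_ORDER"] = "PCI_BUS_ID"
    os.environ["CUDA_VISIBLE_DEVICES"] = str(device_index)
    # Return the values that were set so the caller can log them.
    return {
        name: os.environ[name]
        for name in ("CUDA_DEVICE_ORDER", "CUDA_VISIBLE_DEVICES")
    }


settings = pin_gpu("0")
print(settings)

# Import the framework only after the variables are in place, e.g.:
# import tensorflow as tf
```

Setting the variables inside the script rather than in the shell keeps the choice of GPU next to the training code, at the cost of hard-coding the device index.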



