COCO Dataset Analysis 4

shaoheshaohe · Posted on 2019-6-11 20:11:55

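COCO Dataset: Data Characteristics
The COCO dataset contains more than 200,000 images and 80 object categories. Every object instance is annotated with a detailed segmentation mask, and more than 500,000 object instances are annotated in total.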

{
person  # 1
vehicle  #8
      {bicycle
      car
      motorcycle
      airplane
      bus
      train
      truck
      boat}
outdoor  #5
      {traffic light
      fire hydrant
      stop sign
      parking meter
      bench}
animal  #10
      {bird
      cat
      dog
      horse
      sheep
      cow
      elephant
      bear
      zebra
      giraffe}
accessory  #5
      {backpack
      umbrella
      handbag
      tie
      suitcase
      }
sports  #10
      {frisbee
      skis
      snowboard
      sports ball
      kite
      baseball bat
      baseball glove
      skateboard
      surfboard
      tennis racket
      }
kitchen  #7
      {bottle
      wine glass
      cup
      fork
      knife
      spoon
      bowl
      }
food  #10
      {banana
      apple
      sandwich
      orange
      broccoli
      carrot
      hot dog
      pizza
      donut
      cake
      }
furniture  #6
      {chair
      couch
      potted plant
      bed
      dining table
      toilet
      }
electronic  #6
      {tv
      laptop
      mouse
      remote
      keyboard
      cell phone
      }
appliance  #5
      {microwave
      oven
      toaster
      sink
      refrigerator
      }
indoor  #7
      {book
      clock
      vase
      scissors
      teddy bear
      hair drier
      toothbrush
      }
}

Note:
PASCAL VOC semantic classes (#20):

{
aeroplane
bicycle
bird
boat
bottle
bus
car
cat
chair
cow
diningtable
dog
horse
motorbike
person
pottedplant
sheep
sofa
train
tvmonitor
}
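
Every PASCAL VOC class has a counterpart among the 80 COCO categories listed earlier, but six of the names are spelled differently. A minimal sketch of the name mapping, based only on the two lists in this post (the dictionary and function names are illustrative and not part of either dataset's tooling):

```python
# Hypothetical helper: map PASCAL VOC class names to COCO category names.
# Only the six names that differ are listed; the other 14 are identical.
VOC_TO_COCO_RENAMES = {
    "aeroplane": "airplane",
    "diningtable": "dining table",
    "motorbike": "motorcycle",
    "pottedplant": "potted plant",
    "sofa": "couch",
    "tvmonitor": "tv",
}

VOC_CLASSES = [
    "aeroplane", "bicycle", "bird", "boat", "bottle", "bus", "car", "cat",
    "chair", "cow", "diningtable", "dog", "horse", "motorbike", "person",
    "pottedplant", "sheep", "sofa", "train", "tvmonitor",
]


def voc_to_coco_name(voc_name: str) -> str:
    """Return the COCO category name for a VOC class name."""
    return VOC_TO_COCO_RENAMES.get(voc_name, voc_name)


coco_names = [voc_to_coco_name(name) for name in VOC_CLASSES]
```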

COCO Dataset
Annotation data formats:
- object instances
- object keypoints
- image captions

The basic data structure is as follows:

{
"info" : info,
"images" : [image],
"annotations" : [annotation],
"licenses" : [license],
}
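
As a rough sketch of how this top-level layout is usually consumed, the snippet below parses a tiny hand-written annotation document with Python's standard json module and groups annotations by image. All sample values (ids, file names, license text) are made up for illustration; a real annotation file would be read from disk with json.load and has the same keys with far more entries.

```python
import json
from collections import defaultdict

# Made-up miniature annotation document following the layout above.
raw = """
{
  "info": {"year": 2014, "version": "1.0", "description": "toy example",
           "contributor": "", "url": "", "date_created": "2014-01-01"},
  "images": [
    {"id": 1, "width": 640, "height": 480, "file_name": "a.jpg"},
    {"id": 2, "width": 320, "height": 240, "file_name": "b.jpg"}
  ],
  "annotations": [
    {"id": 10, "image_id": 1},
    {"id": 11, "image_id": 1},
    {"id": 12, "image_id": 2}
  ],
  "licenses": [{"id": 1, "name": "toy license", "url": ""}]
}
"""

dataset = json.loads(raw)

# Index images by id, and group annotations by the image they belong to.
images_by_id = {img["id"]: img for img in dataset["images"]}
anns_by_image = defaultdict(list)
for ann in dataset["annotations"]:
    anns_by_image[ann["image_id"]].append(ann)

for image_id, anns in anns_by_image.items():
    print(images_by_id[image_id]["file_name"], len(anns))
```

Building both lookups once avoids rescanning the full annotation list for every image.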

info {
"year" : int,
"version" : str,
"description" : str,
"contributor" : str,
"url" : str,
"date_created" : datetime,
}

image{
"id" : int, # image id
"width" : int, # image width
"height" : int, # image height
"file_name" : str, # image file name
"license" : int,
"flickr_url" : str,
"coco_url" : str, # image URL
"date_captured" : datetime, # date the image was captured
}

license{
"id" : int,
"name" : str,
"url" : str,
}

Object Instance Annotations
Instance annotation format:

annotation{
"id" : int,
"image_id" : int,
"category_id" : int,
"segmentation" : RLE or [polygon],
"area" : float,
"bbox" : [x,y,width,height],
"iscrowd" : 0 or 1,
}
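
The bbox field stores the top-left corner plus width and height, while many tools expect two opposite corners instead. A minimal conversion sketch (the function name is illustrative):

```python
def xywh_to_xyxy(bbox):
    """Convert a COCO [x, y, width, height] box to [x_min, y_min, x_max, y_max]."""
    x, y, w, h = bbox
    return [x, y, x + w, y + h]


box = xywh_to_xyxy([10.0, 20.0, 30.0, 40.0])
```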

categories[{
"id" : int,
"name" : str,
"supercategory" : str,
}]
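
A short sketch of how the categories list can be turned into lookups, using a small sample excerpt rather than the full 80 entries (variable names are illustrative):

```python
from collections import defaultdict

# Small sample excerpt of a "categories" list in the format above.
categories = [
    {"id": 1, "name": "person", "supercategory": "person"},
    {"id": 2, "name": "bicycle", "supercategory": "vehicle"},
    {"id": 3, "name": "car", "supercategory": "vehicle"},
    {"id": 18, "name": "dog", "supercategory": "animal"},
]

# Group category names under their supercategory.
names_by_super = defaultdict(list)
for cat in categories:
    names_by_super[cat["supercategory"]].append(cat["name"])

# Resolve an annotation's category_id to a readable name.
id_to_name = {cat["id"]: cat["name"] for cat in categories}
```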

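In an instance annotation:
If the instance is a single object, then iscrowd=0 and segmentation is a list of polygons; a single object may need several polygons, for example when it is occluded.
If the instance is a collection of objects, then iscrowd=1 and segmentation is RLE. iscrowd=1 is used to label large groups of objects, such as a crowd of people.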
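
The two segmentation encodings can be told apart by their Python type after loading the JSON. The sketch below assumes the uncompressed RLE layout, a dict with "counts" (run lengths that alternate between background and foreground, starting with background) and "size" ([height, width]), with pixels stored column by column. It does not handle compressed RLE strings, which the COCO API tools decode. Function names and sample values are made up for illustration.

```python
def segmentation_kind(annotation):
    """Return 'polygon' or 'rle' based on the type of the segmentation field."""
    seg = annotation["segmentation"]
    return "rle" if isinstance(seg, dict) else "polygon"


def decode_uncompressed_rle(rle):
    """Decode {"counts": [...], "size": [h, w]} into a list of rows of 0/1."""
    height, width = rle["size"]
    flat = []
    value = 0  # runs alternate, starting with background (0)
    for run in rle["counts"]:
        flat.extend([value] * run)
        value = 1 - value
    if len(flat) != height * width:
        raise ValueError("run lengths do not match mask size")
    # Column-major storage: pixel (row, col) sits at flat[col * height + row].
    return [
        [flat[col * height + row] for col in range(width)]
        for row in range(height)
    ]


# Made-up annotations: one single object (polygon), one crowd (RLE).
single = {"iscrowd": 0, "segmentation": [[0, 0, 4, 0, 4, 3, 0, 3]]}
crowd = {"iscrowd": 1, "segmentation": {"counts": [2, 3, 1], "size": [2, 3]}}

mask = decode_uncompressed_rle(crowd["segmentation"])
```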

Object Keypoint Annotations
Keypoint annotation format:

annotation{
"keypoints" : [x1,y1,v1,...],
"num_keypoints" : int,
"[cloned]" : ...,
}

categories[{
"keypoints" : [str],
"skeleton" : [edge],
"[cloned]" : ...,
}]

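A keypoint annotation contains all the data of an object annotation (id, bbox, and so on) plus two additional fields.
"keypoints" is an array of length 3K, where K is the total number of keypoints defined for the category. Each keypoint has a position [x,y] and a visibility flag v:
v=0: the keypoint is not labeled, in which case x=y=0;
v=1: the keypoint is labeled but not visible;
v=2: the keypoint is labeled and visible.
A keypoint is considered visible if it falls inside the object segment.
"num_keypoints" is the number of labeled keypoints (v>0) for the object. Many objects, such as crowds and small objects, have num_keypoints=0.
For each category, the categories struct has two additional fields: "keypoints" and "skeleton".
"keypoints" is a length-K array of keypoint names;
"skeleton" defines the connectivity between keypoints as a list of keypoint edge pairs and is used for visualization.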
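
A minimal sketch of reading the flat keypoints array, assuming the COCO visibility convention v=0 (not labeled, x=y=0), v=1 (labeled but not visible), v=2 (labeled and visible). The function names and sample coordinates are made up for illustration.

```python
def parse_keypoints(flat):
    """Split [x1, y1, v1, x2, y2, v2, ...] into (x, y, v) triples."""
    if len(flat) % 3 != 0:
        raise ValueError("keypoints length must be a multiple of 3")
    return [tuple(flat[i:i + 3]) for i in range(0, len(flat), 3)]


def count_labeled(flat):
    """Number of keypoints with v > 0, i.e. the expected num_keypoints."""
    return sum(1 for _, _, v in parse_keypoints(flat) if v > 0)


# Made-up example with K = 3 keypoints.
names = ["nose", "left_eye", "right_eye"]
keypoints = [120, 80, 2, 0, 0, 0, 131, 76, 1]

triples = parse_keypoints(keypoints)
# Keep only labeled keypoints, keyed by their name from the category.
labeled = {n: (x, y) for n, (x, y, v) in zip(names, triples) if v > 0}
```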

COCO currently provides keypoint annotations only for the person category.

Image Caption Annotations
Image caption annotation format:

annotation{
"id" : int,
"image_id" : int,
"caption" : str,
}

These annotations store image captions. Each caption describes the specified image, and each image has at least 5 captions.
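
Because several caption annotations share one image_id, a common first step is to group them by image. A small sketch with made-up captions:

```python
from collections import defaultdict

# Made-up caption annotations in the format above.
annotations = [
    {"id": 1, "image_id": 7, "caption": "A dog runs on the beach."},
    {"id": 2, "image_id": 7, "caption": "Dog running along the shore."},
    {"id": 3, "image_id": 9, "caption": "Two people ride bicycles."},
]

# Collect every caption text under the image it describes.
captions_by_image = defaultdict(list)
for ann in annotations:
    captions_by_image[ann["image_id"]].append(ann["caption"].strip())
```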