Preface
This article walks through the complete YOLOv5 instance-segmentation workflow: building your own dataset, converting the labels, training the model, and testing the results.
This article is best read on a desktop.
1. Instance Segmentation: Data Annotation
Instance-segmentation labels are polygons, so we need an annotation tool that supports polygon labeling, such as labelme (note that labelImg only supports rectangular boxes, so it is not suitable here). This article uses labelme for the demonstration.
labelme repository: https://github.com/wkentaro/labelme
It can be installed via conda, pip, and so on, and runs on Windows, Linux, and other systems.
Below is the conda route; for other methods, follow the instructions at the repository above.
```bash
# python3
conda create --name=labelme python=3
conda activate labelme
pip install labelme
```
Once labelme is installed, type labelme in that environment's terminal to launch it.
Click Open (to open a single image) or OpenDir (to open every image in a folder), and you will see the interface below.
Next, right-click and choose Create Polygons.
Left-click to place a series of points enclosing the object you want to label. In this example we annotate three classes: car, bus, and motorbike.
Then click Save to store the annotation as a json file.
The key fields inside it:
- label: the class name, e.g. the car, bus, and motorbike classes we just annotated.
- points: the points placed with the left mouse button to enclose the object; each point is an (x, y) pair.
- imagePath: the image path and file name.
- imageHeight: the image height.
- imageWidth: the image width.
In a moment we will convert this json file into the txt label format that YOLOv5 trains on.
First, here is an example of the json label that labelme generates:
{ "version": "5.1.1", "flags": {}, "shapes": [ { "label": "car", "points": [ [ 518.2777777777778, 623.5 ], [ 498.83333333333337, 666.5555555555555 ], [ 498.83333333333337, 709.6111111111111 ], [ 501.6111111111111, 737.3888888888889 ], [ 529.3888888888889, 738.7777777777778 ], [ 597.4444444444445, 741.5555555555555 ], [ 626.6111111111111, 738.7777777777778 ], [ 628.0, 672.1111111111111 ], [ 636.3333333333334, 652.6666666666667 ], [ 614.1111111111111, 629.0555555555555 ], [ 548.8333333333334, 620.7222222222223 ] ], "group_id": null, "shape_type": "polygon", "flags": {} }, { "label": "bus", "points": [ [ 90.49999999999997, 561.0 ], [ 155.77777777777774, 558.2222222222223 ], [ 190.49999999999997, 561.0 ], [ 251.61111111111106, 563.7777777777778 ], [ 300.2222222222222, 569.3333333333334 ], [ 305.7777777777777, 638.7777777777778 ], [ 298.83333333333326, 648.5 ], [ 250.2222222222222, 647.1111111111111 ], [ 222.44444444444443, 656.8333333333334 ], [ 207.16666666666666, 679.0555555555555 ], [ 207.16666666666666, 701.2777777777778 ], [ 177.99999999999997, 705.4444444444445 ], [ 157.16666666666666, 698.5 ], [ 126.61111111111106, 701.2777777777778 ], [ 127.99999999999997, 709.6111111111111 ], [ 105.77777777777774, 705.4444444444445 ], [ 100.2222222222222, 692.9444444444445 ], [ 79.38888888888886, 692.9444444444445 ], [ 83.55555555555551, 608.2222222222223 ], [ 68.27777777777774, 609.6111111111111 ], [ 68.27777777777774, 572.1111111111111 ] ], "group_id": null, "shape_type": "polygon", "flags": {} }, { "label": "motorbike", "points": [ [ 976.6111111111112, 654.0555555555555 ], [ 962.7222222222223, 674.8888888888889 ], [ 962.7222222222223, 690.1666666666667 ], [ 966.888888888889, 716.5555555555555 ], [ 990.5000000000001, 720.7222222222223 ], [ 1004.388888888889, 708.2222222222223 ], [ 1005.7777777777777, 676.2777777777778 ], [ 1000.2222222222223, 658.2222222222223 ] ], "group_id": null, "shape_type": "polygon", "flags": {} } ], "imagePath": "traffic04.png", "imageData": "iVBORw0KGgoAAAANSUhEUgAABWUAAAOBCAYAAACZFsKEAAEAAElEQVR4nOz9W8/sOpMmBgaVuXYd3HXTaBTQ/f9/StsGPD7AnosButoeow/jKg/QF9VA1fftlSn6QhnKUOiJA0UqM993r2ftd0spiWg27Kl6lzqS2Ugp+Xa/pSzDhJ8vSQ7+ZlIdCxfd36EuWqIlz+yoBWD/0TzYx65h+XMR3oqzS/X6Hj+oOmKHaBauI89ZRC2Kz3e7zW9s8dWraNwEek5+39xlH4vvdAG/B21/Xa8fP27fDsWpW1/IayTkiB0LTUtW/WKVQv7+Drxj/+v8TLQpxSFFlGAAAAABJRU5ErkJggg==", "imageHeight": 897, "imageWidth": 1381 }
2. Label Conversion: json to txt
Next we convert the json files into the txt label format used for YOLOv5 training.
First, a look at the YOLOv5 segmentation txt format: each line consists of a class id followed by the polygon's x and y coordinates, where every x is normalized by the image width and every y by the image height.
class x1 y1 x2 y2 ... xn yn
Here is an example:
```
1 0.07505175983436857 0.3322981366459628 0.16718426501035202 0.4989648033126295 0.0025879917184265366 0.556935817805383 0.0036231884057971323 0.365424430641822
1 0.662008281573499 0.1356107660455487 0.7572463768115941 0.28467908902691513 0.9983333333333333 0.16563615198920995 0.9983333333333333 0.0010351966873706593
1 0.8059006211180125 0.898550724637681 0.5833333333333334 0.9834368530020703 0.5922712215320913 0.9983333333333333 0.8628364389233955 0.9983333333333333
1 0.9839544513457558 0.8302277432712215 0.8059006211180125 0.898550724637681 0.8655644489978869 0.9983333333333333 0.9953416149068324 0.9983333333333333 0.9983333333333333 0.8603253623188408
```
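Each line is one object. The normalization is simply a division by the image size; taking the first point of the car polygon from the json above (image 1381 x 897), a quick check:

```python
w, h = 1381, 897                   # imageWidth, imageHeight from the json above
x, y = 518.2777777777778, 623.5    # first point of the "car" polygon

print(x / w, y / h)                # ~0.3753 ~0.6951, the values a txt label stores
print((x / w) * w, (y / h) * h)    # multiplying by the image size recovers pixels
```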
Here is the conversion code, json to txt:
```python
import argparse
import json
import os

from tqdm import tqdm


def convert_label_json(json_dir, save_dir, classes):
    # Only process .json files in the input directory
    json_paths = [p for p in os.listdir(json_dir) if p.endswith('.json')]
    classes = classes.split(',')
    os.makedirs(save_dir, exist_ok=True)

    for json_path in tqdm(json_paths):
        path = os.path.join(json_dir, json_path)
        with open(path, 'r') as load_f:
            json_dict = json.load(load_f)
        h, w = json_dict['imageHeight'], json_dict['imageWidth']

        # One txt file per json file, same base name
        txt_path = os.path.join(save_dir, os.path.splitext(json_path)[0] + '.txt')
        with open(txt_path, 'w') as txt_file:
            for shape_dict in json_dict['shapes']:
                label = shape_dict['label']
                label_index = classes.index(label)
                points = shape_dict['points']

                # Normalize x by image width and y by image height
                points_nor_list = []
                for point in points:
                    points_nor_list.append(point[0] / w)
                    points_nor_list.append(point[1] / h)

                points_nor_str = ' '.join(map(str, points_nor_list))
                # One object per line: "class x1 y1 x2 y2 ... xn yn"
                txt_file.write(str(label_index) + ' ' + points_nor_str + '\n')


if __name__ == "__main__":
    parser = argparse.ArgumentParser(description='json convert to txt params')
    parser.add_argument('--json-dir', type=str, help='json path dir')
    parser.add_argument('--save-dir', type=str, help='txt save dir')
    parser.add_argument('--classes', type=str, help='classes')
    args = parser.parse_args()
    convert_label_json(args.json_dir, args.save_dir, args.classes)
```
Usage example:
```bash
python json2txt.py --json-dir ./label_json/ --save-dir ./label_txt/ --classes "car,bus,motorbike"
```
- --json-dir: the directory of json files annotated with labelme.
- --save-dir: the directory where the txt files are saved.
- --classes: the annotated class names. The order matters: for "car,bus,motorbike", the class ids are 0, 1, 2.
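Before training, it is worth visually checking a few converted labels. Below is a minimal sketch, not part of the original workflow, that reads one txt label, de-normalizes the points, and draws the polygons on the image with OpenCV; the two paths are illustrative, so point them at one of your own image/label pairs.

```python
import cv2
import numpy as np

# Illustrative paths; substitute your own image/label pair
img = cv2.imread('images/traffic04.png')
h, w = img.shape[:2]

with open('label_txt/traffic04.txt') as f:
    for line in f:
        parts = line.split()
        coords = list(map(float, parts[1:]))  # parts[0] is the class id
        # De-normalize (x, y) pairs back to pixel coordinates
        pts = (np.array(coords, dtype=np.float32).reshape(-1, 2) * [w, h]).astype(np.int32)
        cv2.polylines(img, [pts], isClosed=True, color=(0, 255, 0), thickness=2)

cv2.imwrite('check.png', img)
```

If the drawn polygons line up with the objects, the conversion is correct.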
3. Model Training
YOLOv5 v7.0 supports instance segmentation, so we just download the official code:
https://github.com/ultralytics/yolov5/tree/v7.0
Now we finally get to model training!
First, put the converted data in a datasets folder that sits at the same level as yolov5-master, where yolov5-master holds the YOLOv5 code.
Inside datasets, create a folder named autopilot_seg to hold the annotated images and labels. The directory structure looks like this:
```
— datasets
    — autopilot_seg
        — images
        — labels
— yolov5-master
    — classify
    — data
    — models
    — segment
    — utils
    ............
```
Because this is just a demo, the dataset was not split into training, validation, and test sets.
Training and validation here use the same data, but in real experiments and projects it is essential to keep separate training and validation sets; a minimal split sketch follows.
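One possible way to do the split, sketched here under the folder layout shown above: randomly move about 20% of the image/label pairs into val subfolders.

```python
import os
import random
import shutil

random.seed(0)
root = 'datasets/autopilot_seg'  # layout from the tree above

# Create train/val subfolders under images/ and labels/
for sub in ('images/train', 'images/val', 'labels/train', 'labels/val'):
    os.makedirs(os.path.join(root, sub), exist_ok=True)

images = [f for f in os.listdir(os.path.join(root, 'images'))
          if f.endswith(('.jpg', '.png'))]
random.shuffle(images)
n_val = int(len(images) * 0.2)  # hold out ~20% for validation

for i, name in enumerate(images):
    split = 'val' if i < n_val else 'train'
    shutil.move(os.path.join(root, 'images', name),
                os.path.join(root, 'images', split, name))
    label_path = os.path.join(root, 'labels', os.path.splitext(name)[0] + '.txt')
    if os.path.exists(label_path):  # background images may have no label file
        shutil.move(label_path, os.path.join(root, 'labels', split,
                                             os.path.basename(label_path)))
```

With this layout, the yaml in the next step would use train: images/train and val: images/val instead of images/.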
Then, in the data directory of the yolov5-master project, create a file named autopilot_seg.yaml that tells YOLOv5 where the dataset lives:
```yaml
# YOLOv5 🚀 by Ultralytics, GPL-3.0 license
# parent
# ├── yolov5
# └── datasets
#     └── autopilot_seg  ← downloads here

path: ../datasets/autopilot_seg  # dataset root dir
train: images/  # train images
val: images/  # val images
test:  # test images (optional)

# Classes
names:
  0: car
  1: bus
  2: motorbike
```
Now let's start training!
We fine-tune from the pretrained yolov5s-seg.pt weights, which helps the model converge faster on the new data:

```bash
python segment/train.py --data autopilot_seg.yaml --weights yolov5s-seg.pt --img 640
```

Training runs for 100 epochs by default; the arguments below can be passed on the command line to change the training configuration.
They are defined in segment/train.py, around line 463:
```python
def parse_opt(known=False):
    parser = argparse.ArgumentParser()
    parser.add_argument('--weights', type=str, default=ROOT / 'yolov5s-seg.pt', help='initial weights path')
    parser.add_argument('--cfg', type=str, default='', help='model.yaml path')
    parser.add_argument('--data', type=str, default=ROOT / 'data/coco128-seg.yaml', help='dataset.yaml path')
    parser.add_argument('--hyp', type=str, default=ROOT / 'data/hyps/hyp.scratch-low.yaml', help='hyperparameters path')
    parser.add_argument('--epochs', type=int, default=100, help='total training epochs')
    parser.add_argument('--batch-size', type=int, default=16, help='total batch size for all GPUs, -1 for autobatch')
    parser.add_argument('--imgsz', '--img', '--img-size', type=int, default=640, help='train, val image size (pixels)')
    parser.add_argument('--rect', action='store_true', help='rectangular training')
    parser.add_argument('--resume', nargs='?', const=True, default=False, help='resume most recent training')
    parser.add_argument('--nosave', action='store_true', help='only save final checkpoint')
    parser.add_argument('--noval', action='store_true', help='only validate final epoch')
    parser.add_argument('--noautoanchor', action='store_true', help='disable AutoAnchor')
    parser.add_argument('--noplots', action='store_true', help='save no plot files')
    parser.add_argument('--evolve', type=int, nargs='?', const=300, help='evolve hyperparameters for x generations')
    parser.add_argument('--bucket', type=str, default='', help='gsutil bucket')
    parser.add_argument('--cache', type=str, nargs='?', const='ram', help='image --cache ram/disk')
    parser.add_argument('--image-weights', action='store_true', help='use weighted image selection for training')
    parser.add_argument('--device', default='', help='cuda device, i.e. 0 or 0,1,2,3 or cpu')
    parser.add_argument('--multi-scale', action='store_true', help='vary img-size +/- 50%%')
    parser.add_argument('--single-cls', action='store_true', help='train multi-class data as single-class')
    parser.add_argument('--optimizer', type=str, choices=['SGD', 'Adam', 'AdamW'], default='SGD', help='optimizer')
    parser.add_argument('--sync-bn', action='store_true', help='use SyncBatchNorm, only available in DDP mode')
    parser.add_argument('--workers', type=int, default=8, help='max dataloader workers (per RANK in DDP mode)')
    parser.add_argument('--project', default=ROOT / 'runs/train-seg', help='save to project/name')
    parser.add_argument('--name', default='exp', help='save to project/name')
    parser.add_argument('--exist-ok', action='store_true', help='existing project/name ok, do not increment')
    parser.add_argument('--quad', action='store_true', help='quad dataloader')
    parser.add_argument('--cos-lr', action='store_true', help='cosine LR scheduler')
    parser.add_argument('--label-smoothing', type=float, default=0.0, help='Label smoothing epsilon')
    parser.add_argument('--patience', type=int, default=100, help='EarlyStopping patience (epochs without improvement)')
    parser.add_argument('--freeze', nargs='+', type=int, default=[0], help='Freeze layers: backbone=10, first3=0 1 2')
    parser.add_argument('--save-period', type=int, default=-1, help='Save checkpoint every x epochs (disabled if < 1)')
    parser.add_argument('--seed', type=int, default=0, help='Global training seed')
    parser.add_argument('--local_rank', type=int, default=-1, help='Automatic DDP Multi-GPU argument, do not modify')

    # Instance Segmentation Args
    parser.add_argument('--mask-ratio', type=int, default=4, help='Downsample the truth masks to saving memory')
    parser.add_argument('--no-overlap', action='store_true', help='Overlap masks train faster at slightly less mAP')

    return parser.parse_known_args()[0] if known else parser.parse_args()
```
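For example, all of the flags below appear in the parse_opt above, so a variant of the training command that trains for 300 epochs with batch size 8 on GPU 0, under a custom run name, would look like this:

```bash
python segment/train.py --data autopilot_seg.yaml --weights yolov5s-seg.pt \
    --img 640 --epochs 300 --batch-size 8 --device 0 --name autopilot_seg
```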
Here is some of the console output:
```
hyperparameters: lr0=0.01, lrf=0.01, momentum=0.937, weight_decay=0.0005, warmup_epochs=3.0, warmup_momentum=0.8, warmup_bias_lr=0.1, box=0.05, cls=0.5, cls_pw=1.0, obj=1.0, obj_pw=1.0, iou_t=0.2, anchor_t=4.0, fl_gamma=0.0, hsv_h=0.015, hsv_s=0.7, hsv_v=0.4, degrees=0.0, translate=0.1, scale=0.5, shear=0.0, perspective=0.0, flipud=0.0, fliplr=0.5, mosaic=1.0, mixup=0.0, copy_paste=0.0
TensorBoard: Start with 'tensorboard --logdir runs/train-seg', view at http://localhost:6006/
Overriding model.yaml nc=80 with nc=3

                 from  n    params  module                                  arguments
  0                -1  1      3520  models.common.Conv                      [3, 32, 6, 2, 2]
  1                -1  1     18560  models.common.Conv                      [32, 64, 3, 2]
  2                -1  1     18816  models.common.C3                        [64, 64, 1]
  3                -1  1     73984  models.common.Conv                      [64, 128, 3, 2]
  4                -1  2    115712  models.common.C3                        [128, 128, 2]
  5                -1  1    295424  models.common.Conv                      [128, 256, 3, 2]
  6                -1  3    625152  models.common.C3                        [256, 256, 3]
  7                -1  1   1180672  models.common.Conv                      [256, 512, 3, 2]
  8                -1  1   1182720  models.common.C3                        [512, 512, 1]
  9                -1  1    656896  models.common.SPPF                      [512, 512, 5]
 10                -1  1    131584  models.common.Conv                      [512, 256, 1, 1]
 11                -1  1         0  torch.nn.modules.upsampling.Upsample    [None, 2, 'nearest']
 12           [-1, 6]  1         0  models.common.Concat                    [1]
 13                -1  1    361984  models.common.C3                        [512, 256, 1, False]
 14                -1  1     33024  models.common.Conv                      [256, 128, 1, 1]
 15                -1  1         0  torch.nn.modules.upsampling.Upsample    [None, 2, 'nearest']
 16           [-1, 4]  1         0  models.common.Concat                    [1]
 17                -1  1     90880  models.common.C3                        [256, 128, 1, False]
 18                -1  1    147712  models.common.Conv                      [128, 128, 3, 2]
 19          [-1, 14]  1         0  models.common.Concat                    [1]
 20                -1  1    296448  models.common.C3                        [256, 256, 1, False]
 21                -1  1    590336  models.common.Conv                      [256, 256, 3, 2]
 22          [-1, 10]  1         0  models.common.Concat                    [1]
 23                -1  1   1182720  models.common.C3                        [512, 512, 1, False]
 24      [17, 20, 23]  1    407464  models.yolo.Segment                     [3, [[10, 13, 16, 30, 33, 23], [30, 61, 62, 45, 59, 119], [116, 90, 156, 198, 373, 326]], 32, 128, [128, 256, 512]]
Model summary: 225 layers, 7413608 parameters, 7413608 gradients, 25.9 GFLOPs

Transferred 361/367 items from yolov5s-seg.pt
AMP: checks passed ✅
optimizer: SGD(lr=0.01) with parameter groups 60 weight(decay=0.0), 63 weight(decay=0.0005), 63 bias
train: Scanning /guopu/yolov5-v7-seg/datasets/parking_seg/labels.cache... 172 images, 1 backgrounds, 0 corrupt: 100%|██████████| 173/173 [00:00<?, ?it/s]
train: Caching images (0.2GB ram): 100%|██████████| 173/173 [00:00<00:00, 744.77it/s]
val: Scanning /guopu/yolov5-v7-seg/datasets/parking_seg/labels.cache... 172 images, 1 backgrounds, 0 corrupt: 100%|██████████| 173/173 [00:00<?, ?it/s]
val: Caching images (0.2GB ram): 100%|██████████| 173/173 [00:00<00:00, 425.08it/s]
AutoAnchor: 4.52 anchors/target, 1.000 Best Possible Recall (BPR). Current anchors are a good fit to dataset ✅
Plotting labels to runs/train-seg/exp3/labels.jpg...
Image sizes 640 train, 640 val
Using 8 dataloader workers
Logging results to runs/train-seg/exp3
Starting training for 100 epochs...
```
```
      Epoch    GPU_mem   box_loss   seg_loss   obj_loss   cls_loss  Instances       Size
       0/99      4.41G      0.119     0.1303    0.03598    0.04046         42        640: 100%|██████████| 11/11 [00:11<00:00,  1.04s/it]
                 Class     Images  Instances      Box(P          R      mAP50  mAP50-95)     Mask(P          R      mAP50  mAP50-95): 100%|██████████| 6/6 [00:04<00:00,  1.21it/s]
                   all        173        416    0.00755      0.172    0.00566    0.00126    0.00114      0.163   0.000808   0.000146

      Epoch    GPU_mem   box_loss   seg_loss   obj_loss   cls_loss  Instances       Size
       1/99      4.88G     0.1122    0.08991    0.03867    0.03891         66        640: 100%|██████████| 11/11 [00:04<00:00,  2.31it/s]
                 Class     Images  Instances      Box(P          R      mAP50  mAP50-95)     Mask(P          R      mAP50  mAP50-95): 100%|██████████| 6/6 [00:04<00:00,  1.33it/s]
                   all        173        416     0.0038       0.55    0.00454     0.0012    0.00196      0.287    0.00155   0.000278

      Epoch    GPU_mem   box_loss   seg_loss   obj_loss   cls_loss  Instances       Size
       2/99      4.88G    0.09685    0.07117    0.04401    0.03686         64        640: 100%|██████████| 11/11 [00:04<00:00,  2.35it/s]
                 Class     Images  Instances      Box(P          R      mAP50  mAP50-95)     Mask(P          R      mAP50  mAP50-95): 100%|██████████| 6/6 [00:04<00:00,  1.46it/s]
                   all        173        416    0.00546      0.752     0.0105    0.00266    0.00407      0.586    0.00507    0.00103

      Epoch    GPU_mem   box_loss   seg_loss   obj_loss   cls_loss  Instances       Size
       3/99      4.88G    0.08271    0.06062    0.04612    0.03379         52        640: 100%|██████████| 11/11 [00:04<00:00,  2.33it/s]
                 Class     Images  Instances      Box(P          R      mAP50  mAP50-95)     Mask(P          R      mAP50  mAP50-95): 100%|██████████| 6/6 [00:03<00:00,  1.69it/s]
                   all        173        416    0.00615      0.752     0.0121    0.00342    0.00653      0.506    0.00844    0.00184

...........

      Epoch    GPU_mem   box_loss   seg_loss   obj_loss   cls_loss  Instances       Size
      96/99      4.89G    0.01945    0.01475    0.01745   0.003529         42        640: 100%|██████████| 11/11 [00:04<00:00,  2.50it/s]
                 Class     Images  Instances      Box(P          R      mAP50  mAP50-95)     Mask(P          R      mAP50  mAP50-95): 100%|██████████| 6/6 [00:02<00:00,  2.02it/s]
                   all        173        416      0.989      0.996      0.994      0.884      0.989      0.996      0.994       0.86

      Epoch    GPU_mem   box_loss   seg_loss   obj_loss   cls_loss  Instances       Size
      97/99      4.89G    0.01732    0.01472     0.0163   0.003331         35        640: 100%|██████████| 11/11 [00:04<00:00,  2.51it/s]
                 Class     Images  Instances      Box(P          R      mAP50  mAP50-95)     Mask(P          R      mAP50  mAP50-95): 100%|██████████| 6/6 [00:02<00:00,  2.09it/s]
                   all        173        416      0.987      0.996      0.994      0.878      0.987      0.996      0.994      0.866

      Epoch    GPU_mem   box_loss   seg_loss   obj_loss   cls_loss  Instances       Size
      98/99      4.89G    0.01833    0.01492    0.01707   0.003123         38        640: 100%|██████████| 11/11 [00:04<00:00,  2.49it/s]
                 Class     Images  Instances      Box(P          R      mAP50  mAP50-95)     Mask(P          R      mAP50  mAP50-95): 100%|██████████| 6/6 [00:02<00:00,  2.03it/s]
                   all        173        416      0.986      0.996      0.994      0.881      0.986      0.996      0.994      0.865

      Epoch    GPU_mem   box_loss   seg_loss   obj_loss   cls_loss  Instances       Size
      99/99      4.89G    0.01827    0.01308    0.01517    0.00358         56        640: 100%|██████████| 11/11 [00:04<00:00,  2.45it/s]
                 Class     Images  Instances      Box(P          R      mAP50  mAP50-95)     Mask(P          R      mAP50  mAP50-95): 100%|██████████| 6/6 [00:03<00:00,  1.99it/s]
                   all        173        416      0.987      0.996      0.994      0.887      0.987      0.996      0.994      0.868

100 epochs completed in 0.230 hours.
Optimizer stripped from runs/train-seg/exp3/weights/last.pt, 15.2MB
Optimizer stripped from runs/train-seg/exp3/weights/best.pt, 15.2MB
```
This demo annotated only 173 images, and the training and validation sets are identical, so the metrics come out unrealistically high.
In real experiments and projects, split the data into training and validation sets and work with a larger dataset; the goal here is just to understand the end-to-end workflow.
4. Model Inference
Model inference means running the freshly trained model on new images to see how it performs.
The instance-segmentation inference code lives in segment/predict.py; we run it with the weights we just trained, runs/train-seg/exp3/weights/best.pt:

```bash
python segment/predict.py --weights runs/train-seg/exp3/weights/best.pt
```

You can also run inference on a live camera feed:
```bash
python segment/predict.py --weights runs/train-seg/exp3/weights/best.pt --source 0
```
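predict.py also accepts other common sources via --source; for example (the folder and file names below are illustrative):

```bash
# a folder of test images
python segment/predict.py --weights runs/train-seg/exp3/weights/best.pt --source ./test_images/

# a video file, keeping only higher-confidence detections
python segment/predict.py --weights runs/train-seg/exp3/weights/best.pt --source test.mp4 --conf-thres 0.4
```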
Below are YOLOv5 instance-segmentation results on some other datasets:
That's the end of this walkthrough!
This article is for reference and learning only. Thanks for reading!
A later post will cover the ideas and principles behind YOLOv5 instance segmentation, along with a code walkthrough.