python3 + Ubuntu 16.04 + Caffe: training faster_rcnn_end2end on the KITTI dataset (also trained on VOC2007 for 50,000 iterations)
https://blog.csdn.net/Asunany/article/details/79176935 (comment on that post): "I also ran into this problem, but in my case the cause was that the annotations did not match the images. Simply put, the XML described the content of image a but was named after image b, so there were essentially no detections. Also, if the annotations really are wrong, watch the loss while training: it will oscillate violently."
Test plan: (the same problem already appeared earlier with the full VOC2007 set) compare the dataset-generation script against the label contents.
Test result: the generation script is almost identical to the one at https://blog.csdn.net/mdjxy63/article/details/79821516.
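To rule out this kind of image/label mismatch quickly, here is a minimal sketch (assuming a VOC-style layout with Annotations/*.xml and JPEGImages/*.jpg; the paths are placeholders, adjust them to your converted dataset) that flags XML files whose <filename> field does not match their own name, or whose image file is missing:

```python
import os
import xml.etree.ElementTree as ET

# Hypothetical paths -- point these at your converted KITTI-in-VOC-format dataset.
ann_dir = 'VOCdevkit/VOC2007/Annotations'
img_dir = 'VOCdevkit/VOC2007/JPEGImages'

for xml_name in sorted(os.listdir(ann_dir)):
    if not xml_name.endswith('.xml'):
        continue
    stem = os.path.splitext(xml_name)[0]
    recorded = ET.parse(os.path.join(ann_dir, xml_name)).findtext('filename', default='')
    # <filename> should refer to the same image the xml file is named after
    if os.path.splitext(recorded)[0] != stem:
        print('filename mismatch: {} records {}'.format(xml_name, recorded))
    # the corresponding image must exist
    if not os.path.isfile(os.path.join(img_dir, stem + '.jpg')):
        print('missing image for {}'.format(xml_name))
```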
"I got the same problem. Finally, I found it was because my data and label were not loaded properly. My data file is modified from pascal_voc.py, cls = self._class_to_ind[obj.find('name').text.lower().strip()] makes the label being matched in lower case. I set the labels in self._classes in lower case but my labels in the annotation file are in upper case. In the evaluation stage, it cannot be matched. I solve this problem by setting self._classes in upper case and changing cls = self._class_to_ind[obj.find('name').text.lower().strip()] to cls = self._class_to_ind[obj.find('name').text.strip()]." (https://github.com/endernewton/tf-faster-rcnn/issues/221)
Test plan: change the class names and remove the lower() call. If you run into this, check whether the class names in self._classes in your pascal_voc.py, the lookup cls = self._class_to_ind[obj.find('name').text.lower().strip()], and the class names in your annotation files agree in case (see also https://blog.csdn.net/weixin_43981560/article/details/105124130?utm_medium=distribute.pc_relevant.none-task-blog-baidujs-1).
Result: still 0.
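A quick way to check this consistency without retraining: a standalone sketch that reports annotation class names which would not be found by the same lookup the data layer uses. The class tuple below is an assumed KITTI-style list; replace it with the actual self._classes tuple from your lib/datasets/pascal_voc.py, and adjust the annotation path.

```python
import os
import xml.etree.ElementTree as ET

# Assumed class list -- copy the real self._classes tuple from your pascal_voc.py.
classes = ('__background__', 'car', 'van', 'truck', 'pedestrian', 'cyclist', 'tram')
class_to_ind = dict(zip(classes, range(len(classes))))

ann_dir = 'VOCdevkit/VOC2007/Annotations'   # adjust to your dataset
unmatched = set()
for xml_name in os.listdir(ann_dir):
    if not xml_name.endswith('.xml'):
        continue
    for obj in ET.parse(os.path.join(ann_dir, xml_name)).findall('object'):
        name = obj.find('name').text.strip()
        # same normalization as pascal_voc.py: lower() before the dictionary lookup
        if name.lower() not in class_to_ind:
            unmatched.add(name)

print('annotation class names that self._class_to_ind cannot match:', unmatched)
```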
Test plan: first train on VOC2007 for 35,000 iterations, then train on KITTI for another 10,000 iterations and see.
Result: over those 35,000 iterations the loss kept oscillating.
On loss oscillation:
1. Problems with the training set: https://haoyu.love/blog404.html
2. Discussion threads without a conclusion: https://bbs.csdn.net/topics/392270182 https://bbs.csdn.net/topics/392748958
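To separate noise from a real trend, it helps to average the logged loss over windows of iterations rather than eyeballing individual values. A small sketch, assuming the stock Caffe solver log format ("Iteration N ..., loss = X"); the log path is a placeholder:

```python
import re

pattern = re.compile(r'Iteration (\d+).*?loss = ([0-9.eE+-]+)')
iters, losses = [], []
with open('experiments/logs/faster_rcnn_end2end.log') as f:   # placeholder path
    for line in f:
        m = pattern.search(line)
        if m:
            iters.append(int(m.group(1)))
            losses.append(float(m.group(2)))

# Average over blocks of 50 logged values: noisy single values with a falling average
# are normal; an average that stays flat or jumps around points to a real problem.
window = 50
for i in range(0, len(losses) - window + 1, window):
    avg = sum(losses[i:i + window]) / window
    print('from iter {:6d}: mean loss {:.4f}'.format(iters[i], avg))
```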
Searching GitHub issues for the same symptom (Mean AP = 0.0000): https://github.com/search?o=desc&q=Mean+AP+%3D+0.0000&s=&type=Issues https://github.com/ShuangXieIrene/ssds.pytorch/issues/30
After tracing through the computed values, the fix turned out to be changing // back to / in py-faster-rcnn/lib/datasets/voc_eval.py, which solved the problem:
AP for tram = 0.5701
Mean AP = 0.5440
~~~~~~~~
Results:
0.672
0.540
0.494
0.390
0.647
0.704
0.333
0.570
0.544
~~~~~~~~
This continues the approach of the previous post, a method learned from teacher Lin: dump the numbers being computed, state what the current output is and how it contradicts what is expected, then pin down what the concrete contradiction is and what causes it. A first pass, i.e. finding which file computes the mean AP, pointed at py-faster-rcnn/lib/datasets/voc_eval.py. Checking the arguments passed in at each step showed the problem was in a division: while porting the code from Python 2 to Python 3, the division operator had been changed along the way, so the computed overlap never reached the 0.5 threshold. In Python 3, // is floor division and keeps only the integer part of the quotient, whereas a float result is wanted here. This also matched the debugging output: confidence had values and the sorting was correct, but the values of overlaps in overlaps = inters / uni came out tiny.
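A tiny illustration of why the operator matters (the numbers are made up): under floor division, any IoU below 1.0 collapses to 0.0, so no detection ever passes ovthresh=0.5 and every class ends up with AP = 0.

```python
import numpy as np

inters = np.array([1200.0, 3000.0])   # made-up intersection areas
uni = np.array([2000.0, 3500.0])      # made-up union areas

print(inters / uni)    # [0.6    0.857...]  true IoU, can exceed the 0.5 threshold
print(inters // uni)   # [0. 0.]            floor division truncates every IoU to 0
```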
Most of the comments below come from https://blog.csdn.net/hongxingabc/article/details/80090736?utm_source=blogxgwz2, plus some notes of my own from debugging.
```python
# --------------------------------------------------------
# Fast/er R-CNN
# Licensed under The MIT License [see LICENSE for details]
# Written by Bharath Hariharan
# --------------------------------------------------------

import xml.etree.ElementTree as ET
import os
import pickle
import numpy as np


def parse_rec(filename):
    """ Parse a PASCAL VOC xml file """
    tree = ET.parse(filename)
    objects = []
    for obj in tree.findall('object'):
        obj_struct = {}
        obj_struct['name'] = obj.find('name').text
        # obj_struct['pose'] = obj.find('pose').text
        # obj_struct['truncated'] = int(obj.find('truncated').text)
        obj_struct['difficult'] = int(obj.find('difficult').text)
        # bbox is the box from the original annotation
        bbox = obj.find('bndbox')
        obj_struct['bbox'] = [int(bbox.find('xmin').text),
                              int(bbox.find('ymin').text),
                              int(bbox.find('xmax').text),
                              int(bbox.find('ymax').text)]
        objects.append(obj_struct)

    return objects


def voc_ap(rec, prec, use_07_metric=False):
    """ ap = voc_ap(rec, prec, [use_07_metric])
    Compute VOC AP given precision and recall.
    If use_07_metric is true, uses the
    VOC 07 11 point method (default:False).
    """
    if use_07_metric:
        # 11 point metric
        ap = 0.
        for t in np.arange(0., 1.1, 0.1):
            if np.sum(rec >= t) == 0:
                p = 0
            else:
                p = np.max(prec[rec >= t])
            ap = ap + p / 11.
            # 0704 ap = ap + p // 11.
    else:
        # correct AP calculation
        # first append sentinel values at the end
        mrec = np.concatenate(([0.], rec, [1.]))
        mpre = np.concatenate(([0.], prec, [0.]))

        # compute the precision envelope
        for i in range(mpre.size - 1, 0, -1):
            mpre[i - 1] = np.maximum(mpre[i - 1], mpre[i])

        # to calculate area under PR curve, look for points
        # where X axis (recall) changes value
        i = np.where(mrec[1:] != mrec[:-1])[0]

        # and sum (\Delta recall) * prec
        ap = np.sum((mrec[i + 1] - mrec[i]) * mpre[i + 1])
    return ap


def voc_eval(detpath,
             annopath,
             imagesetfile,
             classname,
             cachedir,
             ovthresh=0.5,
             use_07_metric=False):
    """rec, prec, ap = voc_eval(detpath,
                                annopath,        # xml annotation files
                                imagesetfile,    # split file, VOCdevkit/VOC20xx/ImageSets/Main/test.txt;
                                                 # with e.g. 1000 test images this file has 1000 lines
                                classname,
                                [ovthresh],      # required overlap
                                [use_07_metric])

    Top level function that does the PASCAL VOC evaluation.

    detpath: Path to detections
        detpath.format(classname) should produce the detection results file.
    annopath: Path to annotations
        annopath.format(imagename) should be the xml annotations file.
    imagesetfile: Text file containing the list of images, one image per line.
    classname: Category name (duh)
    cachedir: Directory for caching the annotations
        (VOCdevkit/annotation_cache; a read-only cache so the raw dataset does not
        have to be re-parsed every time)
    [ovthresh]: Overlap threshold (default = 0.5)
    [use_07_metric]: Whether to use VOC07's 11 point AP computation
        (default False)
    """
    # assumes detections are in detpath.format(classname)
    # assumes annotations are in annopath.format(imagename)
    # assumes imagesetfile is a text file with each line an image name
    # cachedir caches the annotations in a pickle file

    # first load gt (ground truth)
    if not os.path.isdir(cachedir):
        os.mkdir(cachedir)
    # cache file name
    cachefile = os.path.join(cachedir, 'annots.pkl')
    # read list of images
    # 0630 with open(imagesetfile, 'rb') as f:
    # read the names of all images to be evaluated
    with open(imagesetfile, 'r') as f:
        lines = f.readlines()
    # names of the test images, stored in imagenames (length 1000 in the example)
    imagenames = [x.strip() for x in lines]

    # if the cache file does not exist, reload everything from the raw dataset
    if not os.path.isfile(cachefile):
        # load annots
        recs = {}
        for i, imagename in enumerate(imagenames):
            recs[imagename] = parse_rec(annopath.format(imagename))
            if i % 100 == 0:
                # progress
                print('Reading annotation for {:d}/{:d}'.format(
                    i + 1, len(imagenames)))
        # save
        print('Saving cached annotations to {:s}'.format(cachefile))
        # 22 with open(cachefile, 'w') as f:
        with open(cachefile, 'wb') as f:
            # dump the recs dict into the cache file
            pickle.dump(recs, f)
    else:
        # load
        # 24 with open(cachefile, 'r') as f:
        # the cache file already exists, load it into recs
        with open(cachefile, 'rb') as f:
            recs = pickle.load(f)

    # extract gt objects for this class
    # recall and precision are computed per class, and so is AP,
    # so keep only the annotations of the current class
    class_recs = {}
    # npos: number of annotated objects of this class
    npos = 0
    for imagename in imagenames:
        # filter: keep only the objects of the current class, store them in R
        R = [obj for obj in recs[imagename] if obj['name'] == classname]
        # extract the bboxes
        bbox = np.array([x['bbox'] for x in R])
        # if the dataset has no difficult flag, all entries are 0
        difficult = np.array([x['difficult'] for x in R]).astype(np.bool)
        # len(R) is the number of gt objects of this class; det marks whether each
        # has been detected, initialised to False
        det = [False] * len(R)
        # accumulate the number of non-difficult samples; without a difficult flag
        # npos is simply the gt count
        npos = npos + sum(~difficult)
        class_recs[imagename] = {'bbox': bbox,
                                 'difficult': difficult,
                                 'det': det}

    # read dets
    detfile = detpath.format(classname)
    # 0630 with open(detfile, 'rb') as f:
    with open(detfile, 'r') as f:
        lines = f.readlines()

    # with e.g. 20000 detection results, splitlines has length 20000
    splitlines = [x.strip().split(' ') for x in lines]
    # image name of each detection; length 20000 even though there are only 1000
    # images, because one image can produce several detections
    image_ids = [x[0] for x in splitlines]
    # detection confidences
    confidence = np.array([float(x[1]) for x in splitlines])
    print('test0630 confidence=', confidence)
    # bboxes as floats
    BB = np.array([[float(z) for z in x[2:]] for x in splitlines])
    # print('test0704 before sorted_ind BB=', BB)

    # sort by confidence: sort the 20000 detections in descending order of confidence
    sorted_ind = np.argsort(-confidence)
    print('sorted_ind =', sorted_ind)
    sorted_ind_test = np.argsort(confidence)
    print('sorted_ind_test =', sorted_ind_test)
    # scores in descending order
    sorted_scores = np.sort(-confidence)
    print('sorted_scores =', sorted_scores)
    # 23 BB = BB[sorted_ind, :]  -- this actually reorders the selected boxes
    BB_test = BB[sorted_ind_test, :]
    print('test0704 sorted_ind_test BB_test=', BB_test)
    BB = BB[sorted_ind, :]
    print('test0704 sorted_ind BB=', BB)
    # bboxes reordered from the most to the least confident
    # if len(BB) != 0:
    #     BB = BB[sorted_ind, :]
    #     print('test0630 BB=', BB)
    # image_ids reordered the same way
    image_ids = [image_ids[x] for x in sorted_ind]

    # go down dets and mark TPs and FPs
    # note this is 20000, not 1000
    nd = len(image_ids)
    print('test0630 nd=', nd)
    # true positives, length 20000
    tp = np.zeros(nd)
    # false positives, length 20000
    fp = np.zeros(nd)
    # walk through all detections; since they are sorted, this goes from the
    # highest confidence to the lowest
    for d in range(nd):
        # all same-class gt of the image this detection belongs to
        R = class_recs[image_ids[d]]
        # bbox of the current detection
        bb = BB[d, :].astype(float)
        ovmax = -np.inf
        # bboxes of all same-class gt in that image
        BBGT = R['bbox'].astype(float)

        if BBGT.size > 0:
            # compute overlaps between the current detection and all gt boxes of its
            # image; one-to-many via numpy broadcasting (elementwise operations)
            # intersection
            ixmin = np.maximum(BBGT[:, 0], bb[0])
            iymin = np.maximum(BBGT[:, 1], bb[1])
            ixmax = np.minimum(BBGT[:, 2], bb[2])
            iymax = np.minimum(BBGT[:, 3], bb[3])
            iw = np.maximum(ixmax - ixmin + 1., 0.)
            # print('test0630 iw=', iw)
            ih = np.maximum(iymax - iymin + 1., 0.)
            # print('test0630 ih=', ih)
            inters = iw * ih
            print('test0630 inters=', inters)

            # union
            uni = ((bb[2] - bb[0] + 1.) * (bb[3] - bb[1] + 1.) +
                   (BBGT[:, 2] - BBGT[:, 0] + 1.) *
                   (BBGT[:, 3] - BBGT[:, 1] + 1.) - inters)
            print('test0630 uni=', uni)

            overlaps = inters / uni
            print('test0704 overlaps = inters / uni =', overlaps)
            # overlaps = inters // uni   (the python2->3 edit that broke the evaluation)
            # largest overlap
            ovmax = np.max(overlaps)
            print('test0704 ovthresh=0.5 ovmax=', ovmax)
            # gt with the largest overlap
            jmax = np.argmax(overlaps)

        print('test0630 ovthresh=0.5 out if ovmax=', ovmax)
        # print('test0630 ovthresh=', ovthresh)
        # if the best overlap with a gt box passes the threshold
        if ovmax > ovthresh:
            if not R['difficult'][jmax]:
                if not R['det'][jmax]:
                    # one more true positive
                    tp[d] = 1.
                    # mark this gt as detected; if another detection later overlaps it
                    # above the threshold, it must not count as another detected object
                    R['det'][jmax] = 1
                else:
                    # otherwise it is a false positive
                    fp[d] = 1.
        else:
            # below the threshold: definitely a false positive
            fp[d] = 1.

    # compute precision recall
    # cumulative sum: number of false positives up to the current detection
    fp = np.cumsum(fp)
    # cumulative sum: number of true positives up to the current detection
    tp = np.cumsum(tp)
    print('test0630')
    print('rec = tp / float(npos), tp=', tp)
    print('prec = tp / np.maximum(tp + fp, np.finfo(np.float64).eps)', fp)
    # recall, length 20000, goes from 0 to 1
    rec = tp / float(npos)
    # rec = tp // float(npos)
    # avoid divide by zero in case the first detection matches a difficult
    # ground truth
    # precision, length 20000, goes from 1 to 0
    prec = tp / np.maximum(tp + fp, np.finfo(np.float64).eps)
    # prec = tp // np.maximum(tp + fp, np.finfo(np.float64).eps)
    print('test0630')
    print('rec=', rec)
    print('prec=', prec)
    print('use_07_metric', use_07_metric)
    ap = voc_ap(rec, prec, use_07_metric)

    return rec, prec, ap
```

The loss non-convergence problem is still there, and the mean AP after training for 60,000 iterations is not good either. I have tuned the learning rate and the number of iterations, but not the batch size; that needs further experiments.
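For that follow-up tuning, these are the knobs I plan to look at next. A sketch, not a verified recipe: the cfg keys below do exist in py-faster-rcnn's lib/fast_rcnn/config.py, but the values shown are only illustrations, and the solver numbers quoted in the comment are the stock defaults as I remember them, so check your local solver.prototxt:

```python
# Batch-related settings live in lib/fast_rcnn/config.py and are normally overridden
# through the experiment yml (e.g. experiments/cfgs/faster_rcnn_end2end.yml).
from fast_rcnn.config import cfg

cfg.TRAIN.IMS_PER_BATCH = 1     # images per minibatch (end2end training uses 1)
cfg.TRAIN.BATCH_SIZE = 128      # RoIs sampled per image for the Fast R-CNN head
cfg.TRAIN.RPN_BATCHSIZE = 256   # anchors sampled per image for the RPN loss

# Learning rate and schedule are in the Caffe solver, e.g.
# models/pascal_voc/VGG16/faster_rcnn_end2end/solver.prototxt:
#   base_lr: 0.001   lr_policy: "step"   gamma: 0.1   stepsize: 50000
# When the loss oscillates instead of converging, lowering base_lr (e.g. to 0.0005)
# or shortening stepsize is the usual first experiment.
```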
