python3 + Ubuntu 16.04 + Caffe: training faster_rcnn_end2end on the KITTI dataset (also trained on VOC2007 for 50,000 iterations)
https://blog.csdn.net/Asunany/article/details/79176935 (comment on that post): "I also ran into this problem, but in my case the cause was that the annotations did not match the images. Simply put, the XML described the content of image a but was named after image b, so there were essentially no detections. Also, if the annotations really are wrong, watch the loss while training: it will oscillate violently."
Test plan: (the same problem already appeared earlier with the full VOC2007 set) compare the dataset-generation script against the label contents.
Test result: the generation script is almost identical to the one at https://blog.csdn.net/mdjxy63/article/details/79821516.
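To rule out this kind of image/label mismatch quickly, here is a minimal sketch (assuming a VOC-style layout with Annotations/*.xml and JPEGImages/*.jpg; the paths are placeholders, adjust them to your converted dataset) that flags XML files whose <filename> field does not match their own name, or whose image file is missing:

```python
import os
import xml.etree.ElementTree as ET

# Hypothetical paths -- point these at your converted KITTI-in-VOC-format dataset.
ann_dir = 'VOCdevkit/VOC2007/Annotations'
img_dir = 'VOCdevkit/VOC2007/JPEGImages'

for xml_name in sorted(os.listdir(ann_dir)):
    if not xml_name.endswith('.xml'):
        continue
    stem = os.path.splitext(xml_name)[0]
    recorded = ET.parse(os.path.join(ann_dir, xml_name)).findtext('filename', default='')
    # <filename> should refer to the same image the xml file is named after
    if os.path.splitext(recorded)[0] != stem:
        print('filename mismatch: {} records {}'.format(xml_name, recorded))
    # the corresponding image must exist
    if not os.path.isfile(os.path.join(img_dir, stem + '.jpg')):
        print('missing image for {}'.format(xml_name))
```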
"I got the same problem. Finally, I found it was because my data and label were not loaded properly. My data file is modified from pascal_voc.py, cls = self._class_to_ind[obj.find('name').text.lower().strip()] makes the label being matched in lower case. I set the labels in self._classes in lower case but my labels in the annotation file are in upper case. In the evaluation stage, it cannot be matched. I solve this problem by setting self._classes in upper case and changing cls = self._class_to_ind[obj.find('name').text.lower().strip()] to cls = self._class_to_ind[obj.find('name').text.strip()]." (https://github.com/endernewton/tf-faster-rcnn/issues/221)
Test plan: change the class names and remove the lower() call. If you run into this, check whether the class names in self._classes in your pascal_voc.py, the lookup cls = self._class_to_ind[obj.find('name').text.lower().strip()], and the class names in your annotation files agree in case (see also https://blog.csdn.net/weixin_43981560/article/details/105124130?utm_medium=distribute.pc_relevant.none-task-blog-baidujs-1).
Result: still 0.
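A quick way to check this consistency without retraining: a standalone sketch that reports annotation class names which would not be found by the same lookup the data layer uses. The class tuple below is an assumed KITTI-style list; replace it with the actual self._classes tuple from your lib/datasets/pascal_voc.py, and adjust the annotation path.

```python
import os
import xml.etree.ElementTree as ET

# Assumed class list -- copy the real self._classes tuple from your pascal_voc.py.
classes = ('__background__', 'car', 'van', 'truck', 'pedestrian', 'cyclist', 'tram')
class_to_ind = dict(zip(classes, range(len(classes))))

ann_dir = 'VOCdevkit/VOC2007/Annotations'   # adjust to your dataset
unmatched = set()
for xml_name in os.listdir(ann_dir):
    if not xml_name.endswith('.xml'):
        continue
    for obj in ET.parse(os.path.join(ann_dir, xml_name)).findall('object'):
        name = obj.find('name').text.strip()
        # same normalization as pascal_voc.py: lower() before the dictionary lookup
        if name.lower() not in class_to_ind:
            unmatched.add(name)

print('annotation class names that self._class_to_ind cannot match:', unmatched)
```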
Test plan: first train on VOC2007 for 35,000 iterations, then train on KITTI for another 10,000 iterations and see.
Result: over those 35,000 iterations the loss kept oscillating.
On loss oscillation:
1. Problems with the training set: https://haoyu.love/blog404.html
2. Discussion threads without a conclusion: https://bbs.csdn.net/topics/392270182 https://bbs.csdn.net/topics/392748958
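To separate noise from a real trend, it helps to average the logged loss over windows of iterations rather than eyeballing individual values. A small sketch, assuming the stock Caffe solver log format ("Iteration N ..., loss = X"); the log path is a placeholder:

```python
import re

pattern = re.compile(r'Iteration (\d+).*?loss = ([0-9.eE+-]+)')
iters, losses = [], []
with open('experiments/logs/faster_rcnn_end2end.log') as f:   # placeholder path
    for line in f:
        m = pattern.search(line)
        if m:
            iters.append(int(m.group(1)))
            losses.append(float(m.group(2)))

# Average over blocks of 50 logged values: noisy single values with a falling average
# are normal; an average that stays flat or jumps around points to a real problem.
window = 50
for i in range(0, len(losses) - window + 1, window):
    avg = sum(losses[i:i + window]) / window
    print('from iter {:6d}: mean loss {:.4f}'.format(iters[i], avg))
```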
Searching GitHub issues for the same symptom (Mean AP = 0.0000): https://github.com/search?o=desc&q=Mean+AP+%3D+0.0000&s=&type=Issues https://github.com/ShuangXieIrene/ssds.pytorch/issues/30
After tracing through the computed values, the fix turned out to be changing // back to / in py-faster-rcnn/lib/datasets/voc_eval.py, which solved the problem:
AP for tram = 0.5701
Mean AP = 0.5440
~~~~~~~~
Results:
0.672
0.540
0.494
0.390
0.647
0.704
0.333
0.570
0.544
~~~~~~~~
This continues the approach of the previous post, a method learned from teacher Lin: dump the numbers being computed, state what the current output is and how it contradicts what is expected, then pin down what the concrete contradiction is and what causes it. A first pass, i.e. finding which file computes the mean AP, pointed at py-faster-rcnn/lib/datasets/voc_eval.py. Checking the arguments passed in at each step showed the problem was in a division: while porting the code from Python 2 to Python 3, the division operator had been changed along the way, so the computed overlap never reached the 0.5 threshold. In Python 3, // is floor division and keeps only the integer part of the quotient, whereas a float result is wanted here. This also matched the debugging output: confidence had values and the sorting was correct, but the values of overlaps in overlaps = inters / uni came out tiny.
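A tiny illustration of why the operator matters (the numbers are made up): under floor division, any IoU below 1.0 collapses to 0.0, so no detection ever passes ovthresh=0.5 and every class ends up with AP = 0.

```python
import numpy as np

inters = np.array([1200.0, 3000.0])   # made-up intersection areas
uni = np.array([2000.0, 3500.0])      # made-up union areas

print(inters / uni)    # [0.6    0.857...]  true IoU, can exceed the 0.5 threshold
print(inters // uni)   # [0. 0.]            floor division truncates every IoU to 0
```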
Most of the comments below come from https://blog.csdn.net/hongxingabc/article/details/80090736?utm_source=blogxgwz2, plus some notes of my own from debugging.
```python
# --------------------------------------------------------
# Fast/er R-CNN
# Licensed under The MIT License [see LICENSE for details]
# Written by Bharath Hariharan
# --------------------------------------------------------

import xml.etree.ElementTree as ET
import os
import pickle
import numpy as np


def parse_rec(filename):
    """ Parse a PASCAL VOC xml file """
    tree = ET.parse(filename)
    objects = []
    for obj in tree.findall('object'):
        obj_struct = {}
        obj_struct['name'] = obj.find('name').text
        # obj_struct['pose'] = obj.find('pose').text
        # obj_struct['truncated'] = int(obj.find('truncated').text)
        obj_struct['difficult'] = int(obj.find('difficult').text)
        # bbox is the box from the original annotation
        bbox = obj.find('bndbox')
        obj_struct['bbox'] = [int(bbox.find('xmin').text),
                              int(bbox.find('ymin').text),
                              int(bbox.find('xmax').text),
                              int(bbox.find('ymax').text)]
        objects.append(obj_struct)

    return objects


def voc_ap(rec, prec, use_07_metric=False):
    """ ap = voc_ap(rec, prec, [use_07_metric])
    Compute VOC AP given precision and recall.
    If use_07_metric is true, uses the
    VOC 07 11 point method (default:False).
    """
    if use_07_metric:
        # 11 point metric
        ap = 0.
        for t in np.arange(0., 1.1, 0.1):
            if np.sum(rec >= t) == 0:
                p = 0
            else:
                p = np.max(prec[rec >= t])
            ap = ap + p / 11.
            # 0704 ap = ap + p // 11.
    else:
        # correct AP calculation
        # first append sentinel values at the end
        mrec = np.concatenate(([0.], rec, [1.]))
        mpre = np.concatenate(([0.], prec, [0.]))

        # compute the precision envelope
        for i in range(mpre.size - 1, 0, -1):
            mpre[i - 1] = np.maximum(mpre[i - 1], mpre[i])

        # to calculate area under PR curve, look for points
        # where X axis (recall) changes value
        i = np.where(mrec[1:] != mrec[:-1])[0]

        # and sum (\Delta recall) * prec
        ap = np.sum((mrec[i + 1] - mrec[i]) * mpre[i + 1])
    return ap


def voc_eval(detpath,
             annopath,
             imagesetfile,
             classname,
             cachedir,
             ovthresh=0.5,
             use_07_metric=False):
    """rec, prec, ap = voc_eval(detpath,
                                annopath,        # xml annotation files
                                imagesetfile,    # split file, VOCdevkit/VOC20xx/ImageSets/Main/test.txt;
                                                 # with e.g. 1000 test images this file has 1000 lines
                                classname,
                                [ovthresh],      # required overlap
                                [use_07_metric])

    Top level function that does the PASCAL VOC evaluation.

    detpath: Path to detections
        detpath.format(classname) should produce the detection results file.
    annopath: Path to annotations
        annopath.format(imagename) should be the xml annotations file.
    imagesetfile: Text file containing the list of images, one image per line.
    classname: Category name (duh)
    cachedir: Directory for caching the annotations
        (VOCdevkit/annotation_cache; a read-only cache so the raw dataset does not
        have to be re-parsed every time)
    [ovthresh]: Overlap threshold (default = 0.5)
    [use_07_metric]: Whether to use VOC07's 11 point AP computation
        (default False)
    """
    # assumes detections are in detpath.format(classname)
    # assumes annotations are in annopath.format(imagename)
    # assumes imagesetfile is a text file with each line an image name
    # cachedir caches the annotations in a pickle file

    # first load gt (ground truth)
    if not os.path.isdir(cachedir):
        os.mkdir(cachedir)
    # cache file name
    cachefile = os.path.join(cachedir, 'annots.pkl')
    # read list of images
    # 0630 with open(imagesetfile, 'rb') as f:
    # read the names of all images to be evaluated
    with open(imagesetfile, 'r') as f:
        lines = f.readlines()
    # names of the test images, stored in imagenames (length 1000 in the example)
    imagenames = [x.strip() for x in lines]

    # if the cache file does not exist, reload everything from the raw dataset
    if not os.path.isfile(cachefile):
        # load annots
        recs = {}
        for i, imagename in enumerate(imagenames):
            recs[imagename] = parse_rec(annopath.format(imagename))
            if i % 100 == 0:
                # progress
                print('Reading annotation for {:d}/{:d}'.format(
                    i + 1, len(imagenames)))
        # save
        print('Saving cached annotations to {:s}'.format(cachefile))
        # 22 with open(cachefile, 'w') as f:
        with open(cachefile, 'wb') as f:
            # dump the recs dict into the cache file
            pickle.dump(recs, f)
    else:
        # load
        # 24 with open(cachefile, 'r') as f:
        # the cache file already exists, load it into recs
        with open(cachefile, 'rb') as f:
            recs = pickle.load(f)

    # extract gt objects for this class
    # recall and precision are computed per class, and so is AP,
    # so keep only the annotations of the current class
    class_recs = {}
    # npos: number of annotated objects of this class
    npos = 0
    for imagename in imagenames:
        # filter: keep only the objects of the current class, store them in R
        R = [obj for obj in recs[imagename] if obj['name'] == classname]
        # extract the bboxes
        bbox = np.array([x['bbox'] for x in R])
        # if the dataset has no difficult flag, all entries are 0
        difficult = np.array([x['difficult'] for x in R]).astype(np.bool)
        # len(R) is the number of gt objects of this class; det marks whether each
        # has been detected, initialised to False
        det = [False] * len(R)
        # accumulate the number of non-difficult samples; without a difficult flag
        # npos is simply the gt count
        npos = npos + sum(~difficult)
        class_recs[imagename] = {'bbox': bbox,
                                 'difficult': difficult,
                                 'det': det}

    # read dets
    detfile = detpath.format(classname)
    # 0630 with open(detfile, 'rb') as f:
    with open(detfile, 'r') as f:
        lines = f.readlines()

    # with e.g. 20000 detection results, splitlines has length 20000
    splitlines = [x.strip().split(' ') for x in lines]
    # image name of each detection; length 20000 even though there are only 1000
    # images, because one image can produce several detections
    image_ids = [x[0] for x in splitlines]
    # detection confidences
    confidence = np.array([float(x[1]) for x in splitlines])
    print('test0630 confidence=', confidence)
    # bboxes as floats
    BB = np.array([[float(z) for z in x[2:]] for x in splitlines])
    # print('test0704 before sorted_ind BB=', BB)

    # sort by confidence: sort the 20000 detections in descending order of confidence
    sorted_ind = np.argsort(-confidence)
    print('sorted_ind =', sorted_ind)
    sorted_ind_test = np.argsort(confidence)
    print('sorted_ind_test =', sorted_ind_test)
    # scores in descending order
    sorted_scores = np.sort(-confidence)
    print('sorted_scores =', sorted_scores)
    # 23 BB = BB[sorted_ind, :]  -- this actually reorders the selected boxes
    BB_test = BB[sorted_ind_test, :]
    print('test0704 sorted_ind_test BB_test=', BB_test)
    BB = BB[sorted_ind, :]
    print('test0704 sorted_ind BB=', BB)
    # bboxes reordered from the most to the least confident
    # if len(BB) != 0:
    #     BB = BB[sorted_ind, :]
    #     print('test0630 BB=', BB)
    # image_ids reordered the same way
    image_ids = [image_ids[x] for x in sorted_ind]

    # go down dets and mark TPs and FPs
    # note this is 20000, not 1000
    nd = len(image_ids)
    print('test0630 nd=', nd)
    # true positives, length 20000
    tp = np.zeros(nd)
    # false positives, length 20000
    fp = np.zeros(nd)
    # walk through all detections; since they are sorted, this goes from the
    # highest confidence to the lowest
    for d in range(nd):
        # all same-class gt of the image this detection belongs to
        R = class_recs[image_ids[d]]
        # bbox of the current detection
        bb = BB[d, :].astype(float)
        ovmax = -np.inf
        # bboxes of all same-class gt in that image
        BBGT = R['bbox'].astype(float)

        if BBGT.size > 0:
            # compute overlaps between the current detection and all gt boxes of its
            # image; one-to-many via numpy broadcasting (elementwise operations)
            # intersection
            ixmin = np.maximum(BBGT[:, 0], bb[0])
            iymin = np.maximum(BBGT[:, 1], bb[1])
            ixmax = np.minimum(BBGT[:, 2], bb[2])
            iymax = np.minimum(BBGT[:, 3], bb[3])
            iw = np.maximum(ixmax - ixmin + 1., 0.)
            # print('test0630 iw=', iw)
            ih = np.maximum(iymax - iymin + 1., 0.)
            # print('test0630 ih=', ih)
            inters = iw * ih
            print('test0630 inters=', inters)

            # union
            uni = ((bb[2] - bb[0] + 1.) * (bb[3] - bb[1] + 1.) +
                   (BBGT[:, 2] - BBGT[:, 0] + 1.) *
                   (BBGT[:, 3] - BBGT[:, 1] + 1.) - inters)
            print('test0630 uni=', uni)

            overlaps = inters / uni
            print('test0704 overlaps = inters / uni =', overlaps)
            # overlaps = inters // uni   (the python2->3 edit that broke the evaluation)
            # largest overlap
            ovmax = np.max(overlaps)
            print('test0704 ovthresh=0.5 ovmax=', ovmax)
            # gt with the largest overlap
            jmax = np.argmax(overlaps)

        print('test0630 ovthresh=0.5 out if ovmax=', ovmax)
        # print('test0630 ovthresh=', ovthresh)
        # if the best overlap with a gt box passes the threshold
        if ovmax > ovthresh:
            if not R['difficult'][jmax]:
                if not R['det'][jmax]:
                    # one more true positive
                    tp[d] = 1.
                    # mark this gt as detected; if another detection later overlaps it
                    # above the threshold, it must not count as another detected object
                    R['det'][jmax] = 1
                else:
                    # otherwise it is a false positive
                    fp[d] = 1.
        else:
            # below the threshold: definitely a false positive
            fp[d] = 1.

    # compute precision recall
    # cumulative sum: number of false positives up to the current detection
    fp = np.cumsum(fp)
    # cumulative sum: number of true positives up to the current detection
    tp = np.cumsum(tp)
    print('test0630')
    print('rec = tp / float(npos), tp=', tp)
    print('prec = tp / np.maximum(tp + fp, np.finfo(np.float64).eps)', fp)
    # recall, length 20000, goes from 0 to 1
    rec = tp / float(npos)
    # rec = tp // float(npos)
    # avoid divide by zero in case the first detection matches a difficult
    # ground truth
    # precision, length 20000, goes from 1 to 0
    prec = tp / np.maximum(tp + fp, np.finfo(np.float64).eps)
    # prec = tp // np.maximum(tp + fp, np.finfo(np.float64).eps)
    print('test0630')
    print('rec=', rec)
    print('prec=', prec)
    print('use_07_metric', use_07_metric)
    ap = voc_ap(rec, prec, use_07_metric)

    return rec, prec, ap
```

The loss non-convergence problem is still there, and the mean AP after training for 60,000 iterations is not good either. I have tuned the learning rate and the number of iterations, but not the batch size; that needs further experiments.
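For that follow-up tuning, these are the knobs I plan to look at next. A sketch, not a verified recipe: the cfg keys below do exist in py-faster-rcnn's lib/fast_rcnn/config.py, but the values shown are only illustrations, and the solver numbers quoted in the comment are the stock defaults as I remember them, so check your local solver.prototxt:

```python
# Batch-related settings live in lib/fast_rcnn/config.py and are normally overridden
# through the experiment yml (e.g. experiments/cfgs/faster_rcnn_end2end.yml).
from fast_rcnn.config import cfg

cfg.TRAIN.IMS_PER_BATCH = 1     # images per minibatch (end2end training uses 1)
cfg.TRAIN.BATCH_SIZE = 128      # RoIs sampled per image for the Fast R-CNN head
cfg.TRAIN.RPN_BATCHSIZE = 256   # anchors sampled per image for the RPN loss

# Learning rate and schedule are in the Caffe solver, e.g.
# models/pascal_voc/VGG16/faster_rcnn_end2end/solver.prototxt:
#   base_lr: 0.001   lr_policy: "step"   gamma: 0.1   stepsize: 50000
# When the loss oscillates instead of converging, lowering base_lr (e.g. to 0.0005)
# or shortening stepsize is the usual first experiment.
```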
