
Training ResNet50 with Caffe to classify pet dogs

萌宠菠菠乐园
2024-08-17 05:59

Training environment

Hardware
GPU: RTX 3090
Memory: 32GB

Software
Driver: 460.56
CUDA: V11.1.105
cuDNN: 8.0.5
OpenCV: 4.5.2-pre (training with Caffe does not require building it with OpenCV support)
Operating system: Manjaro

Caffe configuration file (Makefile.config):

## Refer to http://caffe.berkeleyvision.org/installation.html
# Contributions simplifying and improving our build system are welcome!

# cuDNN acceleration switch (uncomment to build with cuDNN).
USE_CUDNN := 1

# CPU-only switch (uncomment to build without GPU support).
# CPU_ONLY := 1

# uncomment to disable IO dependencies and corresponding data layers
USE_OPENCV := 1
USE_LEVELDB := 1
USE_LMDB := 1
# uncomment to allow MDB_NOLOCK when reading LMDB files (only if necessary)
# You should not set this flag if you will be reading LMDBs with any
# possibility of simultaneous read and write
# ALLOW_LMDB_NOLOCK := 1

# Uncomment if you're using OpenCV 3 or 4
OPENCV_VERSION := 4
USE_PKG_CONFIG := 1

# To customize your choice of compiler, uncomment and set the following.
# N.B. the default for Linux is g++ and the default for OSX is clang++
# CUSTOM_CXX := g++

# CUDA directory contains bin/ and lib/ directories that we need.
CUDA_DIR := /usr/local/cuda

# CUDA architecture setting: going with all of them.
# For CUDA < 6.0, comment the lines after *_35 for compatibility.
CUDA_ARCH := -gencode arch=compute_80,code=sm_80

# BLAS choice:
# atlas for ATLAS (default)
# mkl for MKL
# open for OpenBlas
# BLAS := atlas
BLAS := open

# Custom (MKL/ATLAS/OpenBLAS) include and lib directories.
# Leave commented to accept the defaults for your choice of BLAS
# (which should work)!
BLAS_INCLUDE := /opt/OpenBLAS/include
BLAS_LIB := /opt/OpenBLAS/lib

# Homebrew puts openblas in a directory that is not on the standard search path
# BLAS_INCLUDE := $(shell brew --prefix openblas)/include
# BLAS_LIB := $(shell brew --prefix openblas)/lib

# This is required only if you will compile the matlab interface.

# NOTE: this is required only if you will compile the python interface.
# We need to be able to find Python.h and numpy/arrayobject.h.
# PYTHON_INCLUDE := /usr/include/python2.7
# /usr/lib/python2.7/dist-packages/numpy/core/include
# Anaconda Python distribution is quite popular. Include path:
# Verify anaconda location, sometimes it's in root.
ANACONDA_HOME := $(HOME)/miniconda3
PYTHON_INCLUDE := $(ANACONDA_HOME)/include $(ANACONDA_HOME)/include/python3.8 $(ANACONDA_HOME)/lib/python3.8/site-packages/numpy/core/include

# Uncomment to use Python 3 (default is Python 2)
PYTHON_LIBRARIES := boost_python38 python3.8
# PYTHON_INCLUDE := /usr/include/python3.8
# /usr/lib/python3/dist-packages/numpy/core/include

# We need to be able to find libpythonX.X.so or .dylib.
# PYTHON_LIB := /usr/lib
PYTHON_LIB := $(ANACONDA_HOME)/lib

# Homebrew installs numpy in a non standard path (keg only)
# PYTHON_INCLUDE += $(dir $(shell python -c 'import numpy.core; print(numpy.core.__file__)'))/include
# PYTHON_LIB += $(shell brew --prefix numpy)/lib

# Uncomment to support layers written in Python (will link against Python libs)
WITH_PYTHON_LAYER := 1

# Whatever else you find you need goes here.
INCLUDE_DIRS := $(PYTHON_INCLUDE) /usr/local/include /usr/local/include/opencv4 /usr/include/hdf5/serial
LIBRARY_DIRS := $(PYTHON_LIB) /usr/local/lib /usr/lib /usr/lib/x86_64-linux-gnu/hdf5/serial

# If Homebrew is installed at a non standard location (for example your home directory) and you use it for general dependencies
# INCLUDE_DIRS += $(shell brew --prefix)/include
# LIBRARY_DIRS += $(shell brew --prefix)/lib

# N.B. both build and distribute dirs are cleared on `make clean`
BUILD_DIR := build
DISTRIBUTE_DIR := distribute

# Uncomment for debugging. Does not work on OSX due to https://github.com/BVLC/caffe/issues/171
DEBUG := 1

# The ID of the GPU that 'make runtest' will use to run unit tests.
TEST_GPUID := 0

# enable pretty build (comment to see full commands)
Q ?= @


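A note on this configuration: if you build with OpenCV support, you must set opencv4 in the configuration above, and that value is not arbitrary. When OpenCV is built from source, a pkg-config file such as opencv.pc is usually not generated. You have to create it by hand; otherwise the OpenCV shared libraries cannot be found, and the Caffe build fails with undefined references to libopencvxxx. If you enable OpenCV and, like me, use the latest OpenCV 4, create /usr/local/lib/pkgconfig/opencv4.pc manually.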

My opencv4.pc looks like this:

prefix=/usr/local
exec_prefix=${prefix}
includedir=${prefix}/include
libdir=${exec_prefix}/lib

Name: opencv
Description: The opencv library
Version:4.5.2-pre
Cflags: -I${includedir}/opencv4 -I${includedir}/opencv4/opencv2
Libs: -L${libdir} -lopencv_bgsegm -lopencv_bioinspired -lopencv_calib3d -lopencv_core -lopencv_cudaarithm -lopencv_cudabgsegm -lopencv_cudacodec -lopencv_cudafeatures2d -lopencv_cudafilters -lopencv_cudaimgproc -lopencv_cudalegacy -lopencv_cudaobjdetect -lopencv_cudaoptflow -lopencv_cudastereo -lopencv_cudawarping -lopencv_cudev -lopencv_datasets -lopencv_dnn_objdetect -lopencv_dnn -lopencv_dnn_superres -lopencv_dpm -lopencv_face -lopencv_features2d -lopencv_flann -lopencv_freetype -lopencv_gapi -lopencv_hdf -lopencv_highgui -lopencv_imgcodecs -lopencv_imgproc -lopencv_intensity_transform -lopencv_mcc -lopencv_ml -lopencv_objdetect -lopencv_optflow -lopencv_photo -lopencv_plot -lopencv_quality -lopencv_rapid -lopencv_reg -lopencv_rgbd -lopencv_saliency -lopencv_sfm -lopencv_shape -lopencv_stereo -lopencv_stitching -lopencv_superres -lopencv_text -lopencv_tracking -lopencv_videoio -lopencv_video -lopencv_videostab -lopencv_world -lopencv_xfeatures2d -lopencv_ximgproc

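Also note that my OpenCV build enables almost every module, so my list of shared libraries is long. If you did not build all of them, list your own libraries in Libs instead; something like ls /libopencv* will show what you have. Otherwise you will again get undefined references.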
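As a rough sketch of how the Libs line could be derived from whatever libraries you actually built, the snippet below turns shared-library file names into -l flags. The function name opencv_link_flags and the /usr/local/lib install location are assumptions for illustration, not part of the original setup.

```python
import glob
import os
import re


def opencv_link_flags(filenames):
    """Turn OpenCV shared-library file names into a sorted list of -l flags."""
    flags = set()
    for name in filenames:
        # Accept libopencv_core.so as well as versioned names such as
        # libopencv_core.so.4.5 or libopencv_core.so.4.5.2.
        match = re.match(r"lib(opencv_\w+)\.so(\.\d+)*$", os.path.basename(name))
        if match:
            flags.add("-l" + match.group(1))
    return sorted(flags)


if __name__ == "__main__":
    # Assumption: OpenCV was installed under /usr/local/lib.
    libs = glob.glob("/usr/local/lib/libopencv_*.so*")
    print("Libs: -L${libdir} " + " ".join(opencv_link_flags(libs)))
```

The printed line can be pasted into opencv4.pc in place of the Libs entry; versioned duplicates of the same library collapse into a single flag.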

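If, like me, you use Manjaro, which is a seriously great OS, there is one more thing to watch. The configuration above ensures that Caffe compiles, but not that it can load its libraries at run time. Running caffe directly usually fails with libopencvxxx not found, because your OpenCV libraries were not installed in the standard library path /usr/lib. You can add the path temporarily with export LD_LIBRARY_PATH=/usr/local/lib:${LD_LIBRARY_PATH}, or add it to a caffe.conf file (create it yourself) under /etc/ld.so.conf.d. Pay close attention to the order: the search order must put the newest libraries first. If you need libpython3.8 from Anaconda's lib directory and you put that directory first, the system will default to Anaconda's older libraries. On Manjaro the symptom is that sddm fails to start, so after a shutdown you can no longer get into the desktop. Make sure the Anaconda library path is on the last line.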
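The ordering rule can be illustrated with a small sketch that moves conda-style directories to the end of a search path. The helper name order_library_paths and the "anaconda"/"miniconda" substring markers are assumptions made for this example; adjust them to your own install location.

```python
import os


def order_library_paths(paths, defer_markers=("anaconda", "miniconda")):
    """Return paths de-duplicated, with conda-style directories moved last."""
    seen = set()
    front, back = [], []
    for path in paths:
        # Skip empty entries (from a leading or trailing ':') and duplicates.
        if not path or path in seen:
            continue
        seen.add(path)
        if any(marker in path for marker in defer_markers):
            back.append(path)
        else:
            front.append(path)
    return front + back


if __name__ == "__main__":
    current = os.environ.get("LD_LIBRARY_PATH", "").split(":")
    ordered = order_library_paths(["/usr/local/lib"] + current)
    print("export LD_LIBRARY_PATH=" + ":".join(ordered))
```

The same ordering applies to the lines of a caffe.conf file: system and /usr/local paths first, the Anaconda path last.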

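The dataset is a pet dog dataset. It can be used for segmentation, detection, and classification, it is somewhat larger in scale, and I will use it as my benchmark from now on. The dataset is fairly big, so I won't provide a download link; a Baidu Netdisk upload would be slower than downloading it from overseas yourself. Note that a few images in the pet dataset cannot be read, possibly because of an encoding problem. After getting the data, I recommend reading every image first and checking for empty results (my guess is that these files are PNG-encoded but carry a .jpg extension, so they decode as empty). I did not verify this and simply deleted them. If you leave them in, darknet or mmsegmentation will raise errors; I spent a long time debugging because I assumed my mmseg installation was broken, which was frustrating.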
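One way to run that check without any image library is to compare each file's first bytes against the JPEG and PNG signatures. This is only a sketch under the unverified assumption that the bad files are PNGs with a .jpg extension; the function names are made up for this example, and a full decode with your image library of choice would be a stricter test.

```python
import os

JPEG_MAGIC = b"\xff\xd8\xff"       # start of every JPEG stream
PNG_MAGIC = b"\x89PNG\r\n\x1a\n"   # fixed 8-byte PNG signature


def sniff_format(path):
    """Guess the real format of a file from its first bytes."""
    with open(path, "rb") as f:
        header = f.read(8)
    if header.startswith(JPEG_MAGIC):
        return "jpeg"
    if header.startswith(PNG_MAGIC):
        return "png"
    return "empty" if not header else "unknown"


def find_suspect_jpgs(folder):
    """List (file name, detected format) for .jpg files that are not JPEG."""
    suspects = []
    for name in sorted(os.listdir(folder)):
        if name.lower().endswith(".jpg"):
            kind = sniff_format(os.path.join(folder, name))
            if kind != "jpeg":
                suspects.append((name, kind))
    return suspects
```

Calling find_suspect_jpgs on the images directory returns the files to inspect, convert, or delete before building the LMDB.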

Preprocessing script

import glob
import shutil
from os.path import join, basename, exists
import os
import json
import shlex
import subprocess
import argparse

# caffe_pb2 is the module generated from caffe.proto; it must be importable.
import caffe_pb2
from google.protobuf import text_format


def datapreprocess(dataset_path, output_path, rate=0.8):
    """Split the images into train/val folders and write the label files."""
    images = glob.glob("{}/*.jpg".format(dataset_path))
    class_names = set()
    dataset_dict = {}

    def image_to_name(x):
        # Drop the last '_'-separated token of the file name to get the class.
        return "_".join(basename(x).split('_')[:-1])

    for image in images:
        class_name = image_to_name(image)
        class_names.add(class_name)
        if class_name not in dataset_dict:
            dataset_dict[class_name] = [image]
        else:
            dataset_dict[class_name].append(image)
    if not exists(output_path):
        os.makedirs(output_path)
    data = sorted(list(class_names))
    class_to_num = {class_name: num for num, class_name in enumerate(data)}
    with open(join(output_path, 'label.json'), 'w') as f:
        json.dump(fp=f, obj=class_to_num)
    label_hander = {phase: open(
        join(output_path, '{}.txt'.format(phase)), 'w') for phase in ['train', 'val']}
    for key in dataset_dict:
        images = dataset_dict[key]
        num_train = int(len(images)*rate)
        split_image = {'train': images[:num_train],
                       'val': images[num_train:]}
        for phase in ['train', 'val']:
            output_path_tmp = join(output_path, phase)
            if not exists(output_path_tmp):
                os.makedirs(output_path_tmp)
            for image in split_image[phase]:
                shutil.copy(image, output_path_tmp)
                # One "<file name> <label>" line per image.
                label_hander[phase].write("{} {}\n".format(
                    basename(image), class_to_num[image_to_name(image)]))
    for key in label_hander:
        label_hander[key].close()
    result = {'train_image_label': join(output_path, 'train.txt'),
              'val_image_label': join(output_path, 'val.txt'),
              'label_to_num': join(output_path, 'label.json'),
              'train_dataset': join(output_path, 'train'),
              'val_dataset': join(output_path, 'val')}
    return result


def findfile(start, name):
    """Walk `start` and return the absolute path of the last file called `name`."""
    res = None
    for relpath, dirs, files in os.walk(start):
        if name in files:
            full_path = os.path.join(start, relpath, name)
            res = os.path.normpath(os.path.abspath(full_path))
    return res


def convert_dataset(dataset_path, dataset_store, shape=(227, 227)):
    """Build train/val LMDBs with convert_imageset and compute their means."""
    output_res = datapreprocess(dataset_path, dataset_store)
    caffe_home = os.path.expanduser('~/caffe')
    if not os.path.exists(caffe_home):
        caffe_home = os.path.expanduser('~/caffe-env')
    convert_tool = findfile(caffe_home, 'convert_imageset')
    assert convert_tool is not None, "Can't find convert_imageset"
    if not exists(dataset_store):
        os.makedirs(dataset_store)
    for phase in ['train', 'val']:
        output_lmdb = join(dataset_store, '{}_lmdb'.format(phase))
        if exists(output_lmdb):
            shutil.rmtree(output_lmdb)
        command = "{} --shuffle --resize_height={} --resize_width={} {}/ {} {}".format(
            convert_tool, shape[0], shape[1],
            output_res['{}_dataset'.format(phase)],
            output_res['{}_image_label'.format(phase)], output_lmdb)
        output_res["{}_lmdb".format(phase)] = output_lmdb
        args = shlex.split(command)
        ferror = open('log.err', 'w')
        p_data = subprocess.Popen(args, stdout=ferror)
        # Wait until the LMDB is complete before anything reads it.
        p_data.wait()
        ferror.close()
        compute_image_mean = findfile(caffe_home, 'compute_image_mean')
        if compute_image_mean is not None:
            mean_file = join(dataset_store,
                             'mean_{}.binaryproto'.format(phase))
            command = "{} {} {}".format(
                compute_image_mean, output_lmdb+"/", mean_file)
            mean_args = shlex.split(command)
            output_res['{}_mean'.format(phase)] = mean_file
            # Wait for the mean file as well, so it exists when we return.
            subprocess.Popen(mean_args).wait()
    return output_res


def caffe_home():
    home_path = os.path.expanduser('~/')
    caffe_path = findfile(home_path, 'caffe')
    return caffe_path


def base_network(in_path, output_path, args):
    """Rewrite train_val.prototxt and solver.prototxt for the pet dataset."""
    train_val = join(in_path, 'train_val.prototxt')
    network_module = caffe_pb2.NetParameter()
    solver_module = caffe_pb2.SolverParameter()
    with open(train_val, 'r') as f:
        text_format.Parse(f.read(), network_module)
    for layer in network_module.layer:
        if layer.type == 'Data':
            for phase_mesg in layer.include:
                if phase_mesg.phase == 0:  # 0 is the TRAIN phase
                    layer.transform_param.mean_file = args['train_mean']
                    layer.data_param.source = args['train_lmdb']
                    layer.data_param.batch_size = 16
                else:
                    layer.transform_param.mean_file = args['val_mean']
                    layer.data_param.source = args['val_lmdb']
        if layer.type == 'InnerProduct':
            # Replace the 1000-way ImageNet classifier with 37 pet classes.
            if layer.inner_product_param.num_output == 1000:
                layer.inner_product_param.num_output = 37
    with open(join(output_path, 'train_val.pbtxt'), 'w') as f:
        f.write(text_format.MessageToString(network_module))
    backup_path = join(output_path, 'model')
    if not exists(backup_path):
        os.makedirs(backup_path)
    with open(join(in_path, 'solver.prototxt'), 'r') as f:
        text_format.Parse(f.read(), solver_module)
    solver_module.net = join(output_path, 'train_val.pbtxt')
    solver_module.snapshot_prefix = backup_path
    with open(join(output_path, 'solver.prototxt'), 'w') as f:
        f.write(text_format.MessageToString(solver_module))
    # solver = join(in_path, 'solver.prototxt')


if __name__ == "__main__":
    parser = argparse.ArgumentParser(description='create parser parser data!')
    parser.add_argument('--output_path', '-o', default='/tmp/pet', type=str)
    parser.add_argument('--dataset_path', '-d',
                        default='~/Datasets/pets/images', type=str)
    parser.add_argument('--base_network', '-b',
                        default='~/caffe/models/bvlc_reference_caffenet', type=str)
    args = parser.parse_args()
    # '~' is not expanded by glob or open, so expand it explicitly.
    dataset_path = os.path.expanduser(args.dataset_path)
    base_network_path = os.path.expanduser(args.base_network)
    config = convert_dataset(dataset_path, args.output_path)
    with open(join(args.output_path, 'info.json'), 'w') as f:
        json.dump(obj=config, fp=f)
    with open(join(args.output_path, 'info.json'), 'r') as f:
        config = json.load(f)
    base_network(base_network_path, args.output_path, config)
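To make the labeling and splitting logic of the script easier to follow, here is a standalone sketch of the same idea: the class name is the file name minus its last underscore-separated token, and each class is split 80/20 into train and val. The helper split_by_class and the sample file names are hypothetical, and unlike the script this version sorts the files so that the split is reproducible.

```python
from os.path import basename


def image_to_name(path):
    """Class name = file name with the last '_'-separated token removed."""
    return "_".join(basename(path).split("_")[:-1])


def split_by_class(images, rate=0.8):
    """Split images into train/val per class so every class appears in both."""
    by_class = {}
    for image in images:
        by_class.setdefault(image_to_name(image), []).append(image)
    split = {"train": [], "val": []}
    for name in sorted(by_class):
        files = sorted(by_class[name])
        num_train = int(len(files) * rate)
        split["train"] += files[:num_train]
        split["val"] += files[num_train:]
    return split


if __name__ == "__main__":
    # Hypothetical file names, used only to illustrate the naming scheme.
    sample = ["great_pyrenees_{}.jpg".format(i) for i in range(1, 6)]
    sample += ["pug_{}.jpg".format(i) for i in range(1, 6)]
    result = split_by_class(sample)
    print(len(result["train"]), len(result["val"]))
```

Splitting per class rather than over the whole file list keeps the class proportions roughly equal between train.txt and val.txt.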


One-step training script

CAFFE_ROOT=${HOME}/caffe
SOLVER_ROOT=${CAFFE_ROOT}/models/resnet50
${CAFFE_ROOT}/.build_debug/tools/caffe train --solver=$SOLVER_ROOT/solver.prototxt --gpu=0

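I won't paste solver.prototxt here. You need to find the ResNet50 solver and train_val files yourself and point them at the paths generated by the Python script above. If the script above did not produce the mean file, generate it with the following command: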

${CAFFE_ROOT}/.build_debug/tools/compute_image_mean /tmp/caffe_dataset/train_lmdb /tmp/caffe_dataset/mean_train.binary

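One last remark: if the Caffe build fails with undefined references to cblas symbols, check your BLAS library carefully. If you use OpenBLAS, confirm that its libraries are in the expected path; if they are not, download the OpenBLAS source and build it. Of course, if you are on Manjaro, the almighty pacman can solve every installation problem for you.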
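A quick way to see which BLAS libraries the system linker configuration knows about is ctypes.util.find_library from the Python standard library. This is only a sketch: the helper locate_blas and the candidate names are assumptions, and a library installed in a non-standard directory such as /opt/OpenBLAS/lib may show up as not found here even though the BLAS_LIB setting in Makefile.config points at it.

```python
from ctypes.util import find_library


def locate_blas(candidates=("openblas", "cblas", "blas")):
    """Map each candidate library name to what find_library reports for it."""
    return {name: find_library(name) for name in candidates}


if __name__ == "__main__":
    for name, found in locate_blas().items():
        print("{:10s} {}".format(name, found or "not found"))
```

If openblas is reported as not found and you have not set BLAS_LIB explicitly, that is a hint the linker will not find it either.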