PyTorch + YOLOv5 Environment Setup (To Be Continued)

    Tech, 2023-10-06

    Hardware and Software Requirements

    1. PyTorch Requirements

    NVIDIA CUDA 9.2 or above

    NVIDIA cuDNN v7 or above

    https://github.com/pytorch/pytorch#installation

    The corresponding minimum GPU compute capability and driver version are:

    GPU Compute Capability >= 3.0

    Compatible Driver Version >= 396.26

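    PyTorch version: the official previous-versions page lists install commands under "Commands for Versions >= 1.0.0", so 1.0.0 is the lowest version it covers.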

    https://pytorch.org/get-started/previous-versions/

    2. YOLOv5 Requirements

    Python 3.7 or later

    torch >= 1.5 (CUDA 9.2, 10.1, or 10.2)

    https://github.com/ultralytics/yolov5

    3. System requirements

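    Each CUDA Toolkit release has its own system requirements for each Linux distribution; the kernel version and the GCC version, for example, must both match. See the CUDA Toolkit Documentation for the release you plan to install.

    Taking CUDA 10.1 as the example: according to the Versioned Online Documentation for CUDA Toolkit 10.1 update2 (Aug 2019), the supported Ubuntu release is 18.04.3, with kernel 5.0.0 and GCC 7.4.0.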

    Table 1. Native Linux Distribution Support in CUDA 10.1 Update 2

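    | Distribution | Kernel* | GCC | GLIBC | ICC | PGI | XLC | CLANG |
    |---|---|---|---|---|---|---|---|
    | x86_64 | | | | | | | |
    | RHEL 8.0 | 4.18 | 8.2.1 | 2.28 | | | | |
    | RHEL 7.6 | 3.10 | 4.8.5 | 2.17 | 19.0 | 18.x, 19.x | NO | 8.0.0 |
    | RHEL 6.10 | 2.6.32 | 4.4.7 | 2.12 | | | | |
    | CentOS 7.6 | 3.10 | 4.8.5 | 2.17 | | | | |
    | CentOS 6.10 | 2.6.32 | 4.4.7 | 2.12 | | | | |
    | Fedora 29 | 4.16 | 8.0.1 | 2.27 | | | | |
    | OpenSUSE Leap 15.0 | 4.15.0 | 7.3.1 | 2.26 | | | | |
    | SLES 15.0 | 4.12.14 | 7.2.1 | 2.26 | | | | |
    | SLES 12.4 | 4.12.14 | 4.8.5 | 2.22 | | | | |
    | Ubuntu 18.10 | 4.18.0 | 8.2.0 | 2.28 | | | | |
    | Ubuntu 18.04.3 (**) | 5.0.0 | 7.4.0 | 2.27 | | | | |
    | Ubuntu 16.04.6 (**) | 4.4 | 5.4.0 | 2.23 | | | | |
    | Ubuntu 14.04.6 (**) | 3.13 | 4.8.4 | 2.19 | | | | |
    | POWER8 (***) | | | | | | | |
    | RHEL 7.6 | 3.10 | 4.8.5 | 2.17 | NO | 18.x, 19.x | 13.1.x, 16.1.x | 8.0.0 |
    | Ubuntu 18.04.1 | 4.15.0 | 7.3.0 | 2.27 | NO | 18.x, 19.x | 13.1.x, 16.1.x | 8.0.0 |
    | POWER9 (****) | | | | | | | |
    | Ubuntu 18.04.1 | 4.15.0 | 7.3.0 | 2.27 | NO | 18.x, 19.x | 13.1.x, 16.1.x | 8.0.0 |
    | RHEL 7.6 IBM Power LE | 4.14.0 | 4.8.5 | 2.17 | NO | 18.x, 19.x | 13.1.x, 16.1.x | 8.0.0 |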
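    As a rough sketch of how a machine could be compared against the Ubuntu 18.04.3 row (kernel 5.0.0, GCC 7.4.0), the script below reads the running kernel and GCC versions and compares them using version sort. Treating the table values as lower bounds is an assumption made for illustration, and the helper name version_ge is hypothetical.

```shell
#!/usr/bin/env bash
# version_ge A B: succeeds when version A >= version B.
# Relies on the -V (version sort) option of GNU sort.
version_ge() {
    [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

# Values taken from the Ubuntu 18.04.3 row of the table above
# (assumed here to be lower bounds).
REQ_KERNEL="5.0.0"
REQ_GCC="7.4.0"

# "uname -r" prints e.g. 5.0.0-23-generic; keep the part before the first dash.
kernel="$(uname -r)"
kernel="${kernel%%-*}"

if version_ge "$kernel" "$REQ_KERNEL"; then
    echo "kernel $kernel: ok"
else
    echo "kernel $kernel: older than $REQ_KERNEL"
fi

if command -v gcc >/dev/null 2>&1; then
    # -dumpfullversion prints the full x.y.z version on GCC 7 and later;
    # -dumpversion is the fallback for older releases.
    gcc_ver="$(gcc -dumpfullversion -dumpversion)"
    if version_ge "$gcc_ver" "$REQ_GCC"; then
        echo "gcc $gcc_ver: ok"
    else
        echo "gcc $gcc_ver: older than $REQ_GCC"
    fi
else
    echo "gcc: not installed"
fi
```

    The same helper works for driver versions, for example version_ge 418.113 418.39.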

    4. Personal solutions

    My GPU: NVIDIA GT 740M

    GPU Compute Capability = 3.0

    Compatible Driver Version >= 418.39

    So my setup is:

    Ubuntu 18.04.3

    CUDA Toolkit 10.1 update2 (Aug 2019)

    cuDNN v7.6.4 (September 27, 2019), for CUDA 10.1

    Look up your own GPU's compute capability and the CUDA and cuDNN versions it supports:

    https://developer.nvidia.com/cuda-gpus

    https://developer.nvidia.com/cuda-toolkit-archive

    https://developer.nvidia.com/rdp/cudnn-archive

    https://blog.csdn.net/yumin1058882119/article/details/106900592

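    Installing CUDA

    1. Preparing the installation files

    There are several ways to install; the manual .run (runfile) installation is recommended here. Download the following files:

    NVIDIA Driver: NVIDIA-Linux-x86_64-418.113.run
    NVIDIA CUDA Toolkit: cuda_10.1.105_418.39_linux.run
    NVIDIA cuDNN: cudnn-10.1-linux-x64-v7.6.4.38.tgz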

    2. Pre-installation Actions

    Verify You Have a CUDA-Capable GPU

    $ lspci | grep -i nvidia

    Verify You Have a Supported Version of Linux

    $ uname -m && cat /etc/*release

    Verify the System Has gcc Installed

    $ gcc --version

    If gcc is not installed, run:

    $ sudo apt-get update
    $ sudo apt-get install build-essential

    Verify the System has the Correct Kernel Headers and Development Packages Installed

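    $ uname -r

    NVIDIA recommends installing the headers manually. In the command below, $(uname -r) expands to the kernel version printed by the command above; you can also substitute that version by hand.

    $ sudo apt-get install linux-headers-$(uname -r)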

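    3. Installing the NVIDIA driver

    The CUDA Toolkit bundles a compatible NVIDIA driver, and its installer lets you choose whether to install it. When the driver is installed that way, however, no optional arguments can be passed to it. One of them, --no-opengl-files, installs only the driver files and skips the OpenGL files. This option is very important: without it you may end up stuck in a login loop.

    Remove any existing NVIDIA driver (skip this if none is installed):

    $ sudo apt-get remove --purge 'nvidia*'

    Disable Secure Boot in the BIOS (set it to Disabled)

    If Secure Boot is left enabled, the NVIDIA driver installation may fail or the driver may not work properly.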

    Disable the Nouveau drivers

    Open the configuration file for editing:

    $ sudo gedit /etc/modprobe.d/blacklist.conf

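    Append the following at the end of the file:

    blacklist nouveau
    options nouveau modeset=0

    This disables the third-party nouveau driver; there is no need to revert the change afterwards.

    Because nouveau is included in the kernel's initramfs, rebuild it for the change to take effect:

    $ sudo update-initramfs -u

    Reboot:

    $ reboot

    Check whether nouveau is still running; no output means it has been disabled: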

    $ lsmod | grep nouveau
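    The two checks above (the blacklist entries and the loaded module) can be combined into one small script. This is only a sketch: the function names are hypothetical, and it assumes the entries were added to /etc/modprobe.d/blacklist.conf exactly as shown.

```shell
#!/usr/bin/env bash
# nouveau_blacklisted FILE: succeeds when FILE contains both lines
# that the steps above add to the modprobe blacklist.
nouveau_blacklisted() {
    grep -Eq '^[[:space:]]*blacklist[[:space:]]+nouveau[[:space:]]*$' "$1" &&
    grep -Eq '^[[:space:]]*options[[:space:]]+nouveau[[:space:]]+modeset=0[[:space:]]*$' "$1"
}

# nouveau_loaded: succeeds when the nouveau kernel module is currently loaded.
# /proc/modules is the file that lsmod formats.
nouveau_loaded() {
    [ -r /proc/modules ] && grep -q '^nouveau ' /proc/modules
}

conf="/etc/modprobe.d/blacklist.conf"
if [ -r "$conf" ] && nouveau_blacklisted "$conf"; then
    echo "blacklist entries present in $conf"
else
    echo "blacklist entries missing from $conf"
fi

if nouveau_loaded; then
    echo "nouveau is still loaded; run update-initramfs -u and reboot"
else
    echo "nouveau is not loaded"
fi
```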

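    Install the NVIDIA 32-bit compatibility libraries

    Install the 32-bit compatibility libraries beforehand; otherwise the driver installer shows a warning. You can skip this if you do not need 32-bit programs.

    $ sudo apt-get install lib32z1 lib32ncurses5

    Stop the graphical desktop

    To install the new NVIDIA driver, the current display server has to be stopped. The simplest way is to switch to runlevel 3 with the telinit command. The display server stops as soon as the command below runs, so save any open work first. With this approach a reboot is needed after the driver is installed: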

    $ sudo telinit 3

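    Alternative that does not need a reboot after installing the driver:

    Switch to a tty: Ctrl+Alt+F1~F6
    Stop the display manager temporarily: sudo service lightdm stop
    Install the driver
    Start the display manager again: sudo service lightdm start
    Return to the graphical session: Ctrl+Alt+F7

    Install the driver

    Make the driver file executable: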

    $ sudo chmod a+x NVIDIA-Linux-x86_64-418.113.run

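    Then run the installer:

    $ sudo ./NVIDIA-Linux-x86_64-418.113.run --no-opengl-files

    [Screenshots: driver installer prompts]

    If the 32-bit library dependencies were installed beforehand, the installer asks whether to install the NVIDIA 32-bit libraries instead of showing a warning; choose Yes.

    Options (the last two may be omitted):

    --no-opengl-files: install only the driver files, not the OpenGL files. This is the most important option.
    --no-x-check: do not check for a running X server during installation.
    --no-nouveau-check: do not check for nouveau during installation.

    Reboot:

    $ reboot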

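    Verify that the driver installed correctly

    Open a terminal and run:

    $ nvidia-smi

    If it prints a table listing the driver version and the GPU, the installation succeeded.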
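    For a scriptable version of this check, nvidia-smi can print selected fields as CSV. The sketch below wraps that in two hypothetical helpers, gpu_summary and driver_field; it assumes the usual "version, name" layout of the CSV output.

```shell
#!/usr/bin/env bash
# gpu_summary: prints "driver_version, name" for each GPU, one per line.
# Returns non-zero when nvidia-smi is missing or cannot reach the driver.
gpu_summary() {
    command -v nvidia-smi >/dev/null 2>&1 || return 1
    nvidia-smi --query-gpu=driver_version,name --format=csv,noheader
}

# driver_field LINE: extracts the driver version (text before the first comma)
# from one line of gpu_summary output.
driver_field() {
    printf '%s\n' "${1%%,*}"
}

if line="$(gpu_summary | head -n 1)" && [ -n "$line" ]; then
    echo "driver version: $(driver_field "$line")"
    echo "full entry:     $line"
else
    echo "nvidia-smi is unavailable or the driver is not loaded"
fi
```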

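    4. Installing CUDA

    Stop the graphical desktop

    As with the driver, the current display server has to be stopped before running the installer. The simplest way is to switch to runlevel 3 with the telinit command. The display server stops as soon as the command below runs, so save any open work first: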

    $ sudo telinit 3

    Run the CUDA installer

    $ sudo sh cuda_10.1.105_418.39_linux.run

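    [Screenshots: CUDA installer steps]

    On a system with a Chinese locale the first prompt is rendered as garbled boxes. It is asking for your password: type it and press Enter.

    Type accept. Because the driver was already installed separately, do not select the first item (the driver) in the component list.

    The final screen reports a successful installation. If it reports an error instead, go through the log file to find the cause.

    Reboot: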

    $ reboot

    Device Node Verification

    Check that the device files /dev/nvidia* exist and have the correct (0666) file permissions. These files are used by the CUDA Driver to communicate with the kernel-mode portion of the NVIDIA Driver. Applications that use the NVIDIA driver, such as a CUDA application or the X server (if any), will normally automatically create these files if they are missing using the setuid nvidia-modprobe tool that is bundled with the NVIDIA Driver. However, some systems disallow setuid binaries, so if these files do not exist, you can create them manually by using a startup script such as the one below:

    #!/bin/bash

    /sbin/modprobe nvidia

    if [ "$?" -eq 0 ]; then
      # Count the number of NVIDIA controllers found.
      NVDEVS=`lspci | grep -i NVIDIA`
      N3D=`echo "$NVDEVS" | grep "3D controller" | wc -l`
      NVGA=`echo "$NVDEVS" | grep "VGA compatible controller" | wc -l`

      N=`expr $N3D + $NVGA - 1`
      for i in `seq 0 $N`; do
        mknod -m 666 /dev/nvidia$i c 195 $i
      done

      mknod -m 666 /dev/nvidiactl c 195 255

    else
      exit 1
    fi

    /sbin/modprobe nvidia-uvm

    if [ "$?" -eq 0 ]; then
      # Find out the major device number used by the nvidia-uvm driver
      D=`grep nvidia-uvm /proc/devices | awk '{print $1}'`

      mknod -m 666 /dev/nvidia-uvm c $D 0
    else
      exit 1
    fi
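    To check the result rather than create the nodes, a short loop over /dev/nvidia* can compare each node's mode with 666. This is a minimal sketch; has_mode is a hypothetical helper and it assumes GNU coreutils stat.

```shell
#!/usr/bin/env bash
# has_mode PATH MODE: succeeds when PATH exists and its permission bits equal MODE.
# Uses the GNU coreutils form of stat (-c '%a' prints the octal mode).
has_mode() {
    [ -e "$1" ] && [ "$(stat -c '%a' "$1")" = "$2" ]
}

found=0
for node in /dev/nvidia*; do
    [ -e "$node" ] || continue   # the glob did not match anything
    [ -d "$node" ] && continue   # skip directories; only device nodes matter here
    found=1
    if has_mode "$node" 666; then
        echo "$node: ok"
    else
        echo "$node: unexpected mode $(stat -c '%a' "$node")"
    fi
done
[ "$found" -eq 1 ] || echo "no /dev/nvidia* device nodes found"
```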

    5. Post-installation Actions

    Environment Setup

    The PATH variable needs to include /usr/local/cuda-10.1/bin and /usr/local/cuda-10.1/NsightCompute-<tool-version>. <tool-version> refers to the version of Nsight Compute that ships with the CUDA toolkit, e.g. 2019.1.

    To add this path to the PATH variable:

    export PATH=/usr/local/cuda-10.1/bin:/usr/local/cuda-10.1/NsightCompute-2019.1${PATH:+:${PATH}}

    In addition, when using the runfile installation method, the LD_LIBRARY_PATH variable needs to contain /usr/local/cuda-10.1/lib64 on a 64-bit system, or /usr/local/cuda-10.1/lib on a 32-bit system

    To change the environment variables for 64-bit operating systems:

    export LD_LIBRARY_PATH=/usr/local/cuda-10.1/lib64${LD_LIBRARY_PATH:+:${LD_LIBRARY_PATH}}

    To change the environment variables for 32-bit operating systems:

    export LD_LIBRARY_PATH=/usr/local/cuda-10.1/lib${LD_LIBRARY_PATH:+:${LD_LIBRARY_PATH}}

    Note that the above paths change when using a custom install path with the runfile installation method.

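    Note: variables set with export on the command line are temporary and are lost after logging out or rebooting. To make them persistent:

    Per-user configuration (applies to the current user only): edit ~/.bashrc

    $ gedit ~/.bashrc

    Append the environment variables at the end of the file:

    export PATH=/usr/local/cuda/bin${PATH:+:${PATH}}
    export LD_LIBRARY_PATH=/usr/local/cuda/lib64${LD_LIBRARY_PATH:+:${LD_LIBRARY_PATH}}
    export CUDA_HOME=/usr/local/cuda

    Apply the change:

    $ source ~/.bashrc

    System-wide configuration (applies to all users): to be written.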
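    Editing ~/.bashrc by hand is easy to repeat by accident. As a sketch of one way to avoid duplicate entries, the hypothetical helpers below append each of the three export lines only when it is not already in the file.

```shell
#!/usr/bin/env bash
# append_once LINE FILE: appends LINE to FILE unless an identical line is already there.
append_once() {
    touch "$2"
    grep -qxF -- "$1" "$2" || printf '%s\n' "$1" >> "$2"
}

# add_cuda_env FILE: adds the three CUDA exports to FILE.
# Single quotes keep ${PATH...} unexpanded, so it is evaluated when FILE is sourced.
add_cuda_env() {
    append_once 'export PATH=/usr/local/cuda/bin${PATH:+:${PATH}}' "$1"
    append_once 'export LD_LIBRARY_PATH=/usr/local/cuda/lib64${LD_LIBRARY_PATH:+:${LD_LIBRARY_PATH}}' "$1"
    append_once 'export CUDA_HOME=/usr/local/cuda' "$1"
}

# Example:
#   add_cuda_env "$HOME/.bashrc" && . "$HOME/.bashrc"
```

    Running add_cuda_env twice on the same file leaves it unchanged the second time.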

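    2. Switching between multiple CUDA versions

    After installing several CUDA versions, you can see them listed under /usr/local. [Screenshot: listing of /usr/local] Here cuda-10.0 and cuda-10.1 are the two installed versions, and cuda is a symbolic link pointing at whichever version is selected. (Note that the environment variables above use cuda rather than cuda-10.0 or cuda-10.1. This makes switching versions easy, because the variables do not have to be edited each time.)

    Use the stat command to see which version the cuda link currently points to. [Screenshot: stat output] The file type is symbolic link and the target is /usr/local/cuda-10.1. To use cuda-10.0 instead, delete the link and create a new one pointing at cuda-10.0 (the name must stay cuda, to match what is set in ~/.bashrc):

    $ sudo rm -rf /usr/local/cuda
    $ sudo ln -s /usr/local/cuda-10.0 /usr/local/cuda

    To switch to any other CUDA version, just change the target path used when creating the link.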
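    The delete-and-relink steps can also be done in one command with ln -sfn. The sketch below wraps that in a hypothetical switch_cuda function; the optional second argument is an assumption added here so the function can be tried on a scratch directory before touching /usr/local.

```shell
#!/usr/bin/env bash
# switch_cuda VERSION [ROOT]: repoints ROOT/cuda at ROOT/cuda-VERSION.
# ROOT defaults to /usr/local; pass another directory to try it out safely.
switch_cuda() {
    local version="$1"
    local root="${2:-/usr/local}"
    local target="$root/cuda-$version"
    if [ ! -d "$target" ]; then
        echo "no such toolkit directory: $target" >&2
        return 1
    fi
    # -s: symbolic link, -f: replace an existing link,
    # -n: treat the existing link itself as the destination instead of following it
    ln -sfn "$target" "$root/cuda"
}

# current_cuda [ROOT]: prints the path the cuda link currently points to.
current_cuda() {
    readlink "${1:-/usr/local}/cuda"
}

# Example (writing to /usr/local needs root, e.g. run the script with sudo):
#   switch_cuda 10.0 && current_cuda
```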

    Verify the Installation

    Verify that the loaded driver is the NVIDIA driver

    $ cat /proc/driver/nvidia/version

    Check the CUDA version

    $ nvcc -V

    Compile the samples

    $ cd ~/NVIDIA_CUDA-10.1_Samples

    $ make

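    If the build fails, read the error message and fix what it reports. If there are no errors, the build keeps running and finishes after ten minutes or more. The resulting binaries are placed in ~/NVIDIA_CUDA-10.1_Samples/bin.

    Run the samples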

    $ cd ~/NVIDIA_CUDA-10.1_Samples/bin/x86_64/linux/release

    $ ./deviceQuery

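    On success the output ends with Result = PASS.

    $ ./bandwidthTest

    On success the output again ends with Result = PASS. For any other problems, see the CUDA Toolkit Documentation for your version.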
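    The nvcc -V check above can be automated by pulling the release number out of its last line. A minimal sketch, assuming the usual "Cuda compilation tools, release X.Y, VX.Y.Z" line; the function name nvcc_release is hypothetical.

```shell
#!/usr/bin/env bash
# nvcc_release: reads "nvcc -V" output on stdin and prints the release number,
# taken from the line "Cuda compilation tools, release 10.1, V10.1.105".
nvcc_release() {
    sed -n 's/.*release \([0-9][0-9.]*\),.*/\1/p'
}

EXPECTED="10.1"   # the toolkit version installed in this guide

if command -v nvcc >/dev/null 2>&1; then
    found="$(nvcc -V | nvcc_release)"
    if [ "$found" = "$EXPECTED" ]; then
        echo "nvcc reports CUDA $found"
    else
        echo "nvcc reports CUDA '$found', expected $EXPECTED (check PATH and the cuda symlink)"
    fi
else
    echo "nvcc not found; is /usr/local/cuda/bin on PATH?"
fi
```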

    Installing cuDNN

    cuDNN is essentially a software development kit, so installing it just means copying its library files into the right directories.

    The NVIDIA® CUDA® Deep Neural Network library™ (cuDNN) is a GPU-accelerated library of primitives for deep neural networks. cuDNN provides highly tuned implementations for standard routines such as forward and backward convolution, pooling, normalization, and activation layers. cuDNN is part of the NVIDIA® Deep Learning SDK.

    Quoting the official Deep Learning SDK Documentation directly:

    Installing cuDNN On Linux

    About this task

    The following steps describe how to build a cuDNN dependent program. Choose the installation method that meets your environment needs. For example, the tar file installation applies to all Linux platforms, and the Debian installation package applies to Ubuntu 16.04 and 18.04.

    In the following sections:

    your CUDA directory path is referred to as /usr/local/cuda/
    your cuDNN download path is referred to as <cudnnpath>

    Installing From A Tar File

    Before issuing the following commands, you'll need to replace x.x and v8.x.x.x with your specific CUDA version and cuDNN version and package date.

    Procedure

    1. Navigate to your <cudnnpath> directory containing the cuDNN Tar file.
    2. Unzip the cuDNN package.
       $ tar -xzvf cudnn-x.x-linux-x64-v8.x.x.x.tgz
    3. Copy the following files into the CUDA Toolkit directory, and change the file permissions.
       $ sudo cp cuda/include/cudnn*.h /usr/local/cuda/include
       $ sudo cp cuda/lib64/libcudnn* /usr/local/cuda/lib64
       $ sudo chmod a+r /usr/local/cuda/include/cudnn*.h /usr/local/cuda/lib64/libcudnn*
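    Once the headers are copied, the installed cuDNN version can be read back from the version macros in the header. The sketch below assumes cuDNN 7, which keeps CUDNN_MAJOR, CUDNN_MINOR, and CUDNN_PATCHLEVEL in cudnn.h; the function name cudnn_version is hypothetical.

```shell
#!/usr/bin/env bash
# cudnn_version HEADER: prints MAJOR.MINOR.PATCHLEVEL from the #define lines in HEADER.
# cuDNN 7 keeps these macros in cudnn.h; cuDNN 8 moved them to cudnn_version.h.
cudnn_version() {
    awk '
        $1 == "#define" && $2 == "CUDNN_MAJOR"      { major = $3 }
        $1 == "#define" && $2 == "CUDNN_MINOR"      { minor = $3 }
        $1 == "#define" && $2 == "CUDNN_PATCHLEVEL" { patch = $3 }
        END {
            if (major == "") exit 1
            print major "." minor "." patch
        }
    ' "$1"
}

header="/usr/local/cuda/include/cudnn.h"
if [ -r "$header" ] && ver="$(cudnn_version "$header")"; then
    echo "cuDNN $ver found in $header"
else
    echo "could not read a cuDNN version from $header"
fi
```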

    Installing Anaconda + PyTorch

    Installing Anaconda

    Download Anaconda3:

    https://mirrors.tuna.tsinghua.edu.cn/anaconda/archive/

    Install Anaconda3:

    $ sudo bash Anaconda3-2020.07-Linux-x86_64.sh

    Installing PyTorch

    Step 1: Add the mirror channels to Anaconda by running:

    conda config --add channels http://mirror.tuna.tsinghua.edu.cn/anaconda/pkgs/main/
    conda config --add channels http://mirror.tuna.tsinghua.edu.cn/anaconda/pkgs/free/
    conda config --set show_channel_urls yes

    Step 2: Optionally add third-party conda channels as well:

    conda config --add channels https://mirrors.tuna.tsinghua.edu.cn/anaconda/cloud/conda-forge/
    conda config --add channels https://mirrors.tuna.tsinghua.edu.cn/anaconda/cloud/msys2/
    conda config --add channels https://mirrors.tuna.tsinghua.edu.cn/anaconda/cloud/bioconda/
    conda config --add channels https://mirrors.tuna.tsinghua.edu.cn/anaconda/cloud/menpo/
    conda config --add channels https://mirrors.tuna.tsinghua.edu.cn/anaconda/cloud/pytorch/

    Step 3: Remember to remove -c pytorch from the command given on the official site.

    PyTorch v1.5.1

    # CUDA 10.1
    conda install pytorch torchvision cudatoolkit=10.1

    Note: when installing from the Tsinghua mirror, drop -c pytorch; otherwise the packages are not downloaded from the Tsinghua mirror.

    Previous PyTorch Versions

    PyTorch v1.2.0

    # CUDA 9.2
    conda install pytorch==1.2.0 torchvision==0.4.0 cudatoolkit=9.2 -c pytorch

    # CUDA 10.0
    conda install pytorch==1.2.0 torchvision==0.4.0 cudatoolkit=10.0 -c pytorch

    # CPU Only
    conda install pytorch==1.2.0 torchvision==0.4.0 cpuonly -c pytorch
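    After the install finishes, it is worth confirming that PyTorch can actually see the GPU. The sketch below is one way to do that from the shell; the wrapper name torch_cuda_report is hypothetical, and it assumes the python on PATH belongs to the conda environment PyTorch was installed into.

```shell
#!/usr/bin/env bash
# torch_cuda_report [PYTHON]: asks PyTorch for its version, the CUDA version it was
# built against, and whether it can see a GPU.
# Returns non-zero if the interpreter is missing or torch cannot be imported.
torch_cuda_report() {
    local py="${1:-python}"
    command -v "$py" >/dev/null 2>&1 || return 1
    "$py" -c 'import torch
print("torch:", torch.__version__)
print("built for CUDA:", torch.version.cuda)
print("cuda available:", torch.cuda.is_available())
if torch.cuda.is_available():
    print("device 0:", torch.cuda.get_device_name(0))'
}

if ! torch_cuda_report; then
    echo "PyTorch could not be imported; is the conda environment activated?"
fi
```

    If "cuda available" prints False even though nvidia-smi works, the driver, the cudatoolkit package, and the GPU's compute capability are the first things to re-check.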

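    A problem when installing packages with conda: Anaconda directory permissions

    NotWritableError: The current user does not have write permissions to a required path.
      path: /path/to/custom/dir/pkgs/urls.txt
      uid: 1000
      gid: 1000

    Fix it with the following command, replacing <username> with your user name:

    $ sudo chown -R <username> anaconda3

    Installing PyCharm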
