Configuring CUDA Acceleration for PyTorch (LibTorch) on Windows

Technology, 2024-03-21

PyTorch is not linked with support for cuda devices (getDeviceGuardImpl at C:\w\b\windows\pytorch\c10/core/impl/DeviceGuardImplInterface.h:216) (no backtrace available)

LibTorch 1.7.0, LibTorch 1.6.0

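Configuring CUDA acceleration for PyTorch on Windows seems to have become rather odd.

I tested 1.4, 1.5.0, 1.5.1, nightly, and the Python package (pip install ...); after installation none of them could use CUDA. Searching online turned up other people with the same problem, but no solution. Enough preamble.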

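Update 2020-09-06: added support for version 1.6.

If this method works for you, please like and bookmark the post; if it does not, let me know in the comments.

Add the following to the linker options (the value differs between versions). If you are unsure where to enter it, refer to the screenshot, which uses version 1.5 as the example.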

PyTorch (LibTorch) version      Option
1.6 (Debug only)                -INCLUDE:??$Abs@MVCUDAContext@caffe2@@@math@caffe2@@YAXHPEBMPEAMPEAVCUDAContext@1@@Z
1.6 / 1.7 (Debug/Release)       -INCLUDE:?warp_size@cuda@at@@YAHXZ
1.5 (Debug/Release)             -INCLUDE:THCudaCharTensor_zero

After adding this option you also need to link torch_cuda.lib; CUDA acceleration then becomes available.

Appendix:

CUDA test code
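As an alternative sketch to editing the project settings by hand, MSVC lets a source file pass the same two pieces of information to the linker through `#pragma comment`. The macro `MY_LIBTORCH_MINOR`, the macro `CUDA_FORCE_SYMBOL`, the constant `kCudaForceSymbol` and the helper `isSelectedCudaSymbol` are hypothetical names introduced here for illustration. The symbol names are copied from the table, and the Debug-only 1.6 alternative is left out. This has not been checked against every LibTorch build, so treat it as a starting point.

```cpp
#include <cassert>
#include <cstring>

// Hypothetical macro (not part of LibTorch): set it to the minor version
// of the LibTorch 1.x build you link against, e.g. 5, 6 or 7.
#ifndef MY_LIBTORCH_MINOR
#define MY_LIBTORCH_MINOR 7
#endif

// Symbol names copied from the version table in this article.
#if MY_LIBTORCH_MINOR == 5
#define CUDA_FORCE_SYMBOL "THCudaCharTensor_zero"
#elif MY_LIBTORCH_MINOR == 6 || MY_LIBTORCH_MINOR == 7
#define CUDA_FORCE_SYMBOL "?warp_size@cuda@at@@YAHXZ"
#else
#error "No symbol listed for this LibTorch version; see the table."
#endif

#if defined(_MSC_VER)
// Link the CUDA part of LibTorch.
#pragma comment(lib, "torch_cuda.lib")
// Same effect as typing -INCLUDE:<symbol> into the linker options.
#pragma comment(linker, "/INCLUDE:" CUDA_FORCE_SYMBOL)
#endif

// Exposed so the selected symbol can be printed or checked at run time.
constexpr const char* kCudaForceSymbol = CUDA_FORCE_SYMBOL;

// Returns true when `name` matches the symbol selected above.
inline bool isSelectedCudaSymbol(const char* name) {
    assert(name != nullptr);
    return std::strcmp(name, kCudaForceSymbol) == 0;
}
```

The pragmas are guarded by `_MSC_VER` because other compilers do not support them. Defining `MY_LIBTORCH_MINOR` on the compiler command line switches the symbol without touching the source.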

#include <cstdio>
#include <iostream>
#include <torch/torch.h>

// Not defined in the original snippet; assumed to be the CUDA device.
const torch::Device deviceGPU(torch::kCUDA);

struct Net : torch::nn::Module {
    Net(int64_t N, int64_t M) {
        W = register_parameter("W", torch::randn({ N, M }));
        b = register_parameter("b", torch::randn(M));
    }
    torch::Tensor forward(torch::Tensor input) {
        return torch::addmm(b, input, W);
    }
    torch::Tensor W, b;
};

void testCuda() {
    Net lmodule(4096, 4096);
    try {
        // Keep the tensor returned by to(): that copy lives on the GPU.
        torch::Tensor tensor = torch::eye(4096, torch::kFloat).to(deviceGPU);
        lmodule.to(deviceGPU);
        for (size_t i = 0; i < 1024 * 64; i++)
            lmodule.forward(tensor); //tensor1* tensor2;
    } catch (const std::exception& ex) {
        std::cout << ex.what();
    }
    getchar();
}

The following code is a case where acceleration fails:

torch::Tensor tensor1 = torch::eye(9128);
torch::Tensor tensor2 = torch::eye(9128);
// to() returns a new tensor; the results are discarded here,
// so both tensors stay on the CPU.
tensor1.to(deviceGPU);
tensor2.to(deviceGPU);
for (size_t i = 0; i < 1024 * 64; i++) {
    tensor1* tensor2;
}
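A likely reason this case stays slow is that `torch::Tensor::to` returns a new tensor instead of moving the existing one, unlike `torch::nn::Module::to`, which moves a module's parameters in place. The sketch below illustrates that difference without LibTorch. `FakeTensor` and the three helper functions are hypothetical stand-ins written for this example, not part of any library, and they model only the return-a-copy behaviour.

```cpp
#include <cassert>
#include <string>

// Hypothetical stand-in for torch::Tensor: it models only the fact that
// to() leaves the object untouched and returns a relocated copy.
struct FakeTensor {
    std::string device = "cpu";

    FakeTensor to(const std::string& target) const {
        FakeTensor copy = *this;
        copy.device = target;
        return copy;
    }
};

// Mirrors the failing snippet: the result of to() is discarded.
inline std::string deviceAfterDiscardedTo() {
    FakeTensor t;
    t.to("cuda");  // return value dropped, t is unchanged
    return t.device;
}

// Mirrors the working test code: the result of to() is kept.
inline std::string deviceAfterAssignedTo() {
    FakeTensor t;
    t = t.to("cuda");
    return t.device;
}

// Small self-check of the two behaviours.
inline void checkToSemantics() {
    assert(deviceAfterDiscardedTo() == "cpu");
    assert(deviceAfterAssignedTo() == "cuda");
}
```

Applied to the snippet above, the corresponding change would be to write `tensor1 = tensor1.to(deviceGPU);` and `tensor2 = tensor2.to(deviceGPU);`, or to call `.to(deviceGPU)` directly on the result of `torch::eye` as the working test code does.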
