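I'm just getting started with DX12. The API feels obscure and there is a lot of it, so I'm taking notes as I work through it. This post is only a first look at DX12, aimed at a basic understanding of how initialization works. Many API parameters only really make sense once you use them for a specific feature, so I won't explain every parameter here. Without further ado, on to the code.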
    #if defined(DEBUG) || defined(_DEBUG)
    // Enable the D3D12 debug layer.
    {
        ComPtr<ID3D12Debug> debugController;
        ThrowIfFailed(D3D12GetDebugInterface(IID_PPV_ARGS(&debugController)));
        debugController->EnableDebugLayer();
    }
    #endif

This enables the DX12 debug layer. Internal errors and warnings are only reported once it is enabled.
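    ThrowIfFailed(CreateDXGIFactory1(IID_PPV_ARGS(&mdxgiFactory)));

    // Try to create a hardware device first.
    HRESULT hardwareResult = D3D12CreateDevice(
        nullptr,                 // default adapter
        D3D_FEATURE_LEVEL_11_0,  // minimum feature level required
        IID_PPV_ARGS(&md3dDevice));

    // Fallback to WARP device.
    if(FAILED(hardwareResult))
    {
        ComPtr<IDXGIAdapter> pWarpAdapter;
        ThrowIfFailed(mdxgiFactory->EnumWarpAdapter(IID_PPV_ARGS(&pWarpAdapter)));

        ThrowIfFailed(D3D12CreateDevice(
            pWarpAdapter.Get(),
            D3D_FEATURE_LEVEL_11_0,
            IID_PPV_ARGS(&md3dDevice)));
    }

This creates a DX12 device, which feels a bit like a Lua virtual machine. You give the device a display adapter (the first parameter; nullptr selects the default adapter) and the minimum feature level the adapter must support, which is D3D_FEATURE_LEVEL_11_0 here. Feature levels are cumulative: hardware that supports a higher level also supports the lower ones. If creation fails, the code falls back to WARP, the software rasterizer provided by the operating system, and creates the device on that adapter instead.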
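    mRtvDescriptorSize = md3dDevice->GetDescriptorHandleIncrementSize(D3D12_DESCRIPTOR_HEAP_TYPE_RTV);
    mDsvDescriptorSize = md3dDevice->GetDescriptorHandleIncrementSize(D3D12_DESCRIPTOR_HEAP_TYPE_DSV);
    mCbvSrvUavDescriptorSize = md3dDevice->GetDescriptorHandleIncrementSize(D3D12_DESCRIPTOR_HEAP_TYPE_CBV_SRV_UAV);

This caches the descriptor size for each heap type: render target views (RTV), depth/stencil views (DSV), and the shared type for constant buffer views, shader resource views, and unordered access views (CBV/SRV/UAV). The size can differ between GPUs, so it is queried at runtime and used later to step through a descriptor heap.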
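To make the role of these sizes concrete, here is a minimal, portable sketch of the arithmetic behind stepping through a descriptor heap. `CpuHandle` and `DescriptorAt` are hypothetical names made up for illustration (they are not D3D12 types); the only assumption is that a CPU descriptor handle is an address and that descriptors of one heap type sit back to back, each `incrementSize` bytes apart.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical stand-in for a CPU descriptor handle: just an address.
struct CpuHandle {
    std::size_t ptr;
};

// Returns the handle of the descriptor at `index`, given the handle of the
// first descriptor in the heap and the per-descriptor increment size that
// the device reported for this heap type.
CpuHandle DescriptorAt(CpuHandle heapStart, std::uint32_t index, std::uint32_t incrementSize) {
    return CpuHandle{ heapStart.ptr + static_cast<std::size_t>(index) * incrementSize };
}
```

With a heap starting at address 1000 and an increment of 32 bytes, the third descriptor (index 2) would sit at 1064. This is the same stepping that `rtvHeapHandle.Offset(1, mRtvDescriptorSize)` performs one slot at a time.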
    D3D12_FEATURE_DATA_MULTISAMPLE_QUALITY_LEVELS msQualityLevels;
    msQualityLevels.Format = mBackBufferFormat;
    msQualityLevels.SampleCount = 4;
    msQualityLevels.Flags = D3D12_MULTISAMPLE_QUALITY_LEVELS_FLAG_NONE;
    msQualityLevels.NumQualityLevels = 0;  // filled in by CheckFeatureSupport
    ThrowIfFailed(md3dDevice->CheckFeatureSupport(
        D3D12_FEATURE_MULTISAMPLE_QUALITY_LEVELS,
        &msQualityLevels,
        sizeof(msQualityLevels)));

    m4xMsaaQuality = msQualityLevels.NumQualityLevels;
    assert(m4xMsaaQuality > 0 && "Unexpected MSAA quality level.");

Feature checking. This particular check is for MSAA: given a texture format and a sample count, it asks the hardware how many quality levels it supports for multisampling with that combination. This is only the tip of the iceberg. There are many other features you can query, and which ones you need depends on the problem at hand.
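The result of this query is used later when filling in sample descriptions, where valid quality values run from 0 to NumQualityLevels - 1. A small sketch of that selection follows; `SampleDesc` and `ChooseSampleDesc` are hypothetical helpers for illustration, and falling back to a single sample when no quality level is reported is my own assumption rather than something the original code does (it asserts instead).

```cpp
#include <cassert>

// Hypothetical mirror of a sample description: sample count plus quality level.
struct SampleDesc {
    unsigned count;
    unsigned quality;
};

// Picks the sample description to use for a render target.
// numQualityLevels is the value reported by the feature query for
// (format, requestedCount); the highest usable quality is numQualityLevels - 1.
SampleDesc ChooseSampleDesc(bool wantMsaa, unsigned requestedCount, unsigned numQualityLevels) {
    if (wantMsaa && numQualityLevels > 0) {
        return SampleDesc{ requestedCount, numQualityLevels - 1 };
    }
    // MSAA disabled or unsupported: one sample per pixel, quality 0.
    return SampleDesc{ 1, 0 };
}
```

This is the same expression the later code writes inline as `m4xMsaaState ? 4 : 1` and `m4xMsaaState ? (m4xMsaaQuality - 1) : 0`.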
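    void D3DApp::CreateCommandObjects()
    {
        D3D12_COMMAND_QUEUE_DESC queueDesc = {};
        queueDesc.Type = D3D12_COMMAND_LIST_TYPE_DIRECT;
        queueDesc.Flags = D3D12_COMMAND_QUEUE_FLAG_NONE;
        ThrowIfFailed(md3dDevice->CreateCommandQueue(&queueDesc, IID_PPV_ARGS(&mCommandQueue)));

        ThrowIfFailed(md3dDevice->CreateCommandAllocator(
            D3D12_COMMAND_LIST_TYPE_DIRECT,
            IID_PPV_ARGS(mDirectCmdListAlloc.GetAddressOf())));

        ThrowIfFailed(md3dDevice->CreateCommandList(
            0,
            D3D12_COMMAND_LIST_TYPE_DIRECT,
            mDirectCmdListAlloc.Get(), // Associated command allocator
            nullptr,                   // Initial PipelineStateObject
            IID_PPV_ARGS(mCommandList.GetAddressOf())));

        // Start off in a closed state. This is because the first time we refer
        // to the command list we will Reset it, and it needs to be closed before
        // calling Reset.
        mCommandList->Close();
    }

This creates the command queue and the command list. The GPU side owns the command queue, and the CPU side owns command lists. The CPU records rendering commands into a command list, submits them to the GPU's command queue through ExecuteCommandLists, and the GPU then takes commands off the queue and executes them. The command queue is essentially a ring buffer: the CPU writes and the GPU reads. If the CPU writes too fast and the buffer fills up, the CPU has to wait; conversely, if the buffer is empty, the GPU has to wait.

Before creating a command list you need to create a command allocator for it, a bit like specifying a memory allocator for an STL container. Several command lists can share one command allocator, but only one of them may be recording at any given time. In other words, all the commands of one command list must be appended to the allocator contiguously and in order.

One of DX12's innovations is making full use of multi-core CPUs for multithreaded rendering: you can create several command lists, record different rendering commands on different threads, and then hand them all to the GPU. Here are the points to watch when using command lists in a multithreaded environment: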
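Command lists are not free-threaded: multiple threads must not share one command list or call its methods at the same time, so each thread needs its own command list.
Command allocators are not free-threaded either: multiple threads must not share one allocator or call its methods at the same time, so each thread needs its own command allocator.
The command queue is free-threaded: all threads can access the same command queue and call its methods concurrently. In particular, each thread can submit its own command lists to the queue at the same time.
For performance reasons, the application must state at initialization time the maximum number of command lists it will record in parallel.

    ThrowIfFailed(md3dDevice->CreateFence(0, D3D12_FENCE_FLAG_NONE, IID_PPV_ARGS(&mFence)));

This creates a fence. So what is a fence?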
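Before answering that, the threading rules above can be sketched without any D3D12 at all. The code below is a portable simulation, not real API usage: `CommandList` is just a vector of strings, and `SharedQueue` and `RecordInParallel` are hypothetical names. It only shows the ownership pattern: each thread records into a list that it alone owns, and only the submission to the shared queue is synchronized (here with a mutex standing in for whatever the real queue does internally).

```cpp
#include <cassert>
#include <cstddef>
#include <mutex>
#include <string>
#include <thread>
#include <utility>
#include <vector>

// Hypothetical stand-in for a command list: an ordered list of commands.
using CommandList = std::vector<std::string>;

// Hypothetical stand-in for a command queue that any thread may submit to.
class SharedQueue {
public:
    void Execute(CommandList list) {
        std::lock_guard<std::mutex> lock(mutex_);
        submitted_.push_back(std::move(list));
    }
    std::size_t SubmittedLists() const {
        std::lock_guard<std::mutex> lock(mutex_);
        return submitted_.size();
    }
    std::size_t SubmittedCommands() const {
        std::lock_guard<std::mutex> lock(mutex_);
        std::size_t total = 0;
        for (const CommandList& list : submitted_) total += list.size();
        return total;
    }
private:
    mutable std::mutex mutex_;
    std::vector<CommandList> submitted_;
};

// Each worker records into its own list (never shared), then submits it.
// Returns the number of lists the queue has received once all workers finish.
std::size_t RecordInParallel(SharedQueue& queue, int threadCount, int commandsPerThread) {
    assert(threadCount >= 0 && commandsPerThread >= 0);
    std::vector<std::thread> workers;
    for (int t = 0; t < threadCount; ++t) {
        workers.emplace_back([&queue, t, commandsPerThread] {
            CommandList list;  // owned by this thread only
            for (int c = 0; c < commandsPerThread; ++c) {
                list.push_back("thread " + std::to_string(t) + " draw " + std::to_string(c));
            }
            queue.Execute(std::move(list));  // the only shared operation
        });
    }
    for (std::thread& worker : workers) worker.join();
    return queue.SubmittedLists();
}
```

Calling `RecordInParallel(queue, 4, 3)` leaves four lists holding twelve commands in total in the queue. The order in which the lists arrive is not deterministic, which is part of why a synchronization primitive like the fence is needed.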
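When the CPU and GPU work in parallel, a whole series of synchronization problems comes up. For example, the CPU stores an object's position, and the GPU draws the object using that position. The CPU may well overwrite the position before the GPU has rendered with it, and the object then appears to jump around. The fix is to force the CPU to wait until the GPU has finished processing every command up to a fence point. You place a fence point in the GPU's command queue, and the CPU only carries on once the GPU has executed it. Likewise, before resetting a command allocator you must be sure that every command list recorded with it has finished executing, which you can guarantee by setting a fence and waiting for it.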
    void D3DApp::FlushCommandQueue()
    {
        // Advance the fence value to mark commands up to this fence point.
        mCurrentFence++;

        // Add an instruction to the command queue to set a new fence point. Because we
        // are on the GPU timeline, the new fence point won't be set until the GPU finishes
        // processing all the commands prior to this Signal().
        ThrowIfFailed(mCommandQueue->Signal(mFence.Get(), mCurrentFence));

        // Wait until the GPU has completed commands up to this fence point.
        if(mFence->GetCompletedValue() < mCurrentFence)
        {
            // Unnamed auto-reset event, initially non-signaled. The second parameter is
            // the event name (a pointer) and the third is a flags DWORD, so pass
            // nullptr and 0 rather than false.
            HANDLE eventHandle = CreateEventEx(nullptr, nullptr, 0, EVENT_ALL_ACCESS);

            // Fire event when GPU hits current fence.
            ThrowIfFailed(mFence->SetEventOnCompletion(mCurrentFence, eventHandle));

            // Wait until the GPU hits current fence event is fired.
            WaitForSingleObject(eventHandle, INFINITE);
            CloseHandle(eventHandle);
        }
    }

This sets a fence point and waits until all commands have finished executing before the next frame starts.
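To see why the fence value only advances once the earlier commands are done, here is a single-threaded, portable simulation of the same signal-and-wait pattern. Everything in it (`Command`, `SimulatedGpu`, `Flush`) is hypothetical and exists only for illustration. The real GPU runs concurrently with the CPU, whereas this sketch "runs the GPU" by stepping it inside the wait loop.

```cpp
#include <cassert>
#include <cstdint>
#include <deque>

// Hypothetical queue entry: either a unit of rendering work or a fence signal.
struct Command {
    bool isSignal;
    std::uint64_t fenceValue;  // only meaningful when isSignal is true
};

// Hypothetical GPU that executes queued commands strictly in order.
class SimulatedGpu {
public:
    void SubmitWork() { pending_.push_back({false, 0}); }
    void Signal(std::uint64_t value) { pending_.push_back({true, value}); }

    // Executes the oldest pending command; returns false if the queue is empty.
    bool Step() {
        if (pending_.empty()) return false;
        Command command = pending_.front();
        pending_.pop_front();
        if (command.isSignal) {
            completed_ = command.fenceValue;  // fence point reached
        } else {
            ++workDone_;
        }
        return true;
    }

    std::uint64_t CompletedValue() const { return completed_; }
    int WorkDone() const { return workDone_; }

private:
    std::deque<Command> pending_;
    std::uint64_t completed_ = 0;
    int workDone_ = 0;
};

// Same shape as FlushCommandQueue: bump the fence value, enqueue a signal,
// then wait until the GPU reports that value as completed.
std::uint64_t Flush(SimulatedGpu& gpu, std::uint64_t& currentFence) {
    ++currentFence;
    gpu.Signal(currentFence);
    while (gpu.CompletedValue() < currentFence) {
        if (!gpu.Step()) break;  // nothing left to execute
    }
    return gpu.CompletedValue();
}
```

If three pieces of work are submitted before `Flush`, the completed value stays at 0 until all three have run, because the signal sits behind them in the queue. That ordering is the guarantee the CPU relies on.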
    void D3DApp::CreateSwapChain()
    {
        // Release the previous swapchain we will be recreating.
        mSwapChain.Reset();

        DXGI_SWAP_CHAIN_DESC sd;
        sd.BufferDesc.Width = mClientWidth;
        sd.BufferDesc.Height = mClientHeight;
        sd.BufferDesc.RefreshRate.Numerator = 60;
        sd.BufferDesc.RefreshRate.Denominator = 1;
        sd.BufferDesc.Format = mBackBufferFormat;
        sd.BufferDesc.ScanlineOrdering = DXGI_MODE_SCANLINE_ORDER_UNSPECIFIED;
        sd.BufferDesc.Scaling = DXGI_MODE_SCALING_UNSPECIFIED;
        // Note: flip-model swap chains (DXGI_SWAP_EFFECT_FLIP_*) do not support
        // multisampled back buffers, so creation is expected to fail if
        // m4xMsaaState is true.
        sd.SampleDesc.Count = m4xMsaaState ? 4 : 1;
        sd.SampleDesc.Quality = m4xMsaaState ? (m4xMsaaQuality - 1) : 0;
        sd.BufferUsage = DXGI_USAGE_RENDER_TARGET_OUTPUT;
        sd.BufferCount = SwapChainBufferCount;
        sd.OutputWindow = mhMainWnd;
        sd.Windowed = true;
        sd.SwapEffect = DXGI_SWAP_EFFECT_FLIP_DISCARD;
        sd.Flags = DXGI_SWAP_CHAIN_FLAG_ALLOW_MODE_SWITCH;

        // Note: Swap chain uses queue to perform flush.
        ThrowIfFailed(mdxgiFactory->CreateSwapChain(
            mCommandQueue.Get(),
            &sd,
            mSwapChain.GetAddressOf()));
    }

This describes and creates the swap chain, which in turn creates the front and back buffers. The swap chain is tied to the command queue (not to a command list), because it needs the queue in order to flush.
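    void D3DApp::CreateRtvAndDsvDescriptorHeaps()
    {
        // One RTV per swap chain buffer.
        D3D12_DESCRIPTOR_HEAP_DESC rtvHeapDesc;
        rtvHeapDesc.NumDescriptors = SwapChainBufferCount;
        rtvHeapDesc.Type = D3D12_DESCRIPTOR_HEAP_TYPE_RTV;
        rtvHeapDesc.Flags = D3D12_DESCRIPTOR_HEAP_FLAG_NONE;
        rtvHeapDesc.NodeMask = 0;
        ThrowIfFailed(md3dDevice->CreateDescriptorHeap(
            &rtvHeapDesc, IID_PPV_ARGS(mRtvHeap.GetAddressOf())));

        // A single DSV for the depth/stencil buffer.
        D3D12_DESCRIPTOR_HEAP_DESC dsvHeapDesc;
        dsvHeapDesc.NumDescriptors = 1;
        dsvHeapDesc.Type = D3D12_DESCRIPTOR_HEAP_TYPE_DSV;
        dsvHeapDesc.Flags = D3D12_DESCRIPTOR_HEAP_FLAG_NONE;
        dsvHeapDesc.NodeMask = 0;
        ThrowIfFailed(md3dDevice->CreateDescriptorHeap(
            &dsvHeapDesc, IID_PPV_ARGS(mDsvHeap.GetAddressOf())));
    }

This creates the descriptor heaps: one for render target views and one for the depth/stencil view. I don't really understand yet why descriptor heaps are designed this way. The official explanation is that, unlike in DX9, resources are not bound to the pipeline directly. Instead, each resource is referenced through a lightweight structure called a descriptor, and descriptors have to be stored in a descriptor heap. This is said to be one of the main reasons DX12 is fast: a lot of the work can be done up front, which saves overhead in the per-frame loop later. I will probably have to come back to this once I've learned more DX12. For now I'll just follow the rules.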
    void D3DApp::OnResize()
    {
        assert(md3dDevice);
        assert(mSwapChain);
        assert(mDirectCmdListAlloc);

        // Flush before changing any resources.
        FlushCommandQueue();

        ThrowIfFailed(mCommandList->Reset(mDirectCmdListAlloc.Get(), nullptr));

        // Release the previous resources we will be recreating.
        for (int i = 0; i < SwapChainBufferCount; ++i)
            mSwapChainBuffer[i].Reset();
        mDepthStencilBuffer.Reset();

        // Resize the swap chain.
        ThrowIfFailed(mSwapChain->ResizeBuffers(
            SwapChainBufferCount,
            mClientWidth, mClientHeight,
            mBackBufferFormat,
            DXGI_SWAP_CHAIN_FLAG_ALLOW_MODE_SWITCH));

        mCurrBackBuffer = 0;

        // Create one render target view per swap chain buffer.
        CD3DX12_CPU_DESCRIPTOR_HANDLE rtvHeapHandle(mRtvHeap->GetCPUDescriptorHandleForHeapStart());
        for (UINT i = 0; i < SwapChainBufferCount; i++)
        {
            ThrowIfFailed(mSwapChain->GetBuffer(i, IID_PPV_ARGS(&mSwapChainBuffer[i])));
            md3dDevice->CreateRenderTargetView(mSwapChainBuffer[i].Get(), nullptr, rtvHeapHandle);
            rtvHeapHandle.Offset(1, mRtvDescriptorSize);
        }

        // Create the depth/stencil buffer and view.
        D3D12_RESOURCE_DESC depthStencilDesc;
        depthStencilDesc.Dimension = D3D12_RESOURCE_DIMENSION_TEXTURE2D;
        depthStencilDesc.Alignment = 0;
        depthStencilDesc.Width = mClientWidth;
        depthStencilDesc.Height = mClientHeight;
        depthStencilDesc.DepthOrArraySize = 1;
        depthStencilDesc.MipLevels = 1;
        depthStencilDesc.Format = mDepthStencilFormat;
        depthStencilDesc.SampleDesc.Count = m4xMsaaState ? 4 : 1;
        depthStencilDesc.SampleDesc.Quality = m4xMsaaState ? (m4xMsaaQuality - 1) : 0;
        depthStencilDesc.Layout = D3D12_TEXTURE_LAYOUT_UNKNOWN;
        depthStencilDesc.Flags = D3D12_RESOURCE_FLAG_ALLOW_DEPTH_STENCIL;

        D3D12_CLEAR_VALUE optClear;
        optClear.Format = mDepthStencilFormat;
        optClear.DepthStencil.Depth = 1.0f;
        optClear.DepthStencil.Stencil = 0;

        // Use a named local: taking the address of a temporary is a non-standard
        // compiler extension and is rejected in conforming mode.
        CD3DX12_HEAP_PROPERTIES heapProps(D3D12_HEAP_TYPE_DEFAULT);
        ThrowIfFailed(md3dDevice->CreateCommittedResource(
            &heapProps,
            D3D12_HEAP_FLAG_NONE,
            &depthStencilDesc,
            D3D12_RESOURCE_STATE_COMMON,
            &optClear,
            IID_PPV_ARGS(mDepthStencilBuffer.GetAddressOf())));

        // Create descriptor to mip level 0 of entire resource using the format of the resource.
        md3dDevice->CreateDepthStencilView(mDepthStencilBuffer.Get(), nullptr, DepthStencilView());

        // Transition the resource from its initial state to be used as a depth buffer.
        CD3DX12_RESOURCE_BARRIER toDepthWrite = CD3DX12_RESOURCE_BARRIER::Transition(
            mDepthStencilBuffer.Get(),
            D3D12_RESOURCE_STATE_COMMON,
            D3D12_RESOURCE_STATE_DEPTH_WRITE);
        mCommandList->ResourceBarrier(1, &toDepthWrite);

        // Execute the resize commands.
        ThrowIfFailed(mCommandList->Close());
        ID3D12CommandList* cmdsLists[] = { mCommandList.Get() };
        mCommandQueue->ExecuteCommandLists(_countof(cmdsLists), cmdsLists);

        // Wait until resize is complete.
        FlushCommandQueue();

        // Update the viewport transform to cover the client area.
        mScreenViewport.TopLeftX = 0;
        mScreenViewport.TopLeftY = 0;
        mScreenViewport.Width = static_cast<float>(mClientWidth);
        mScreenViewport.Height = static_cast<float>(mClientHeight);
        mScreenViewport.MinDepth = 0.0f;
        mScreenViewport.MaxDepth = 1.0f;

        mScissorRect = { 0, 0, mClientWidth, mClientHeight };
    }

Every time the window is resized, you need to resize the swap chain's front and back buffers, recreate the render target views for those buffers, recreate the depth/stencil buffer and its view, and update the viewport and the scissor rectangle.
    void InitDirect3DApp::Draw(const GameTimer& gt)
    {
        // Reuse the memory associated with command recording.
        // We can only reset when the associated command lists have finished execution on the GPU.
        ThrowIfFailed(mDirectCmdListAlloc->Reset());

        // A command list can be reset after it has been added to the command queue via ExecuteCommandList.
        // Reusing the command list reuses memory.
        ThrowIfFailed(mCommandList->Reset(mDirectCmdListAlloc.Get(), nullptr));

        // Indicate a state transition on the resource usage.
        // (Named locals are used because taking the address of a temporary is non-standard.)
        CD3DX12_RESOURCE_BARRIER toRenderTarget = CD3DX12_RESOURCE_BARRIER::Transition(
            CurrentBackBuffer(),
            D3D12_RESOURCE_STATE_PRESENT,
            D3D12_RESOURCE_STATE_RENDER_TARGET);
        mCommandList->ResourceBarrier(1, &toRenderTarget);

        // Set the viewport and scissor rect. This needs to be reset whenever the command list is reset.
        mCommandList->RSSetViewports(1, &mScreenViewport);
        mCommandList->RSSetScissorRects(1, &mScissorRect);

        D3D12_CPU_DESCRIPTOR_HANDLE rtv = CurrentBackBufferView();
        D3D12_CPU_DESCRIPTOR_HANDLE dsv = DepthStencilView();

        // Clear the back buffer and depth buffer.
        mCommandList->ClearRenderTargetView(rtv, Colors::LightSteelBlue, 0, nullptr);
        mCommandList->ClearDepthStencilView(dsv,
            D3D12_CLEAR_FLAG_DEPTH | D3D12_CLEAR_FLAG_STENCIL, 1.0f, 0, 0, nullptr);

        // Specify the buffers we are going to render to.
        mCommandList->OMSetRenderTargets(1, &rtv, true, &dsv);

        // Indicate a state transition on the resource usage.
        CD3DX12_RESOURCE_BARRIER toPresent = CD3DX12_RESOURCE_BARRIER::Transition(
            CurrentBackBuffer(),
            D3D12_RESOURCE_STATE_RENDER_TARGET,
            D3D12_RESOURCE_STATE_PRESENT);
        mCommandList->ResourceBarrier(1, &toPresent);

        // Done recording commands.
        ThrowIfFailed(mCommandList->Close());

        // Add the command list to the queue for execution.
        ID3D12CommandList* cmdsLists[] = { mCommandList.Get() };
        mCommandQueue->ExecuteCommandLists(_countof(cmdsLists), cmdsLists);

        // swap the back and front buffers
        ThrowIfFailed(mSwapChain->Present(0, 0));
        mCurrBackBuffer = (mCurrBackBuffer + 1) % SwapChainBufferCount;

        // Wait until frame commands are complete. This waiting is inefficient and is
        // done for simplicity. Later we will show how to organize our rendering code
        // so we do not have to wait per frame.
        FlushCommandQueue();
    }

Each frame starts by resetting the command allocator and the command list. The back buffer is then transitioned from the present state to the render target state, the viewport and scissor rectangle are set again, the back buffer and depth buffer are cleared, and the buffers to render into are specified. After that, the back buffer is transitioned from the render target state back to the present state, the command list is closed and submitted, and the buffers are swapped. The final FlushCommandQueue call makes the CPU wait for the GPU every frame, which is simple but inefficient.
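The two state transitions and the back buffer rotation are easy to get wrong, so here is a small portable model of just that bookkeeping. `ResourceState`, `TrackedResource`, `Transition`, and `RenderFrame` are hypothetical names for illustration. Real D3D12 does not throw exceptions here; a mismatched "before" state is something the debug layer would be expected to report, and the exception in this sketch merely stands in for that.

```cpp
#include <cassert>
#include <stdexcept>

// Hypothetical subset of resource states used by the frame loop.
enum class ResourceState { Present, RenderTarget };

// Hypothetical back buffer that remembers which state it is currently in.
struct TrackedResource {
    ResourceState state = ResourceState::Present;
};

// A transition names both the state it expects and the state it moves to.
// The "before" state must match the resource's actual current state.
void Transition(TrackedResource& resource, ResourceState before, ResourceState after) {
    if (resource.state != before) {
        throw std::logic_error("state before the transition does not match the resource");
    }
    resource.state = after;
}

// Index of the buffer to render into on the next frame.
int NextBackBuffer(int current, int bufferCount) {
    return (current + 1) % bufferCount;
}

// One frame: Present -> RenderTarget, (clear and draw), RenderTarget -> Present,
// then rotate to the next back buffer. Returns the new back buffer index.
int RenderFrame(TrackedResource buffers[], int current, int bufferCount) {
    Transition(buffers[current], ResourceState::Present, ResourceState::RenderTarget);
    // ... clear and draw into buffers[current] here ...
    Transition(buffers[current], ResourceState::RenderTarget, ResourceState::Present);
    return NextBackBuffer(current, bufferCount);
}
```

With two buffers, the index alternates 0, 1, 0, 1, and every buffer is back in the present state by the time it is shown. Skipping the first transition and going straight to the second one trips the check.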
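The logic of this code is not complicated. What is complicated is the rendering knowledge and the feature details behind it. I'll take it one step at a time. For now, let's just say hello to DX12.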