D3D/OGL CUDA Interop
Warning
CUDA interop is not supported in D3D9.
APEX is capable of writing directly into graphics buffers on the GPU from its CUDA kernels. This is a substantial performance improvement for data that is generated on the GPU. APEX does not have to copy that data to host memory and does not have to call writeBuffer() on your graphics buffer. APEX also does not need to allocate an output buffer for that data, since it is writing directly into your graphics buffer.
Requirements
1) Your application must implement the UserRenderResource APIs.
2) Your application must provide a CUDA context to the APEX scene at creation. The CUDA context given to the APEX scene must support interop with your chosen graphics API (OGL, D3D10, or D3D11), and it must be bound to your graphics device:
// Describe a CUDA context that is bound to the graphics device used for rendering.
physx::PxCudaContextManagerDesc ctxMgrDesc;
ctxMgrDesc.graphicsDevice = renderer->getDevice();
ctxMgrDesc.interopMode = physx::PxCudaInteropMode::D3D11_INTEROP;
physx::PxCudaContextManager *ctxMgr = nvidia::apex::CreateCudaContextManager(ctxMgrDesc, *nvidia::GetApexSDK()->getErrorCallback());

// Hand the context's GPU dispatcher to the APEX scene at creation.
nvidia::apex::ApexSceneDesc apexSceneDesc;
apexSceneDesc.scene = physxScene;
if( ctxMgr )
    apexSceneDesc.gpuDispatcher = ctxMgr->getGpuDispatcher();
apexScene = apexSDK->createScene(apexSceneDesc);
3) Your UserRenderSpriteBuffer, UserRenderInstanceBuffer, etc. implementations must respect the descriptor arguments for registering graphics resources with the specified CUDA context. They must unregister those resources before they are deleted and re-register them whenever they are reallocated for any reason, such as a graphics device reset:
void D3D11RendererInstanceBuffer::onDeviceReset(void)
{
    if(!m_d3dInstanceBuffer)
    {
        m_d3dDevice.CreateBuffer(&m_d3dBufferDesc, NULL, &m_d3dInstanceBuffer);
        if(m_interopContext && m_d3dInstanceBuffer && m_mustBeRegisteredInCUDA)
        {
            m_registeredInCUDA = m_interopContext->registerResourceInCudaD3D(m_InteropHandle, m_d3dInstanceBuffer);
        }
    }
}
Note
All graphics buffers should be unlocked when calling the resource registration methods. Locked buffers can lead to CUDA errors.
4) Your UserRenderSpriteBuffer, UserRenderInstanceBuffer, etc implementations must return a valid graphics resource pointer from their getInteropResourceHandle() methods:
bool getInteropResourceHandle(CUgraphicsResource &handle)
{
    if(m_registeredInCUDA && m_InteropHandle)
    {
        handle = m_InteropHandle;
        return true;
    }
    return false;
}
5) The render resource allocated for interop purposes must match the size and order of the semantics specified in the creation descriptor’s semanticFormats and semanticOffsets arrays.
6) Once per render frame, the CPU thread holding the graphics device context must call the Scene::prepareRenderResourceContexts() method. More on this below.
Detailed Explanation
When the game render thread calls Scene::prepareRenderResourceContexts(), the scene collects the list of graphics resources that were previously mapped, written into on the GPU, and are now ready to be rendered. Those buffers must be unmapped before they can be used by the game’s renderer. At the same time, it collects the list of buffers that will be written to on the GPU during the next simulation step and maps them into the scene’s CUDA context.
By default, APEX assumes that the game’s render thread is not tied to the thread that is stepping the APEX scene, so when interop is in use it uses a triple buffering scheme that ensures there is always a mapped buffer ready to be used by the simulation thread. If your game can guarantee that the render thread always calls prepareRenderResourceContexts() between each fetchResults() call and the following simulate() call on the game thread, it can safely tell APEX to use only double buffering for interop. See the IOFX module documentation for details.
In the simulation thread, APEX attempts to get a valid CUDA pointer for the mapped graphics buffer. If this succeeds, APEX CUDA kernels will write directly into the mapped buffer and it can be directly rendered after the simulation completes and it is unmapped.
If APEX fails to get a valid CUDA pointer for any reason, it falls back to a buffer allocated in CUDA device memory with the same format and size as the graphics buffer. This CUDA buffer must be copied to host memory at the end of the simulation, and its contents are later copied into the game’s graphics buffer via a writeBuffer() call. In essence, when APEX is unable to get a valid mapped pointer it falls back to the non-interop writeBuffer() path. This can make it difficult to tell whether interop is actually working. The only reliable way to tell is to check whether writeBuffer() is being called on your graphics buffers.
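One way to make that check routine is to count writeBuffer() calls per frame. The following is a minimal sketch, not part of the APEX API: WriteBufferStats and its members are hypothetical names, and it assumes your application calls onWriteBuffer() from inside its own writeBuffer() implementation and endFrame() once per rendered frame.

```cpp
#include <cstdint>

// Hypothetical helper, not an APEX type. Call onWriteBuffer() from your
// buffer's writeBuffer() implementation and endFrame() once per rendered frame.
struct WriteBufferStats {
    std::uint32_t callsThisFrame = 0;      // writeBuffer() calls since the last endFrame()
    std::uint32_t framesTotal = 0;         // frames observed so far
    std::uint32_t framesWithFallback = 0;  // frames that saw at least one writeBuffer()
    bool lastFrameUsedFallback = false;    // true if the most recent frame used writeBuffer()

    void onWriteBuffer() { ++callsThisFrame; }

    void endFrame() {
        ++framesTotal;
        lastFrameUsedFallback = (callsThisFrame > 0);
        if (lastFrameUsedFallback) {
            ++framesWithFallback;
        }
        callsThisFrame = 0;  // start counting for the next frame
    }
};
```

If framesWithFallback keeps growing well after startup and device resets, the buffer is most likely running on the non-interop path.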
Once the simulation begins using mapped buffers again, the temporary CUDA output buffers are no longer necessary and are subsequently released.
In the Scene::fetchResults() method, the “working” set of output buffers is swapped into the “render” set of output buffers. Any call to prepareRenderResourceContexts() after fetchResults() will unmap the output buffer from CUDA so that it can be used by the graphics context.
If triple buffering is in effect (the default when interop is enabled), the previous “render” set of output buffers is swapped into the “deferred” set. APEX will try to map these graphics buffers back into CUDA for use in the next simulation step. The previous “deferred” set is swapped into the “working” set. Working set buffers are also mapped into CUDA if they were not mapped while they were deferred.
If double buffering is in effect, the “working” and “render” sets are simply exchanged during fetchResults(). When the render thread calls prepareRenderResourceContexts() the new render set is unmapped and the new working set is mapped.
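The rotation of buffer sets described above can be modeled in a few lines. This is an illustrative sketch only, not APEX source: BufferSet and InteropBufferSets are hypothetical types, mapping is assumed to always succeed, and the deferred member is simply unused in double-buffered mode.

```cpp
#include <utility>

// Hypothetical stand-in for one set of output buffers; not an APEX type.
struct BufferSet {
    int id;             // identifies the set for illustration
    bool mappedInCuda;  // true while the set is mapped into the CUDA context
};

// Hypothetical model of the "working", "render" and "deferred" roles.
struct InteropBufferSets {
    BufferSet working;
    BufferSet render;
    BufferSet deferred;   // unused when tripleBuffered is false
    bool tripleBuffered;

    // Models the swap performed in fetchResults().
    void onFetchResults() {
        if (tripleBuffered) {
            BufferSet finished = working;
            working = deferred;  // previous deferred set becomes the working set
            deferred = render;   // previous render set becomes the deferred set
            render = finished;   // the set just simulated becomes the render set
        } else {
            std::swap(working, render);  // double buffering: plain exchange
        }
    }

    // Models the map/unmap work done in prepareRenderResourceContexts().
    void onPrepareRenderResourceContexts() {
        render.mappedInCuda = false;  // unmap so the graphics context can use it
        working.mappedInCuda = true;  // map if it was not already mapped
        if (tripleBuffered) {
            deferred.mappedInCuda = true;  // map early for a later simulation step
        }
    }
};
```

With triple buffering, a set passes through working, render, and deferred in turn, which is why a mapped set is normally available even when the render thread and simulation thread are not synchronized.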
Notes
If the game engine wishes to disable interop for only a subset of its graphics buffers (for example, because it needs instance buffer data on the CPU), it can simply return false from the getInteropResourceHandle() method of those buffers. APEX will not try to map such buffers into CUDA and will safely fall back to the non-interop writeBuffer() code path.
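A per-buffer opt-out might look like the sketch below. It is an assumption-laden illustration rather than APEX code: ExampleInstanceBuffer and m_allowInterop are hypothetical names, and GraphicsResourceHandle stands in for CUDA's CUgraphicsResource so that the example compiles without the CUDA headers.

```cpp
// Stand-in for CUDA's CUgraphicsResource so the sketch compiles without cuda.h.
using GraphicsResourceHandle = void*;

// Hypothetical buffer class; only the interop-related members are shown.
class ExampleInstanceBuffer {
public:
    ExampleInstanceBuffer(GraphicsResourceHandle handle, bool registeredInCuda, bool allowInterop)
        : m_interopHandle(handle),
          m_registeredInCuda(registeredInCuda),
          m_allowInterop(allowInterop) {}

    // Returning false leaves 'handle' untouched and signals that this buffer
    // should be filled through the writeBuffer() path instead.
    bool getInteropResourceHandle(GraphicsResourceHandle& handle) const {
        if (!m_allowInterop) {
            return false;  // the application opted this buffer out of interop
        }
        if (m_registeredInCuda && m_interopHandle) {
            handle = m_interopHandle;
            return true;
        }
        return false;  // not registered, so interop is unavailable
    }

private:
    GraphicsResourceHandle m_interopHandle;
    bool m_registeredInCuda;
    bool m_allowInterop;  // hypothetical per-buffer switch
};
```

A buffer constructed with allowInterop set to false always reports no handle, even if it was registered with CUDA.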
Some modules allow interop to be disabled entirely. See the IOFX Module description for details.
Even in a game engine that perfectly implements the APEX graphical interop methods, there will be render frames where a mapped buffer was not available at simulation time, and thus a writeBuffer() will be performed during a renderable actor’s updateRenderResources() call. The most common causes are initial startup, where the scene is stepped before it has been rendered, and frames immediately after graphics buffers were reallocated (for example, after a device reset). The point is that even if you are using interop properly, you must still implement the writeBuffer() methods of these graphics buffers.
Requirement #2, which states that the CUDA context given to the APEX scene must be bound to your graphics device, implies that APEX CUDA kernels must run on the same GPU that you are rendering with. This can be counter-productive when the user has a dedicated GPU for PhysX/APEX. Your game should probably disable interop if a dedicated GPU is available.