NVIDIA PhysX SDK 3.4.1


Callback Sequence

The simplest type of simulation callbacks are the events. Using callbacks the application can simply listen for events and react as required, provided the callbacks obey the rule that SDK state changes are forbidden. This restriction may be a bit surprising given that the SDK permits writes to an inactive back-buffer while the simulation is running. Event callbacks, however, are not called from within the simulation thread, but rather from inside fetchResults(). The key point here is that fetchResults() processes the buffered writes, meaning that writing to the SDK from an event callback can be a particularly fragile affair. To avoid this fragility it is necessary to impose the rule that SDK state changes are not permitted from an event callback.

Inside fetchResults(), among other things, the buffers are swapped. More specifically, this means that properties of each object's internal simulation state are copied to the API-visible state. Some event callbacks happen before this swap, and some after. The events that happen before are:

  • onTrigger
  • onContact
  • onConstraintBreak

When these events are received in the callback, the shapes, actors, etc. will still be in the state they were in immediately before the simulation started. This is preferable, because these events were detected early on during the simulation, before objects were integrated (moved) forward. For example, a pair of shapes that get an onContact() to report that they are in contact will still be in contact when the call is made, even though they may have bounced apart again after fetchResults() returns.

On the other hand, these events are sent after the swap:

  • onSleep
  • onWake

Sleep information is updated after objects have been integrated, so it makes sense to send these events after the swap.

To 'listen' to any of these events it is necessary to first subclass PxSimulationEventCallback so that the various virtual functions may be implemented as desired. An instance of this subclass can then be registered per scene with either PxScene::setSimulationEventCallback or PxSceneDesc::simulationEventCallback. Following these steps alone will ensure that constraint break events are successfully reported. One further step is required to report sleep and wake events: to avoid the expense of reporting all sleep and wake events, actors identified as worthy of sleep/wake notification require the flag PxActorFlag::eSEND_SLEEP_NOTIFIES to be raised. Finally, to receive onContact and onTrigger events it is necessary to set a flag in the filter shader callback for all pairs of interacting objects for which events are required. More details of the filter shader callback can be found in Section Collision Filtering.

Simulation memory

PhysX relies on the application for all memory allocation. The primary interface is via the PxAllocatorCallback interface required to initialize the SDK:

class PxAllocatorCallback
    virtual ~PxAllocatorCallback() {}
    virtual void* allocate(size_t size, const char* typeName, const char* filename,
        int line) = 0;
    virtual void deallocate(void* ptr) = 0;

After the self-explanatory function argument describing the size of the allocation, the next three function arguments are an identifier name, which identifies the type of allocation, and the __FILE__ and __LINE__ location inside the SDK code where the allocation was made. More details of these function arguments can be found in the PhysXAPI documentation.


An important change since 2.x: The SDK now requires that the memory that is returned be 16-byte aligned. On many platforms malloc() returns memory that is 16-byte aligned, but on Windows the system function _aligned_malloc() provides this capability.


On some platforms PhysX uses system library calls to determine the correct type name, and the system function that returns the type name may call the system memory allocator. If you are instrumenting system memory allocations, you may observe this behavior. To prevent PhysX requesting type names, disable allocation names using the method PxFoundation::setReportAllocationNames().

Minimizing dynamic allocation is an important aspect of performance tuning. PhysX provides several mechanisms to control and analyze memory usage. These shall be discussed in turn.

Scene Limits

The number of allocations for tracking objects can be minimized by presizing the capacities of scene data structures, using either PxSceneDesc::limits before creating the scene or the function PxScene::setLimits(). It is useful to note that these limits do not represent hard limits, meaning that PhysX will automatically perform further allocations if the number of objects exceeds the scene limits.

16K Data Blocks

Much of the memory PhysX uses for simulation is held in a pool of blocks, each 16K in size. The initial number of blocks allocated to the pool can be controlled by setting PxSceneDesc::nbContactDataBlocks, while the maximum number of blocks that can ever be in the pool is governed by PxSceneDesc::maxNbContactDataBlocks. If PhysX internally needs more blocks than nbContactDataBlocks then it will automatically allocate further blocks to the pool until the number of blocks reaches maxNbContactDataBlocks. If PhysX subsequently needs more blocks than the maximum number of blocks then it will simply start dropping contacts and joint constraints. When this happens warnings are passed to the error stream in the PX_CHECKED configuration.

To help tune nbContactDataBlocks and maxNbContactDataBlocks it can be useful to query the number of blocks currently allocated to the pool using the function PxScene::getNbContactDataBlocksUsed(). It can also be useful to query the maximum number of blocks that can ever be allocated to the pool with PxScene::getMaxNbContactDataBlocksUsed.

Unused blocks can be reclaimed using PxScene::flushSimulation(). When this function is called any allocated blocks not required by the current scene state will be deleted so that they may be reused by the application. Additionally, a number of other memory resources are freed by shrinking them to the minimum size required by the scene configuration.

Scratch Buffer

A scratch memory block may be passed as a function argument to the function PxScene::simulate. As far as possible, PhysX will internally allocate temporary buffers from the scratch memory block, thereby reducing the need to perform temporary allocations from PxAllocatorCallback. The block may be reused by the application after the PxScene::fetchResults() call, which marks the end of simulation. One restriction on the scratch memory block is that it must be a multiple of 16K, and it must be 16-byte aligned.

In-place Serialization

PhysX objects cab be stored in memory owned by the application using PhysX' binary deserialization mechanism. See Serialization for details.

PVD Integration

Detailed information about memory allocation can be recorded and displayed in the PhysX Visual Debugger. This memory profiling feature can be configured by setting the trackOutstandingAllocations flag when calling PxCreatePhysics(), and raising the flag PxVisualDebuggerConnectionFlag::eMEMORY when connecting to the debugger with PxVisualDebuggerExt::createConnection().

Completion Tasks

A completion task is a task that executes immediately after PxScene::simulate has exited. If PhysX has been configured to use worker threads then PxScene::simulate will start simulation tasks on the worker threads and will likely exit before the worker threads have completed the work necessary to complete the scene update. As a consequence, a typical completion task would first need to call PxScene::fetchResults(true) to ensure that fetchResults blocks until all worker threads started during simulate() have completed their work. After calling fetchResults(true), the completion task can perform any other post-physics work deemed necessary by the application:

scene.fetchResults(true); game.updateA(); game.updateB(); ... game.updateZ();

The completion task is specified as a function argument in PxScene::simulate. More details can be found in the PhysAPI documentation.

Synchronizing with Other Threads

An important consideration for substepping is that simulate() and fetchResults() are classed as write calls on the scene, and it is therefore illegal to read from or write to a scene while those functions are running. For the simulate() function it is important to make the distinction between running and ongoing. In this context, it is illegal to read or write to a scene before simulate() exits. It is perfectly legal, however, to read or write to a scene after simulate() has exited but before the worker threads that started during the simulate() call have completed their work.


PhysX does not lock its scene graph, but it will report an error in checked build if it detects that multiple threads make concurrent calls to the same scene, unless they are all read calls.


For reasons of fidelity simulation or better stability it is often desired that the simulation frequency of PhysX be higher than the update rate of the application. The simplest way to do this is just to call simulate() and fetchResults() multiple times:

for(PxU32 i=0; i<substepCount; i++)
    ... pre-simulation work (update controllers, etc) ...
    ... post simulation work (process physics events, etc) ...

Sub-stepping can also be integrated with the completion task feature of the simulate() function. To illustrate this, consider the situation where the scene is simulated until the graphics component signals that it has completed updating the render state of the scene. Here, the completion task will naturally run after simulate() has exited. Its first job will be to block with fetchResults(true) to ensure that it waits until both simulate() and fetchResults() have completed their sequential work. When the completion task is able to proceed its next work item will be to query the graphics component to check if another simulate() is required or if it can exit. In the case that another simulate() step is required it will clearly need to pass a completion task to simulate(). A tricky point here is that a completion task cannot submit itself as the next completion task because it would cause an illegal recursion. A solution to this problem might to be to have two completion tasks where each stores a reference to the other. Each completion task can then pass its partner to simulate():


Split sim

As an alternative to simulate(), you can split the simulation into two different phases, collide() and advance(). For some properties, called write-through properties, modifications during the collide() phase will be seen immediately by the subsequent advance() phase. This allows collide() to begin before the data required by advance() is available and to run in parallel with game logic that generates inputs to advance(). This is particularly useful for animation logic generating kinematic targets, and for controllers applying forces to bodies. The write-through properties are listed below:


When using the split sim, a physics simulation loop would look like this:


Any other sequence of API calls is illegal. The SDK will issue error messages. The users can interleave the physics-dependent game logic between collide() and fetchCollision:

physics-dependent game logic(anmimation, rendering)

fetchCollision() will wait until collide() has finished before it updates the write-through properties in the SDK. Once fetchCollision() has completed, any state modification performed on the objects in the executing scene will be buffered and will not be reflected until the simulation and a call to fetchResults() has completed. The solver will take the write-through properties into account when computing the new sets of velocities and poses for the actors being simulated.

Split fetchResults

The fetchResults() method is available in both a standard and split format. The split format offers some advantages over the standard fetchResult() method because it permits the user to parallelize processing of contact reports, which can be expensive when simulating complex scenes.

A simplistic way to use split fetchResults would look something like this:

gSharedIndex = 0;

gScene->simulate(1.0f / 60.0f);

//Call fetchResultsStart. Get the set of pair headers
const PxContactPairHeader* pairHeader;
PxU32 nbContactPairs;
gScene->fetchResultsStart(pairHeader, nbContactPairs, true);

//Set up continuation task to be run after callbacks have been processed in parallel
callbackFinishTask.setContinuation(*gScene->getTaskManager(), NULL);

//process the callbacks




The user is free to use their own task/threading system to process the callbacks. However, the PhysX scene provides a utility function that processes the callbacks using multiple threads, which is used in this code snippet. This method takes a continuation task that will be run when the tasks processing callbacks have completed. In this example, the completion task raises an event that can be waited upon to notify the main thread that callback processing has completed.

This feature is demonstrated in SnippetSplitFetchResults. In order to make use of this approach, contact notification callbacks must be thread-safe. Furthermore, for this approach to be beneficial, contact notification callbacks need to be doing a significant amount of work to benefit from multi-threading them