Trying to understand how to implement a mass spring setup (cloth sim) with a compute shader

I am trying to implement a mass spring system using a compute shader and have run into a weird issue.

Essentially the way I have setup the system is using three separate compute shaders, each with their own respective purpose.

The first is as follows:

struct Vertex
{
    float3 position;
    float3 normal;
    float2 texCoord;
    float4 tangent;
};

struct InterlockedVector
{
    int x;
    int y;
    int z;
};

// skinning that has been transferred onto the simulation mesh
StructuredBuffer<Vertex> simMeshSkinnedVertexBuffer : register(t0);
// the skinning data from the previous frame
RWStructuredBuffer<Vertex> simMeshPreviousSkinnedVertexBuffer : register(u0);
// the output result stored as integers InterlockedAdd can be used to manage synchronisation
RWStructuredBuffer<InterlockedVector> simMeshTransformedVertexBuffer : register(u1);

float3 UnQuantize(InterlockedVector input, float factor)
{
    float vertexPositionX = ((float) input.x) / factor;
    float vertexPositionY = ((float) input.y) / factor;
    float vertexPositionZ = ((float) input.z) / factor;
    return float3(vertexPositionX, vertexPositionY, vertexPositionZ);
}

// Define the compute shader entry point
[numthreads(64, 1, 1)]
void CS(uint3 dispatchThreadID : SV_DispatchThreadID)
{
    // Get the current vertex ID
    uint simMeshVertexID = dispatchThreadID.x;
    
    // Get the change in driving force from the relative transformation between frames
    float3 simSkinForce = simMeshSkinnedVertexBuffer[simMeshVertexID].position - simMeshPreviousSkinnedVertexBuffer[simMeshVertexID].position;

    // convert data to integer
    int quantizedX = (int) (simSkinForce.x * QUANTIZE);
    int quantizedY = (int) (simSkinForce.y * QUANTIZE);
    int quantizedZ = (int) (simSkinForce.z * QUANTIZE);
 
    // perform atomic addition
    InterlockedAdd(simMeshTransformedVertexBuffer[simMeshVertexID].x, quantizedX);
    InterlockedAdd(simMeshTransformedVertexBuffer[simMeshVertexID].y, quantizedY);
    InterlockedAdd(simMeshTransformedVertexBuffer[simMeshVertexID].z, quantizedZ);
    
    // store skinning as previous
    simMeshPreviousSkinnedVertexBuffer[simMeshVertexID] = simMeshSkinnedVertexBuffer[simMeshVertexID];
}

The purpose of this part of the code is to feed in the changes to the skinned mesh as a driving force in the mass spring simulation. This part seems to working well from what I can see.

I am using InterlockedAdd as I had a suspicion that I was having some synchronization issues.

This Pre Solve compute shader is dispatched like this:

const UINT threadGroupSizeX = 64;
const UINT threadGroupSizeY = 1;
const UINT threadGroupSizeZ = 1;
cmdList->Dispatch((ri->SimMeshVertexCount + threadGroupSizeX - 1) / threadGroupSizeX, threadGroupSizeY, threadGroupSizeZ);

The next part is were I am running into some issues.

struct Neighbours
{
    uint index[8];
};

struct RestConstraint
{
    float length[8];
};

struct InterlockedVector
{
    int x;
    int y;
    int z;
};

// The rest length of each neighbouring vertex
StructuredBuffer<RestConstraint> restConstraintBuffer : register(t0);
// The vertexID of each neighbouring vertex
StructuredBuffer<Neighbours> simMeshVertexAdjacencyBuffer : register(t1);
// The output result stored as integers InterlockedAdd can be used to manage synchronisation
RWStructuredBuffer<InterlockedVector> simMeshTransformedVertexBuffer : register(u0);

float3 UnQuantize(InterlockedVector input, float factor)
{
    float vertexPositionX = ((float) input.x) / factor;
    float vertexPositionY = ((float) input.y) / factor;
    float vertexPositionZ = ((float) input.z) / factor;
    return float3(vertexPositionX, vertexPositionY, vertexPositionZ);
}

// Iterating over every vertex in the simulation mesh
[numthreads(1, 1, 1)]
void CS(uint3 dispatchThreadID : SV_DispatchThreadID)
{
    int springIterations = 1;
    int neighbourCount = 8;
    uint vertexID = dispatchThreadID.x;
    
    // For the number of iterations
    for (int iter = 0; iter < springIterations; ++iter)
    {
        // For each neighbour of the base vertex
        for (int ni = 0; ni < neighbourCount; ++ni)
        {
            // Get the neighbours vertex ID & rest length
            uint neighbour = simMeshVertexAdjacencyBuffer[vertexID].index[ni];
            float neighbourLength = restConstraintBuffer[vertexID].length[ni];
            
            // Get the current position of the base vertex
            float3 vertexPosition = UnQuantize(simMeshTransformedVertexBuffer[vertexID], QUANTIZE);
        
            // if the neighbours vertexID is not the same as the base vertexID (This is used to fill the Neighbours values and provide and mechanism to identify them as void entries)
            if (neighbour != vertexID)
            {
                // Get the current position of the neighbour vertex
                float3 neighbourVertexPosition = UnQuantize(simMeshTransformedVertexBuffer[neighbour], QUANTIZE);

                // Calculate the displacement between the base vertex and neighbour vertex
                float3 neighbourSkinnedDisplacement = vertexPosition - neighbourVertexPosition;
                
                // Find the length of this vector
                float neighbourSkinnedLength = length(neighbourSkinnedDisplacement);
                
                // Calculate scale and correction vector
                float3 correctionVector = (neighbourSkinnedDisplacement * (1.0 - neighbourLength / neighbourSkinnedLength)) * 0.5;
                
                // convert the result to integers ( + & - )
                int quantizedX = (int) (correctionVector.x * QUANTIZE);
                int quantizedY = (int) (correctionVector.y * QUANTIZE);
                int quantizedZ = (int) (correctionVector.z * QUANTIZE);
                int invQuantizedX = (int) (-correctionVector.x * QUANTIZE);
                int invQuantizedY = (int) (-correctionVector.y * QUANTIZE);
                int invQuantizedZ = (int) (-correctionVector.z * QUANTIZE);
 
                // Offset the base and neighbour vertex by half of the correction vertex in opposing directions (spring satisfaction)
                InterlockedAdd(simMeshTransformedVertexBuffer[vertexID].x, invQuantizedX);
                InterlockedAdd(simMeshTransformedVertexBuffer[vertexID].y, invQuantizedY);
                InterlockedAdd(simMeshTransformedVertexBuffer[vertexID].z, invQuantizedZ);
                InterlockedAdd(simMeshTransformedVertexBuffer[neighbour].x, quantizedX);
                InterlockedAdd(simMeshTransformedVertexBuffer[neighbour].y, quantizedY);
                InterlockedAdd(simMeshTransformedVertexBuffer[neighbour].z, quantizedZ);
            }
        }
    }
}

As you can see this is a relatively standard mass spring setup, I know a lot of mass spring solvers don’t offset the base and neighbor vertices simultaneously, but I don’t have any double up constraints that are palindromes, as it is less efficient.

This code seems to be working reasonably well, the main issue, is that it only seems to run correctly with 1 thread… I assume that there is something that I am not doing correctly to ensure that this is working properly in parallel. I hoped that the InterlockedAdd function would ensure that the code would synchronized properly, but that does not seem to be the case :frowning:

The mass spring compute shader is dispatched like this:

const UINT threadGroupSizeX = 1;
const UINT threadGroupSizeY = 1;
const UINT threadGroupSizeZ = 1;
cmdList->Dispatch((ri->SimMeshVertexCount + threadGroupSizeX - 1) / threadGroupSizeX, threadGroupSizeY, threadGroupSizeZ);

Any help or pointers would be appreciated greatly! Im a bit lost as to what I should try next

Reference I used for the mass spring setup: Mosegaards Cloth Simulation Coding Tutorial | Visual Computing Lab

Full implementation: https://github.com/JChittockDev/Research/blob/main/OpenResearchEngine/Render/Manager/Render.cpp