The collision is both squares and spheres.
Every tick()
this happens:
[x] a simple (empty!) rectangular grid (cellSize is determined) is created/reused
[x] the grid is filled, by taking the center of each drop, calculating the cell[ x ][ y ] where the drop should be added
[x] another cell2[ w ][ h ] is filled, and for each cell, it adds the current cell and all neighbouring cells
==> now cell2[ x ][ y ] has roughly ~9 times the cell[ x ][ y ] drop count
==> for each cell[ x ][ y ], we now have all potential colliding drops in cell2[ x ][ y ]
[x] traverse the grid, and for each cell[ x ][ y ] collide each drop with every drop in cell2[ x ][ y ], except itself
==> only this part is multi-threaded
[x] clear all drops from both cell[ x ][ y ] and cell2[ x ][ y ]
The grid.cellSize is adjusted automatically (in steps of 10%), by measuring its own performance. It basically does some trial-and-error and after N iterations, approaches an optimum (hopefully not a local optimum) and kinda stays there. When the location of large sums of particles will change, it is likely that the grid will also converge to another optimal cellSize.