Quick 'n Dirty Verlet Fluids v0.1

The collision is both squares and spheres.

Every tick() this happens:
[x] a simple (empty!) rectangular grid (cellSize is determined) is created/reused
[x] the grid is filled, by taking the center of each drop, calculating the cell[ x ][ y ] where the drop should be added
[x] a second grid, cell2[ w ][ h ], is filled: each cell2[ x ][ y ] receives the drops of cell[ x ][ y ] and of all its neighbouring cells
==> now cell2[ x ][ y ] has roughly 9 times the cell[ x ][ y ] drop count
==> for each cell[ x ][ y ], we now have all potential colliding drops in cell2[ x ][ y ]
[x] traverse the grid, and for each cell[ x ][ y ] collide each drop with every drop in cell2[ x ][ y ], except itself
==> only this part is multi-threaded
[x] clear all drops from both cell[ x ][ y ] and cell2[ x ][ y ]
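
The steps above could be sketched roughly like this. It is a single-threaded illustration only: the class, field and method names are made up for the example, and the real collision kernel is replaced by a counter.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch of the per-tick grid passes; all names are assumptions. */
final class DropGridSketch {
    static final class Drop {
        float x, y;
        Drop(float x, float y) { this.x = x; this.y = y; }
    }

    final int w, h;
    final float cellSize;
    final List<Drop>[][] cell;   // drops whose center lies in this cell
    final List<Drop>[][] cell2;  // drops of this cell plus its neighbours

    @SuppressWarnings("unchecked")
    DropGridSketch(int w, int h, float cellSize) {
        this.w = w; this.h = h; this.cellSize = cellSize;
        cell = new List[w][h];
        cell2 = new List[w][h];
        for (int x = 0; x < w; x++)
            for (int y = 0; y < h; y++) {
                cell[x][y] = new ArrayList<>();
                cell2[x][y] = new ArrayList<>();
            }
    }

    /** Runs one tick and returns the number of candidate pair checks. */
    int tick(List<Drop> drops) {
        // 1. fill cell[x][y] from the drop centers (clamped to the grid bounds)
        for (Drop d : drops) {
            int cx = clamp((int) Math.floor(d.x / cellSize), w);
            int cy = clamp((int) Math.floor(d.y / cellSize), h);
            cell[cx][cy].add(d);
        }
        // 2. fill cell2[x][y] with the cell itself and its existing neighbours
        for (int x = 0; x < w; x++)
            for (int y = 0; y < h; y++)
                for (int nx = Math.max(0, x - 1); nx <= Math.min(w - 1, x + 1); nx++)
                    for (int ny = Math.max(0, y - 1); ny <= Math.min(h - 1, y + 1); ny++)
                        cell2[x][y].addAll(cell[nx][ny]);
        // 3. test each drop against every candidate except itself
        int checks = 0;
        for (int x = 0; x < w; x++)
            for (int y = 0; y < h; y++)
                for (Drop a : cell[x][y])
                    for (Drop b : cell2[x][y])
                        if (a != b) checks++; // a real kernel would collide a and b here
        // 4. clear both grids so they can be reused next tick
        for (int x = 0; x < w; x++)
            for (int y = 0; y < h; y++) {
                cell[x][y].clear();
                cell2[x][y].clear();
            }
        return checks;
    }

    private static int clamp(int v, int size) {
        return Math.max(0, Math.min(size - 1, v));
    }
}
```

Note that in this sketch every pair is visited twice, once from each drop's cell. Whether the original code does the same is not stated in the thread.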

The grid.cellSize is adjusted automatically (in steps of 10%) by measuring its own performance. It basically does some trial-and-error and, after N iterations, approaches an optimum (hopefully not a local optimum) and kinda stays there. When large groups of particles move elsewhere, the grid will likely converge to a different optimal cellSize.
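
One way such a trial-and-error tuner could look is a simple hill climb: keep stepping the cellSize by 10% in one direction, and turn around whenever the last step made things slower. The actual heuristic is not shown in the thread, so the class and method names below are purely hypothetical.

```java
/** Hypothetical hill-climbing tuner; the author's actual heuristic is not shown. */
final class CellSizeTuner {
    private float cellSize;
    private float factor = 1.10f;               // grow by 10%; inverted to shrink
    private long previousNanos = Long.MAX_VALUE;

    CellSizeTuner(float initialCellSize) {
        this.cellSize = initialCellSize;
    }

    float cellSize() {
        return cellSize;
    }

    /** Feed the measured cost of the last batch of ticks; returns the next cellSize to try. */
    float report(long elapsedNanos) {
        if (elapsedNanos > previousNanos) {
            factor = 1.0f / factor;             // last step made things slower: turn around
        }
        previousNanos = elapsedNanos;
        cellSize *= factor;
        return cellSize;
    }
}
```

A tuner like this never settles completely. It keeps oscillating a step or two around the best value, and because it only compares neighbouring sizes it can get stuck in a local optimum. Real timings are noisy, so one would probably average over a number of ticks before calling report().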

Interesting, so how big is the grid, 1 cell per pixel? It appears that the droplets compress and end up in the same cell so I guess the cells are smaller than a pixel? How do you chuck the big boat in there which is obviously bigger than a cell?

Well, in each cell2[ x ][ y ], the following drops will be added:
cell[ x - 1 ][ y - 1] cell[ x + 0 ][ y - 1] cell[ x + 1 ][ y - 1] cell[ x - 1 ][ y + 0] cell[ x + 0 ][ y + 0] cell[ x + 1 ][ y + 0] cell[ x - 1 ][ y + 1] cell[ x + 0 ][ y + 1] cell[ x + 1 ][ y + 1]

The drops in cell[ x ][ y ] can only collide with the drops in cell2[ x ][ y ], so naturally the cellSize must be greater than the dropRadius, because otherwise the potential colliders wouldn’t be in cell2[ x ][ y ]. The current dropRadius is 2.75. Typically, the cellSize ends up at 12…16 pixels (on my CPU, in this scene), which means that cell2[ x ][ y ] gathers the drops found in a (12…16)*3 = 36…48 pixel square.
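
The arithmetic can be checked with a few lines. The helper names are invented for this example. The "reach" of 3 * dropRadius is an assumption taken from the cutoff in the kernel posted further down in this thread, which suggests a slightly stricter bound than the dropRadius alone.

```java
/** Hypothetical helpers for reasoning about cellSize; not part of the original code. */
final class CellSizeCheck {
    /** Width in pixels of the 3x3 block of cells gathered into cell2[x][y]. */
    static float neighbourhoodSpan(float cellSize) {
        return 3.0f * cellSize;
    }

    /**
     * True if two drops whose centers are at most 'reach' apart are guaranteed
     * to sit in the same cell or in directly adjacent cells.
     */
    static boolean coversReach(float cellSize, float reach) {
        return cellSize >= reach;
    }
}
```

With dropRadius = 2.75 the assumed reach is 8.25 pixels, so a cellSize of 12…16 is comfortably large enough, while a cellSize of 5 would miss some pairs.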

Regarding the boat, it is not interacting with the liquid grid (cells) directly. I calculate the bounding sphere (rendered as a grey circle) of the boat. Then I query the grid for all cells that intersect that bounding sphere. I sorta took a (slower) shortcut: I calculate the bounding box of the bounding sphere (it is too big, I know), work out the minimal x,y and maximal x,y cell coordinates in the grid that correspond with that bounding box, then subtract 1,1 cell from the minimum and add 1,1 to the maximum (to include the neighbouring cells), and grab all drops in that rectangular group of cells. Then I collide (pure Verlet) all drops (basically spheres) with all spheres in the boat.
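
The cell lookup for the boat could be sketched as follows. This only covers the bounding-box-to-cell-range step, and the method name and the clamping to the grid edges are assumptions for the example.

```java
/** Hypothetical sketch of the bounding-sphere query; names are assumptions. */
final class BoatQuerySketch {
    /**
     * Returns {minX, minY, maxX, maxY} cell indices (inclusive) covering the
     * bounding box of the sphere, padded by one cell and clamped to the grid.
     */
    static int[] cellRange(float cx, float cy, float radius,
                           float cellSize, int w, int h) {
        int minX = (int) Math.floor((cx - radius) / cellSize) - 1;
        int minY = (int) Math.floor((cy - radius) / cellSize) - 1;
        int maxX = (int) Math.floor((cx + radius) / cellSize) + 1;
        int maxY = (int) Math.floor((cy + radius) / cellSize) + 1;
        return new int[] {
            clamp(minX, w), clamp(minY, h), clamp(maxX, w), clamp(maxY, h)
        };
    }

    private static int clamp(int v, int size) {
        return Math.max(0, Math.min(size - 1, v));
    }
}
```

All drops in cell[ x ][ y ] for x in minX…maxX and y in minY…maxY would then be collided against the spheres of the boat.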

I have a 5 year old laptop. It has an Intel Celeron M 1.4 GHz CPU.
Framerate: 10-12

I also have a year old desktop. It has an Intel Core 2 Duo 3.16 GHz.
Framerate: 35-39

Win7, JRE v1.6.0_15, Intel Q6600 (quad core, 2.4GHz), getting 50fps @ ~60% CPU usage.

OpenSolaris 2009.06 64-bit, Intel Core i7 (4 cores with Hyperthreading = 8 CPU), nVidia GTS 250 GPU

50fps at about 21% CPU

Thanks for the WebStart link :wink:

Between 35 and 50 fps, averaging around 42

Core 2 Duo E7400 @ 2.8 GHz, Windows XP

About 40 - 45fps
CPU% = ~ 120% (linux style, looks like about 60% per core)

Linux CentOS 5.1
Intel® Core™2 Duo CPU E6550 @ 2.33GHz
NVIDIA GeForce 7600 GS
Java™ SE Runtime Environment (build 1.6.0_12-b04)
Java HotSpot™ 64-Bit Server VM (build 11.2-b01, mixed mode)

Intel Core 2 Duo 2.4 GHz, around 30 fps, about 95% CPU. At one point I was randomly getting 2 or 3 fps; I think it was when the raft was buried.

Looking very good, I’m impressed you’re getting that many particles going at once without too much slowdown.

Going to release code, too?

asus laptop
intel core2Duo (2 cores) t7250 @ 2ghz
2gb ram
windows vista home premium

fps:
average = 33
minimum = 28

the weird thing is that even with these fps, i don’t see it as smooth. it looks laggy even though it’s always above 28 fps
my gpu is not top notch (ati hd 2400). can’t play crysis but i can play oblivion and gears of war at decent fps (+30)

CPU: 70~90% (that’s a lot dude!!)

hope it helps

Well, it’s certainly a lot! There are 8K particles, so in theory that’s 64 million collisions per frame. Due to the spatial algorithm, that gets reduced to roughly 1% (still 640K collisions or collision checks). You won’t see true fluid simulation in any modern game, because it’s just too darn heavy to compute. Even simulating a glass of water is harder to calculate realistically (not to mention render) than Crysis.
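
For reference, the numbers work out like this, assuming "8K" means 8000 drops and that every drop is tested against every other drop in both directions. The helper names are made up for the example.

```java
/** Back-of-the-envelope pair counts; assumes 8K means 8000 drops. */
final class PairCount {
    /** Every drop against every drop, each direction counted separately. */
    static long orderedPairs(long n) {
        return n * n;
    }

    /** Each pair counted once, no drop paired with itself. */
    static long uniquePairs(long n) {
        return n * (n - 1) / 2;
    }
}
```

orderedPairs(8000) gives 64,000,000, and 1% of that is the 640K mentioned above. Counting each pair only once would give 31,996,000 for the brute-force case.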

I’m probably going to find some more shortcuts, maybe I can speed it up like 2-5 times, but it won’t get much better than that. At that time, I’m probably going to make the drops smaller anyway.

The gfx card is not used BTW. Maybe by Java2D, but I’m mainly drawing rects, so there isn’t much room for improvement there and the GPU won’t accelerate that much. It’s like OpenGL immediate mode with 1 quad in every begin/end at best, as every quad has its own color.

Maybe. The code is actually very simple, it’s just not very nicely written.

Yesterday I optimized a fair bit - without touching the algorithms, and it seems to be 20% faster. Will upload later.

for those interested, this is the whacky ‘kernel’:


   public static final void collideLiquid(VerletSphere a, VerletSphere b, float viscosity)
   {
      final float rad = a.radius; // 'b' is just as big...
      final Vec3 aNow = a.p.now;
      final Vec3 bNow = b.p.now;

      final float dx = bNow.x - aNow.x;
      final float dy = bNow.y - aNow.y;
      final float dz = bNow.z - aNow.z;
      final float d2 = dx * dx + dy * dy + dz * dz;

      final float outer = (rad * 3.0f);
      if (d2 > outer * outer)
      {
         return;
      }

      float force = viscosity / d2; // decreases like gravity

      final float rad2_doub = rad * rad * 4.00f; // (rad*2)^2
      final float rad2_half = rad * rad * 0.25f; // (rad/2)^2

      if (d2 < rad2_doub) // d < rad*2
      {
         // prevent extreme collisions
         local: if (d2 > rad2_half) // d > rad/2
         {
            float diam = rad * 2.0f;
            if (d2 > (diam * diam)) // never true here: d2 < rad2_doub, which equals diam * diam
               break local;

            float d = (float) Math.sqrt(d2);
            float f = (d - diam) / diam * 0.25f; // spring with half stiffness = 0.5*0.5

            aNow.x += dx * f;
            aNow.y += dy * f;
            aNow.z += dz * f;

            bNow.x -= dx * f;
            bNow.y -= dy * f;
            bNow.z -= dz * f;
         }

         // this is a collision, flip the viscosity force (makes no sense, but it works)
         force *= -0.5f;
      }
      else
      {
         // these drops are so far away, that viscosity shouldn't really have such a big influence...
         force *= 0.5f;
      }

      final Vec3 aOld = a.p.old;
      final Vec3 bOld = b.p.old;

      aOld.x -= dx * force;
      aOld.y -= dy * force;
      aOld.z -= dz * force;

      bOld.x += dx * force;
      bOld.y += dy * force;
      bOld.z += dz * force;
   }
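
The kernel only makes sense together with the types it touches, which are not shown. A minimal guess at them, reconstructed purely from the field accesses above (a.radius, a.p.now, a.p.old), might look like the following. The VerletParticle name and the step() method are assumptions; step() is a plain position-Verlet update without external forces.

```java
/** Assumed shape of the vector type: three public float fields. */
final class Vec3 {
    float x, y, z;
}

/** Assumed particle type holding the current and previous position. */
final class VerletParticle {
    final Vec3 now = new Vec3();
    final Vec3 old = new Vec3();
}

/** Assumed sphere type; only the fields used by the kernel are included. */
final class VerletSphere {
    final VerletParticle p = new VerletParticle();
    float radius;

    /** Position Verlet without external forces: velocity is implied by (now - old). */
    void step() {
        float nx = p.now.x + (p.now.x - p.old.x);
        float ny = p.now.y + (p.now.y - p.old.y);
        float nz = p.now.z + (p.now.z - p.old.z);
        p.old.x = p.now.x;
        p.old.y = p.now.y;
        p.old.z = p.now.z;
        p.now.x = nx;
        p.now.y = ny;
        p.now.z = nz;
    }
}
```

Under this assumption the kernel's two kinds of writes have different effects: moving "now" displaces the drop directly (the spring part), while moving "old" changes the implied velocity that the next step() will apply (the viscosity part).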

I implemented something similar and sped it up by

  1. precalculating the inverse radius
  2. using a lookup table for sqrt (interesting values are always in radius range)
    could help you too :slight_smile:
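
A sketch of what those two tricks might look like. The table layout, its resolution and all names are assumptions, not the poster's actual code. The lookup is only valid for squared distances in the range the table was built for, and it trades accuracy for speed.

```java
/** Hypothetical sqrt lookup table for squared distances in [0, maxD2]. */
final class SqrtTable {
    private final float[] table;
    private final float scale;   // maps d2 in [0, maxD2] to a table index

    SqrtTable(float maxD2, int entries) {
        table = new float[entries + 1];
        scale = entries / maxD2;
        for (int i = 0; i <= entries; i++) {
            table[i] = (float) Math.sqrt(i / scale);
        }
    }

    /** Nearest-entry approximation of sqrt(d2); the caller must keep d2 within [0, maxD2]. */
    float sqrt(float d2) {
        return table[(int) (d2 * scale + 0.5f)];
    }

    /** Spring factor with a precalculated inverse diameter: one multiply instead of a divide. */
    static float springFactor(float d, float diam, float invDiam) {
        return (d - diam) * invDiam * 0.25f; // approximately (d - diam) / diam * 0.25f
    }
}
```

Whether a table actually beats Math.sqrt depends on the JVM and the CPU, so it would need to be measured.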

well, those are probably going to gain me a few percent, if any at all. not really worth the effort at the moment, as it’s not the bottleneck.

the spatial algorithms take 25%, the above code takes 45%, everything else takes the remainder.

I actually didn’t know you could use break and block labels like that in Java. Good to know.

This is the 2nd time I did that in over a decade. It stinks. Don’t do it.
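
For anyone else who has not seen it: a label can be put on any statement, and "break label" then jumps to just after that statement. A small made-up example, together with a label-free version that behaves the same:

```java
/** Illustration of a labeled block; the methods are invented for this example. */
final class LabeledBreakDemo {
    /** 'break check' leaves the labeled if statement early. */
    static String withLabel(int value) {
        StringBuilder sb = new StringBuilder();
        check: if (value > 0) {
            if (value > 100)
                break check;          // skip the rest of the labeled block
            sb.append("small ");
        }
        sb.append("done");
        return sb.toString();
    }

    /** Same behaviour without a label, by folding the early exit into the condition. */
    static String withoutLabel(int value) {
        StringBuilder sb = new StringBuilder();
        if (value > 0 && value <= 100) {
            sb.append("small ");
        }
        sb.append("done");
        return sb.toString();
    }
}
```

The label-free form is usually easier to read, which is presumably why labeled blocks are so rare.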

what’s the min. required screen resolution?
Tried to give it a spin on my netbook, but can’t see much of the window area. (1024x600)

Of what I could see, it ran @ 8-10fps.

1024x768, but it is really meant for 1280x1024.

At the moment I don’t have the time to change much. I removed the raft for some obscure reason, so if I’d upload it now it would be kind of a boring demo, for everybody else :slight_smile: