If I may tempt you with a screenshot:
http://img849.imageshack.us/img849/322/particleengine.png
1 000 000 particles, multi-threaded, 65 FPS on a laptop i5-2410 at 2.7GHz, and still CPU-limited.
You have a long way to go, young padawan!
Any decent GPU can process 1 000 000 triangles at 60 FPS. The real problem is fill-rate. I draw my particles as simple points. However, once I go over a point size of 4 it starts being GPU limited. This is the same pixel area as a 4x4 quad. Point smoothing further increases this to a 5x5 quad to be able to do its anti-aliasing and increases the cost of each pixel because of the coverage calculations. On top of this we also have blending which further increases the cost of rendering each pixel slightly.
All in all: 5x5 quads = 25 pixels per particle. 25 x 1 000 000 = 25 000 000 pixels to process each frame. For reference, a 1920x1080 monitor has about 2 million pixels. I’m pretty much filling every pixel of such a screen about 12 times over.
If I turn off point smoothing, the GPU load at a point size of 4 decreases a lot, and I can manage a point size of 6 at 68 FPS. 6x6 (square) points x 1 000 000 particles = 36 000 000 pixels per frame. GPUs are awesome! =D
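If you want to check the fill-rate numbers yourself, here is a rough sketch in Python. The function name and the `smoothed` flag are made up for illustration, and it assumes (as described above) that point smoothing simply pads the footprint from NxN to (N+1)x(N+1). It ignores clipping and anything the GPU might cull.

```python
def pixels_per_frame(point_size, particle_count, smoothed=False):
    """Approximate number of pixels touched per frame by square points.

    Assumption: point smoothing grows the footprint by one pixel per
    side (4x4 becomes 5x5), as described in the post.
    """
    side = point_size + 1 if smoothed else point_size
    return side * side * particle_count


SCREEN_PIXELS = 1920 * 1080  # 2 073 600 pixels on a 1080p screen

with_smoothing = pixels_per_frame(4, 1_000_000, smoothed=True)
without_smoothing = pixels_per_frame(6, 1_000_000)

# Pixels per frame, and how many full 1080p screens that corresponds to.
print(with_smoothing, round(with_smoothing / SCREEN_PIXELS, 1))        # 25000000 12.1
print(without_smoothing, round(without_smoothing / SCREEN_PIXELS, 1))  # 36000000 17.4
```

So the 6x6 case is the equivalent of filling a 1080p screen more than 17 times per frame.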
Particle rendering benefits a lot from OpenGL 3.x-class hardware. Geometry shaders are core since OpenGL 3.2 and are also available on older versions through extensions such as ARB_geometry_shader4. With them you can render your particles as points and then expand each point to a quad (a triangle strip) in a geometry shader.
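To make the expansion step concrete, here is a CPU-side sketch in Python of what such a geometry shader does for each point. This is not shader code and not my actual implementation, just an illustration; all function names are hypothetical. It emits the four corners in triangle-strip order and checks that the two resulting triangles cover size x size units.

```python
def expand_point_to_quad(cx, cy, size):
    """Return the 4 corners of a quad centered on (cx, cy), in
    triangle-strip order: bottom-left, bottom-right, top-left, top-right."""
    half = size / 2.0
    return [
        (cx - half, cy - half),
        (cx + half, cy - half),
        (cx - half, cy + half),
        (cx + half, cy + half),
    ]


def strip_triangles(vertices):
    """Split a triangle strip into its triangles (winding order ignored)."""
    return [tuple(vertices[i:i + 3]) for i in range(len(vertices) - 2)]


def triangle_area(a, b, c):
    """Unsigned area of a 2D triangle via the cross product."""
    return abs((b[0] - a[0]) * (c[1] - a[1]) - (c[0] - a[0]) * (b[1] - a[1])) / 2.0


quad = expand_point_to_quad(10.0, 20.0, 4.0)
triangles = strip_triangles(quad)
covered = sum(triangle_area(*t) for t in triangles)
print(len(triangles), covered)  # 2 16.0
```

The point of doing this on the GPU is that the CPU only has to upload one vertex per particle instead of four.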
So far I’ve focused on particle count and how to increase it. Just remember that many optimization attempts become worthless if your particles simply cover too many pixels. A 50x50 smoke particle covers 2 500 pixels if rendered as a quad. Divide the earlier 36 000 000 pixels per frame by that and we get around 14 400 particles per frame. This is of course a very rough estimate, but no amount of geometry shaders or CPU multi-threading is going to increase performance in this case.
The only real optimization that you can do is to use more vertices. Your smoke particle texture is most likely round, so that it doesn’t look like a square texture. By approximating a circle using 16-32 vertices you can reduce the pixel area to something closer to the area of a circle (A = PI * r^2) instead of that of a square (side^2). The same 50x50 smoke particle can be rendered as a circle with a radius of 25. 50x50 is still 2 500 pixels, but a circle with r = 25 has an area of 3.1415 * 25^2, which is about 1 963 pixels. Suddenly we can have around 18 000 particles in a single frame!
And yes, the number of vertices increased by 16-32x, but remember that we just pushed millions of vertices when we rendered smaller particles! 18 000 particles, each made of 16 triangles, is still only 288 000 triangles. Additionally, these circles can be rendered using instancing, which avoids the CPU/bandwidth nightmare of having to replicate all the particle data for each vertex.
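Here is the same estimate as a small Python sketch, so you can plug in your own particle size. The 36 000 000 pixel budget is just the number I measured above, not a general constant. I also added the area of a regular polygon drawn around the circle (a helper I made up for this example), since a mesh that fully contains a round texture is slightly larger than the ideal circle.

```python
import math

PIXEL_BUDGET = 36_000_000  # pixels per frame, taken from the 6x6 point test


def quad_area(side):
    return side * side


def circle_area(radius):
    return math.pi * radius * radius


def circumscribed_polygon_area(radius, n):
    """Area of a regular n-gon whose edges touch a circle of this radius,
    i.e. the smallest regular n-gon that still contains the whole circle."""
    return n * radius * radius * math.tan(math.pi / n)


for label, area in [
    ("50x50 quad", quad_area(50)),
    ("ideal circle, r=25", circle_area(25)),
    ("16-gon around circle", circumscribed_polygon_area(25, 16)),
    ("32-gon around circle", circumscribed_polygon_area(25, 32)),
]:
    # Area per particle and how many particles fit in the pixel budget.
    print(f"{label}: {area:.0f} px, ~{PIXEL_BUDGET / area:.0f} particles per frame")
```

Even with only 16 vertices the polygon stays within a couple of percent of the ideal circle, so the estimate of around 18 000 particles per frame holds up.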
TL;DR:
- It’s possible to render millions of particles per frame at 60 FPS if they are small enough.
- The modern replacement for point sprites is a geometry shader that expands a point to a quad (a triangle strip with 2 triangles).
- Larger particles are severely fill-rate limited, so rendering them as approximated circles reduces the pixel area by about 20%.