My idea is to just dump all kinds of particles into one huge list and update them on the GPU. The update shader is an uber-shader which allows for lots of particle types, including emitter particles etc. Since the particles are sorted by distance, they will be somewhat grouped together by type (they're emitted from the same place), so the branching will be relatively cheap. It's also worth noting that transform feedback has a relatively high bandwidth cost per vertex processed, so adding more ALU work to the update shader hasn't affected performance at all in my tests. If I'm right, multiple vertex streams will also allow me to do frustum culling (just 6 plane dot products per particle) for multiple lights in one pass and output the indices to other buffers in that same pass.
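For reference, the per-particle frustum test really is just six dot products. A minimal CPU sketch (the plane layout and function names are my own, not code from the system described above; "inside" here means a non-negative signed distance to every plane):

```python
def inside_frustum(p, planes):
    """p = (x, y, z); planes = six (nx, ny, nz, d) tuples."""
    x, y, z = p
    for nx, ny, nz, d in planes:
        # One dot product per plane; a negative result means the
        # particle is outside that plane and can be culled.
        if nx * x + ny * y + nz * z + d < 0.0:
            return False
    return True
```

On the GPU this would run in the same update pass, emitting the particle's index to a per-light stream only when the test passes.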
My current sorting algorithm is an abomination and I really need a more optimized one. To still do this with transform feedback (instead of, for example, OpenCL or CUDA) I really need support for multiple vertex streams (i.e. OpenGL 4) to direct particles into buckets. I'm currently forced to do one pass over all visible particles for each bucket, which means I have to do twice as many passes as the bit precision of the depth (two buckets per bit). With multiple vertex streams I could sort 4 bits per pass using 16 buckets and reduce the number of passes from 48 to 6 for 24 bits of depth, or use just 4 buckets and get it done in 12 passes (if VRAM is a limitation). Like I said before, transform feedback is very memory limited, so this is the main bottleneck at the moment.
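The bucketed sort above is a least-significant-digit radix sort. A CPU model of the 16-bucket variant, just to show where the 24 / 4 = 6 pass count comes from (this is an illustration of the algorithm, not the actual transform feedback code; on the GPU each bucket would be a vertex stream):

```python
RADIX_BITS = 4
BUCKETS = 1 << RADIX_BITS          # 16 buckets = 16 vertex streams
KEY_BITS = 24                      # depth precision
PASSES = KEY_BITS // RADIX_BITS    # 24 / 4 = 6 passes

def radix_sort(keys):
    for p in range(PASSES):
        shift = p * RADIX_BITS
        buckets = [[] for _ in range(BUCKETS)]
        for k in keys:
            # Stable scatter by the current 4-bit digit.
            buckets[(k >> shift) & (BUCKETS - 1)].append(k)
        # Concatenating the buckets in order keeps the sort stable,
        # which is what makes the next digit's pass correct.
        keys = [k for b in buckets for k in b]
    return keys
```

With only one output stream, that inner scatter degenerates into one full pass per bucket, which is where the 2 × 24 = 48 passes in the current version come from.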
I’ve taken a look at Fourier opacity mapping, and it seems to be an excellent way of doing particle shadowing and self-shadowing. Performance seems good since the resolution of the map can be kept very low while still looking very good thanks to the blurry nature of particles. The particles also don’t have to be sorted when rendering the opacity map. My only problem is that I have absolutely no idea how it works. That kind of math goes waaaaay over my head. It’s definitely somewhere on my todo list though.
Since my current particles are meant to simulate smoke I also ran into fill-rate limitations. To get good-looking smoke you need a lot of overdraw, and on current 2-megapixel screens that becomes very expensive. Some games render the particles at half resolution to reduce the number of shaded pixels drastically, then use a special upsampling filter to preserve sharp edges where particles meet opaque geometry. Although the particles get slightly blurry, there’s not much of a difference since particle effects are inherently blurry anyway. The remaining artifacts are mostly single-pixel errors that are barely visible. For a quarter of the fill-rate cost I’d say it’s definitely worth it.
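The usual filter for this is some form of nearest-depth (depth-aware) upsampling: each full-resolution pixel picks the half-resolution sample whose depth best matches its own, so edges against opaque geometry stay sharp instead of smearing. A 1-D toy sketch of the idea (names and the two-tap neighborhood are my own simplification, not a specific engine's filter):

```python
def nearest_depth_upsample(lo_color, lo_depth, hi_depth):
    """lo_color/lo_depth are half-res lists, hi_depth is full-res.
    Each full-res pixel considers its two nearest low-res taps and
    takes the color whose depth is closest to its own depth."""
    hi_color = []
    n_lo = len(lo_color)
    for i, d in enumerate(hi_depth):
        # The two candidate low-res taps around this full-res pixel.
        a = min(i // 2, n_lo - 1)
        b = min(a + 1, n_lo - 1)
        # Pick the closer depth instead of blindly interpolating
        # across a depth discontinuity (the source of edge smearing).
        best = a if abs(lo_depth[a] - d) <= abs(lo_depth[b] - d) else b
        hi_color.append(lo_color[best])
    return hi_color
```

The single-pixel errors mentioned above come from full-res pixels whose depth matches none of the nearby low-res taps well; with blurry smoke the wrong color is close enough that it doesn't show.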