Brilliant! With a bit of modification the binary heap worked perfectly.
I added a Map inside the binary heap to map userstates to nodes so that they can be rapidly found and removed, as well.
End result: A* now barely registers on the profiler 
Cas 
      
    Brilliant! With a bit of modification the binary heap worked perfectly.
I added a Map inside the binary heap to map userstates to nodes so that they can be rapidly found and removed, as well.
End result: A* now barely registers on the profiler 
Cas 
Looking at the API, that seems to be the case (actually there’s no reduceKey method, so you’d have to remove and re-insert). It’s still usable if you just place the duplicates on the queue instead of reducing keys though.
Anyway, you’re point that the stock implementation in the Collections framework isn’t so good is definitely spot on.
pjt33, please drop me a line so I can add you into the credits for the game for that bit of code :-*
Cas 
Haven’t read through all the text here, but here are my thoughts:
Try to prevent the game running A* by caching paths, sharing paths, etc. And if you really need to run A*, make it run fast.
Ok 
You can get amortised O(1) for everything using a suitably designed calendar queue, but a) the hidden constant matters; b) the complexity of the code (maintenance cost) matters; c) if it’s no longer a bottle-neck, why bother?
Indeed, the A* implementation is now so fast that I don’t really need to optimise it further, but I’m still getting 30fps on the crappy laptop when there’s 100 gidrahs attacking the base. 20% of the time is now spent rendering sprites and the rest is mostly native, and the pathfinding barely registers, but every little helps at this stage. To my mind we should easily be getting 60fps on a dual 1.6GHz Turion64, even if it does have a very slow graphics card in it.
We are already cacheing paths between two points in a hashmap (and all points in between too). I planned to put in the invalidation optimisations you mentioned already, but I suspect that’s not going to give me a huge speed increase any more.
I’ve considered using another thread to do the pathfinding in as well - but it probably only makes sense where there’s more than one physical CPU core. As it is, the OS and GC and OpenGL drivers are already using a fairly decent proportion of the other CPU, so after synchronizing the data between the two threads it might only prove to be a very small speed increase for a lot of extra complexity.
Thanks for all your help, guys 
Cas 