Sorry for creating yet another thread, but it’s a slightly unrelated question.
Again, I’m making a strategy game, and I’m wondering what kind of performance I should be expecting out of it. My initial goal is to get it running with 1000 units on my laptop, which has an Intel Sandy Bridge i5-2410M running at 2.7-2.8 GHz in single-threaded apps (slightly lower in multithreaded apps).
Logic and rendering will be completely separated, with heavy use of interpolation for rendering, meaning that the FPS can be higher than the logic update rate. I will also need some network communication before each logic update due to determinism, so I will target 30 updates per second for the logic thread to reduce the impact of network lag and reduce CPU load. That gives me about 33 ms to play with for a single logic update. Rendering will be almost entirely GPU-limited, so it won’t use the CPU much.
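The plan is the standard fixed-timestep loop with render interpolation. A minimal sketch in C++, with a placeholder Game interface standing in for my actual code:

```cpp
#include <chrono>

// Placeholder interface standing in for the real logic/render code.
struct Game {
    bool running() const { return true; }
    void logicUpdate() {}            // one deterministic simulation + network step
    void render(double /*alpha*/) {} // draws the state interpolated by alpha in [0, 1)
};

void gameLoop(Game& game) {
    using clock = std::chrono::steady_clock;
    const std::chrono::duration<double> tick(1.0 / 30.0); // 30 logic updates/s, ~33.3 ms each
    auto previous = clock::now();
    std::chrono::duration<double> accumulator(0.0);

    while (game.running()) {
        auto now = clock::now();
        accumulator += now - previous; // real time elapsed since last frame
        previous = now;

        // Catch up with as many fixed-length logic updates as needed.
        while (accumulator >= tick) {
            game.logicUpdate();
            accumulator -= tick;
        }

        // Render between the two newest logic states, so the frame rate
        // can be (much) higher than the 30 Hz logic rate.
        game.render(accumulator / tick);
    }
}
```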
These are the slowest CPU-heavy features:
- Unit movement, collision detection and handling: Units can move around and collide with wall tiles and other units. This is slow when many units are close to each other, as I can’t quickly rule out as many collision pairs. The worst case is bad, the best case very fast.
- Exploring: Units check their surroundings for tiles that have changed since they were last observed, and update outdated tiles to the current version if the unit has a clear line of sight to the tile. This allows the map to change without every player knowing about it unless they have vision of the area. This is pretty slow when there are many outdated tiles around the unit, but almost instant when there are none. Again, the worst case is bad, the best case very fast.
- Line of sight: This is the slowest part of my game. It determines which units are visible to each unit, by first quickly ruling out far-away units using a grid, then checking the squared distance between units against the vision range, and finally raycasting between them (see the sketch after this list). This is fast when units are evenly spread out over the level, but insanely slow when many units are close to each other or the vision range is too high, as the algorithm approaches O(n^2) behavior.
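For reference, here is roughly how those three line-of-sight stages fit together, in simplified C++; Unit, UnitGrid and hasLineOfSight are placeholder names, not my actual code:

```cpp
#include <algorithm>
#include <vector>

struct Unit { float x, y; };

// Uniform grid over the map; each cell stores the indices of the units in it.
struct UnitGrid {
    float cellSize;
    int width, height;
    std::vector<std::vector<int>> cells; // width * height buckets

    // Indices of all units in cells overlapping a circle at (x, y).
    std::vector<int> query(float x, float y, float radius) const {
        std::vector<int> result;
        int x0 = std::max(0,          int((x - radius) / cellSize));
        int x1 = std::min(width - 1,  int((x + radius) / cellSize));
        int y0 = std::max(0,          int((y - radius) / cellSize));
        int y1 = std::min(height - 1, int((y + radius) / cellSize));
        for (int cy = y0; cy <= y1; ++cy)
            for (int cx = x0; cx <= x1; ++cx)
                for (int i : cells[cy * width + cx])
                    result.push_back(i);
        return result;
    }
};

bool hasLineOfSight(const Unit& a, const Unit& b); // raycast against wall tiles (declared only; it's a sketch)

std::vector<int> visibleUnits(int self, const std::vector<Unit>& units,
                              const UnitGrid& grid, float visionRange) {
    std::vector<int> visible;
    const Unit& me = units[self];
    // Stage 1: the grid cheaply rules out far-away units.
    for (int other : grid.query(me.x, me.y, visionRange)) {
        if (other == self) continue;
        // Stage 2: exact squared-distance check (no sqrt needed).
        float dx = units[other].x - me.x, dy = units[other].y - me.y;
        if (dx * dx + dy * dy > visionRange * visionRange) continue;
        // Stage 3: the expensive raycast runs only on the survivors.
        if (hasLineOfSight(me, units[other]))
            visible.push_back(other);
    }
    return visible;
}
```

When all units pile into a few grid cells, stage 1 returns nearly everyone, so stages 2 and 3 run for almost every pair; that’s the O(n^2) blowup.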
All three work on the area around each unit, which is what makes them so slow in the worst-case scenario (all units packed close together).
My current performance:
I haven’t implemented interpolated rendering yet, so all rendering is included in these performance numbers. It’s just terrain rendering (tiles), fog of war (done on the GPU) and points representing units, so rendering doesn’t measurably impact CPU performance. I measure performance in FPS and let the game run as fast as it can.
Distance is measured in tiles.
With 1000 units, each with a collision radius of 0.5 and a vision radius of 10, randomly spread out over the center 256x256 area of a 512x512 map and randomly moving around in this area:
- 90-95 FPS, depending on the random placement of walls and units.
With the same settings as above, but units randomly spread out and moving over the center 64x64 area of a 512x512 map:
- 28-35 FPS, varying a lot with the random placement of walls and units, as they almost always form a big mass in the middle of the map, stuck between walls and other units.
With the same settings as above, but all units charging the center of the map creating a huge mass of units:
- 11-13 FPS.
Is this normal or acceptable? A normal scenario would be somewhere between the first and second scenario (the last one is very artificial).
Planned optimizations:
- Collision detection can be optimized a bit more by using a better algorithm than a grid to quickly rule out collision pairs. This should help quite a bit in worst-case scenarios.
- Exploring is pretty heavily optimized already, so it’s hard to make it much faster. I do need to redo my raycasting function, though, which is currently both slow and inaccurate, leaving some tiles unexplored when they should have been marked as seen and excluded from further tests (a possible replacement is sketched after this list).
- Line of sight could also use a better algorithm than a grid, but even that won’t cull many units when they are all grouped together in one big pile.
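On the raycasting redo: a standard DDA grid traversal (Amanatides & Woo) visits every tile the ray passes through instead of sampling the ray at fixed steps, so it can’t skip tiles. A rough sketch, with coordinates in tile units and isWall standing in for the real tile map:

```cpp
#include <cmath>

// Returns true if the line from (x0, y0) to (x1, y1) reaches the target tile
// without crossing a wall tile; coordinates are in tile units.
template <typename IsWall> // IsWall: bool(int tileX, int tileY)
bool raycastClear(float x0, float y0, float x1, float y1, IsWall isWall) {
    int tx = int(std::floor(x0)), ty = int(std::floor(y0));
    const int endX = int(std::floor(x1)), endY = int(std::floor(y1));
    const float dx = x1 - x0, dy = y1 - y0;
    const int stepX = dx > 0 ? 1 : -1, stepY = dy > 0 ? 1 : -1;
    // t cost of crossing one whole tile on each axis (t runs from 0 to 1
    // over the segment), and t of the first border crossing on each axis.
    const float tDeltaX = dx != 0 ? std::abs(1.0f / dx) : INFINITY;
    const float tDeltaY = dy != 0 ? std::abs(1.0f / dy) : INFINITY;
    float tMaxX = dx != 0 ? (dx > 0 ? tx + 1 - x0 : x0 - tx) * tDeltaX : INFINITY;
    float tMaxY = dy != 0 ? (dy > 0 ? ty + 1 - y0 : y0 - ty) * tDeltaY : INFINITY;

    while (true) {
        if (tx == endX && ty == endY) return true; // the target tile itself never blocks
        if (isWall(tx, ty)) return false;          // hit a wall on the way
        if (tMaxX > 1 && tMaxY > 1) return true;   // past the segment end (float safety)
        // Step into whichever neighboring tile the ray enters first.
        if (tMaxX < tMaxY) { tMaxX += tDeltaX; tx += stepX; }
        else               { tMaxY += tDeltaY; ty += stepY; }
    }
}
```

The same traversal could back both exploring and the unit-to-unit line of sight raycast.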
I have two final big optimizations:
- Only update when units move. This is very situational, but can be very good for static units. If neither unit in a collision pair has moved, there’s no need to check for a collision. If a unit hasn’t moved and the map hasn’t changed, there’s no need to check exploring. Line of sight works like collision detection here. Sadly, this would do nothing for the “charging the center” scenario.
- Multithreading. All of the above can be multithreaded per unit while staying deterministic (see the sketch after this list, which combines this with the previous idea). In a small hacked-together test, line of sight testing scaled 1.75x with 4 threads on a hyperthreaded dual-core; scaling suffers because the CPU runs at a higher clock when only a single core is loaded. There are some parts that cannot be multithreaded (running scripts being the biggest one), but they are mostly constant-time operations per unit, so they don’t really have worst-case scenarios.
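Here’s a sketch of how those two ideas could combine, with placeholder types and a hypothetical computeVision (e.g. the visibleUnits pipeline from earlier). Each thread reads the same immutable previous-tick state and writes only its own fixed slice of the results, so the outcome doesn’t depend on thread scheduling and lockstep determinism is preserved:

```cpp
#include <algorithm>
#include <thread>
#include <vector>

// Placeholder types; the real ones live elsewhere in the engine.
struct UnitState { float x, y; bool moved; };
struct VisionResult { std::vector<int> visible; };

// Hypothetical per-unit job, e.g. the visibleUnits() pipeline from earlier.
VisionResult computeVision(size_t unit, const std::vector<UnitState>& prev);

void updateVisionParallel(const std::vector<UnitState>& prev,
                          std::vector<VisionResult>& results, // kept across ticks
                          unsigned threadCount) {
    results.resize(prev.size());
    const size_t chunk = (prev.size() + threadCount - 1) / threadCount;
    std::vector<std::thread> workers;
    for (unsigned t = 0; t < threadCount; ++t) {
        const size_t begin = t * chunk;
        const size_t end = std::min(prev.size(), begin + chunk);
        workers.emplace_back([&prev, &results, begin, end] {
            for (size_t i = begin; i < end; ++i) {
                // "Only update when units move": an unmoved unit keeps last
                // tick's result. Simplified; the real check would also need
                // the unit's surroundings to be unchanged.
                if (prev[i].moved)
                    results[i] = computeVision(i, prev); // writes only slot i
            }
        });
    }
    for (auto& w : workers) w.join();
}
```

A static split can load-balance badly when all the busy units land in one thread’s chunk; interleaving indices instead (unit i goes to thread i % threadCount) is an easy fix that is still deterministic.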
I don’t know how much of a speedup these optimizations will give in total, but doubling the performance of the above three parts seems reasonable (assuming a dual-core).
TL;DR / Actual question:
I have 1000 units with a normal line of sight range and collision radius, and almost all CPU-heavy features implemented. I get 90 FPS in an optimal scenario with units evenly spread over the map, and 11 FPS with units packed very closely together. I expect to be able to optimize this, partly with multithreading, to about twice this FPS on a dual-core. The minimum logic updating speed is 30 updates per second. Is this good enough performance for a strategy game with 1000 units? How many units do commercial real-time strategy games support with good performance?