Sorry for creating yet another thread, but it’s a slightly unrelated question.
Again, I’m making a strategy game, and I’m wondering what kind of performance I should be expecting out of it. My initial goal is to get it running with 1000 units on my laptop, which has an Intel Sandy Bridge i5-2410M running at 2.7-2.8 GHz in single-threaded apps (slightly lower in multithreaded apps).
Logic and rendering will be completely separated, with heavy use of interpolation for rendering, meaning that the FPS can be higher than the logic update rate. I will also need some network communication before each logic update due to determinism, so I will target 30 updates per second for the logic thread to reduce the impact of network lag and reduce CPU load. That gives me about 33 ms to play with for a single logic update. Rendering will be almost entirely GPU-limited, so it won’t use the CPU much.
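The plan is the standard fixed-timestep loop with render interpolation. A minimal sketch in C++, with a placeholder Game interface standing in for my actual code:

```cpp
#include <chrono>

// Placeholder interface standing in for the real logic/render code.
struct Game {
    bool running() const { return true; }
    void logicUpdate() {}            // one deterministic simulation + network step
    void render(double /*alpha*/) {} // draws the state interpolated by alpha in [0, 1)
};

void gameLoop(Game& game) {
    using clock = std::chrono::steady_clock;
    const std::chrono::duration<double> tick(1.0 / 30.0); // 30 logic updates/s, ~33.3 ms each
    auto previous = clock::now();
    std::chrono::duration<double> accumulator(0.0);

    while (game.running()) {
        auto now = clock::now();
        accumulator += now - previous; // real time elapsed since last frame
        previous = now;

        // Catch up with as many fixed-length logic updates as needed.
        while (accumulator >= tick) {
            game.logicUpdate();
            accumulator -= tick;
        }

        // Render between the two newest logic states, so the frame rate
        // can be (much) higher than the 30 Hz logic rate.
        game.render(accumulator / tick);
    }
}
```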
These are the slowest CPU-heavy features:
- Unit movement, collision detection and handling: Units can move around and collide with wall tiles and other units. This is slow when many units are close to each other, as I can’t quickly rule out as many collision pairs. The worst case is bad, the best case very fast.
- Exploring: Units check their surroundings for tiles that have changed since they were last observed, and update outdated tiles to the current version if the unit has a clear line of sight to the tile. This allows the map to change without every player knowing about it unless they have vision of the area. This is pretty slow when there are many outdated tiles around the unit, but almost instant when there are none. Again, the worst case is bad, the best case very fast.
- Line of sight: This is the slowest part of my game. It determines which units are visible to each unit, by first quickly ruling out far-away units using a grid, then checking the squared distance between units against the vision range, and finally raycasting between them (see the sketch after this list). This is fast when units are evenly spread out over the level, but insanely slow when many units are close to each other or the vision range is too high, as the algorithm approaches O(n^2) behavior.
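For reference, here is roughly how those three line-of-sight stages fit together, in simplified C++; Unit, UnitGrid and hasLineOfSight are placeholder names, not my actual code:

```cpp
#include <algorithm>
#include <vector>

struct Unit { float x, y; };

// Uniform grid over the map; each cell stores the indices of the units in it.
struct UnitGrid {
    float cellSize;
    int width, height;
    std::vector<std::vector<int>> cells; // width * height buckets

    // Indices of all units in cells overlapping a circle at (x, y).
    std::vector<int> query(float x, float y, float radius) const {
        std::vector<int> result;
        int x0 = std::max(0,          int((x - radius) / cellSize));
        int x1 = std::min(width - 1,  int((x + radius) / cellSize));
        int y0 = std::max(0,          int((y - radius) / cellSize));
        int y1 = std::min(height - 1, int((y + radius) / cellSize));
        for (int cy = y0; cy <= y1; ++cy)
            for (int cx = x0; cx <= x1; ++cx)
                for (int i : cells[cy * width + cx])
                    result.push_back(i);
        return result;
    }
};

bool hasLineOfSight(const Unit& a, const Unit& b); // raycast against wall tiles (declared only; it's a sketch)

std::vector<int> visibleUnits(int self, const std::vector<Unit>& units,
                              const UnitGrid& grid, float visionRange) {
    std::vector<int> visible;
    const Unit& me = units[self];
    // Stage 1: the grid cheaply rules out far-away units.
    for (int other : grid.query(me.x, me.y, visionRange)) {
        if (other == self) continue;
        // Stage 2: exact squared-distance check (no sqrt needed).
        float dx = units[other].x - me.x, dy = units[other].y - me.y;
        if (dx * dx + dy * dy > visionRange * visionRange) continue;
        // Stage 3: the expensive raycast runs only on the survivors.
        if (hasLineOfSight(me, units[other]))
            visible.push_back(other);
    }
    return visible;
}
```

When all units pile into a few grid cells, stage 1 returns nearly everyone, so stages 2 and 3 run for almost every pair; that’s the O(n^2) blowup.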
All three work on the area around each unit, which is what makes them so slow in the worst-case scenario (all units packed close together).
My current performance:
I haven’t implemented interpolated rendering yet, so all rendering is included in these performance numbers. It’s just terrain rendering (tiles), fog of war (done on the GPU) and points representing units, so rendering doesn’t measurably impact CPU performance. I measure performance in FPS and let the game run as fast as it can.
Distance is measured in tiles.
With 1000 units, each with a collision radius of 0.5 and a vision radius of 10, randomly spread out over the center 256x256 area of a 512x512 map and randomly moving around in this area:
- 90-95 FPS, depending on the random placement of walls and units.
With the same settings as above, but units randomly spread out and moving over the center 64x64 area of a 512x512 map:
- 28-35 FPS, varying a lot with the random placement of walls and units, as they almost always form a big mass in the middle of the map, stuck between walls and other units.
With the same settings as above, but all units charging the center of the map creating a huge mass of units:
- 11-13 FPS.
Is this normal or acceptable? A normal scenario would be somewhere between the first and second scenario (the last one is very artificial).
Planned optimizations:
- Collision detection can be optimized a bit more by using a better algorithm than a grid to quickly rule out collision pairs. This should help quite a bit in worst-case scenarios.
- Exploring is pretty heavily optimized already, so it’s hard to make it much faster. I do need to redo my raycasting function, though, which is currently both slow and inaccurate, leaving some tiles unexplored when they should have been marked as seen and excluded from further tests (a possible replacement is sketched after this list).
- Line of sight could also use a better algorithm than a grid, but even that won’t cull many units when they are all grouped together in one big pile.
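On the raycasting redo: a standard DDA grid traversal (Amanatides & Woo) visits every tile the ray passes through instead of sampling the ray at fixed steps, so it can’t skip tiles. A rough sketch, with coordinates in tile units and isWall standing in for the real tile map:

```cpp
#include <cmath>

// Returns true if the line from (x0, y0) to (x1, y1) reaches the target tile
// without crossing a wall tile; coordinates are in tile units.
template <typename IsWall> // IsWall: bool(int tileX, int tileY)
bool raycastClear(float x0, float y0, float x1, float y1, IsWall isWall) {
    int tx = int(std::floor(x0)), ty = int(std::floor(y0));
    const int endX = int(std::floor(x1)), endY = int(std::floor(y1));
    const float dx = x1 - x0, dy = y1 - y0;
    const int stepX = dx > 0 ? 1 : -1, stepY = dy > 0 ? 1 : -1;
    // t cost of crossing one whole tile on each axis (t runs from 0 to 1
    // over the segment), and t of the first border crossing on each axis.
    const float tDeltaX = dx != 0 ? std::abs(1.0f / dx) : INFINITY;
    const float tDeltaY = dy != 0 ? std::abs(1.0f / dy) : INFINITY;
    float tMaxX = dx != 0 ? (dx > 0 ? tx + 1 - x0 : x0 - tx) * tDeltaX : INFINITY;
    float tMaxY = dy != 0 ? (dy > 0 ? ty + 1 - y0 : y0 - ty) * tDeltaY : INFINITY;

    while (true) {
        if (tx == endX && ty == endY) return true; // the target tile itself never blocks
        if (isWall(tx, ty)) return false;          // hit a wall on the way
        if (tMaxX > 1 && tMaxY > 1) return true;   // past the segment end (float safety)
        // Step into whichever neighboring tile the ray enters first.
        if (tMaxX < tMaxY) { tMaxX += tDeltaX; tx += stepX; }
        else               { tMaxY += tDeltaY; ty += stepY; }
    }
}
```

The same traversal could back both exploring and the unit-to-unit line of sight raycast.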
I have two final big optimizations:
- Only update when units move. This is very situational, but can be very good for static units. If neither unit in a collision pair has moved, there’s no need to check for a collision. If a unit hasn’t moved and the map hasn’t changed, there’s no need to check exploring. Line of sight works like collision detection here. Sadly, this would do nothing for the “charging the center” scenario.
- Multithreading. All of the above can be multithreaded per unit while staying deterministic (see the sketch after this list, which combines this with the previous idea). In a small hacked-together test, line of sight testing scaled 1.75x with 4 threads on a hyperthreaded dual-core; scaling suffers because the CPU runs at a higher clock when only a single core is loaded. There are some parts that cannot be multithreaded (running scripts being the biggest one), but they are mostly constant-time operations per unit, so they don’t really have worst-case scenarios.
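Here’s a sketch of how those two ideas could combine, with placeholder types and a hypothetical computeVision (e.g. the visibleUnits pipeline from earlier). Each thread reads the same immutable previous-tick state and writes only its own fixed slice of the results, so the outcome doesn’t depend on thread scheduling and lockstep determinism is preserved:

```cpp
#include <algorithm>
#include <thread>
#include <vector>

// Placeholder types; the real ones live elsewhere in the engine.
struct UnitState { float x, y; bool moved; };
struct VisionResult { std::vector<int> visible; };

// Hypothetical per-unit job, e.g. the visibleUnits() pipeline from earlier.
VisionResult computeVision(size_t unit, const std::vector<UnitState>& prev);

void updateVisionParallel(const std::vector<UnitState>& prev,
                          std::vector<VisionResult>& results, // kept across ticks
                          unsigned threadCount) {
    results.resize(prev.size());
    const size_t chunk = (prev.size() + threadCount - 1) / threadCount;
    std::vector<std::thread> workers;
    for (unsigned t = 0; t < threadCount; ++t) {
        const size_t begin = t * chunk;
        const size_t end = std::min(prev.size(), begin + chunk);
        workers.emplace_back([&prev, &results, begin, end] {
            for (size_t i = begin; i < end; ++i) {
                // "Only update when units move": an unmoved unit keeps last
                // tick's result. Simplified; the real check would also need
                // the unit's surroundings to be unchanged.
                if (prev[i].moved)
                    results[i] = computeVision(i, prev); // writes only slot i
            }
        });
    }
    for (auto& w : workers) w.join();
}
```

A static split can load-balance badly when all the busy units land in one thread’s chunk; interleaving indices instead (unit i goes to thread i % threadCount) is an easy fix that is still deterministic.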
I don’t know how much of a speedup these optimizations will give in total, but doubling the performance of the above three parts seems reasonable (assuming a dual-core).
TL;DR / Actual question:
I have 1000 units with a normal line of sight range and collision radius, and almost all CPU-heavy features implemented. I get 90 FPS in an optimal scenario with units evenly spread over the map, and 11 FPS with units packed very closely together. I expect to be able to optimize this, partly with multithreading, to about twice this FPS on a dual-core. The minimum logic updating speed is 30 updates per second. Is this good enough performance for a strategy game with 1000 units? How many units do commercial real-time strategy games support with good performance?