As the designer of a highly scalable system, I always want good execution statistics, even from the frequent short test runs I do during daily development. For instance, how long have the simulation threads spent executing each simulation tick? How much time is spent persisting and loading data? What is the client I/O volume?
For a long time I manually added statistics measurements to the code the way lots of people do: calling System.currentTimeMillis() or nanoTime() at various execution points and logging the elapsed time. But this was cumbersome, unclean, and ad hoc, and the results (lots of logged values) were difficult to interpret.
I finally ended up writing a relatively simple but effective and efficient statistics collector and printer. My objective was to match the logger in simplicity: submit a measurement to the collector in a single line, just as you would pass a string to the logger:
public class IOPacker {
    public static final Logger LOG = Logger.getLogger(IOPacker.class);
    public static final StatCompiler STAT = StatCompiler.getStatCompiler(IOPacker.class);
    ...
    public void pack() {
        long startTime = System.nanoTime();
        ...
        // Record the elapsed time for this call, converted from nanoseconds to milliseconds.
        STAT.putSample("Pack time [ms]", (System.nanoTime() - startTime) / 1e6);
    }
}
To output the compiled statistics, I execute this line before the program exits:
StatCompiler.logAllStats(LOG, Level.INFO);
The output for the above is:
Pack time [ms]: sum: 3 326; avg: 11,9 (sd +- 48,6); count: 279; min<med<max: 5,5 < 8,2 < 823,2
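Everything in that summary line except the median can be maintained in constant memory with simple running accumulators. The sketch below shows the idea; the class and method names are my illustrative assumptions, not the actual StatCompiler implementation:

```java
// Illustrative running-statistics accumulator (hypothetical names,
// not the actual StatCompiler code).
class RunningStat {
    private long count;
    private double sum;
    private double sumOfSquares;
    private double min = Double.POSITIVE_INFINITY;
    private double max = Double.NEGATIVE_INFINITY;

    synchronized void putSample(double value) {
        count++;
        sum += value;
        sumOfSquares += value * value;
        if (value < min) min = value;
        if (value > max) max = value;
    }

    synchronized String summary() {
        double avg = sum / count;
        // Population standard deviation recovered from the two running sums.
        double sd = Math.sqrt(sumOfSquares / count - avg * avg);
        return String.format("sum: %.0f; avg: %.1f (sd +- %.1f); count: %d; min: %.1f; max: %.1f",
                sum, avg, sd, count, min, max);
    }
}
```

The sum-of-squares trick keeps the state at a handful of doubles per metric, but it cannot produce a median; that is where the bucketing described below comes in.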
Just like a logger, the implementation needs to be as fast as possible. That is not terribly hard. But then there is the memory concern: in a long application run, the amount of retained data shouldn't grow indefinitely, even though we want the statistics to cover the full run!
In this solution, memory usage grows with the number of orders of magnitude between the smallest and the largest data point, not with the number of data points. As long as there is a lower bound on the data-point precision we care about, and since a bucket is only created when a sample actually hits it, there rarely need to be more than a few dozen buckets. In other words, after a warm-up period the memory usage for each metric is effectively constant regardless of how long the program runs.
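One way to realize this is a histogram with logarithmically spaced buckets, which also yields the approximate median in the summary. The following is a minimal sketch of that idea; the class name, the bucket density, and the non-atomic counter increment are all simplifying assumptions of mine, not the actual StatCompiler code:

```java
import java.util.Map;
import java.util.concurrent.ConcurrentSkipListMap;

// Illustrative logarithmic-bucket histogram (a sketch, not the real implementation).
class LogHistogram {
    // Assumed density: ~18 buckets per decade, i.e. roughly 6% relative precision.
    private static final double BUCKETS_PER_DECADE = 18.0;
    // Sorted map: bucket index -> sample count. Only hit buckets are created.
    private final ConcurrentSkipListMap<Integer, long[]> buckets = new ConcurrentSkipListMap<>();

    void putSample(double value) {
        int index = (int) Math.floor(Math.log10(value) * BUCKETS_PER_DECADE);
        // The increment itself is not atomic; acceptable for a sketch.
        buckets.computeIfAbsent(index, k -> new long[1])[0]++;
    }

    // Approximate median: walk the buckets in order until half the samples are passed.
    double approximateMedian() {
        long total = 0;
        for (long[] c : buckets.values()) total += c[0];
        long seen = 0;
        for (Map.Entry<Integer, long[]> e : buckets.entrySet()) {
            seen += e.getValue()[0];
            if (seen * 2 >= total) {
                // Report the geometric center of the bucket that crosses the midpoint.
                return Math.pow(10, (e.getKey() + 0.5) / BUCKETS_PER_DECADE);
            }
        }
        return Double.NaN; // no samples
    }
}
```

Because the bucket index is the floor of a scaled logarithm, a run whose samples span three decades needs only on the order of fifty buckets, which is what keeps per-metric memory bounded.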
I wrote a longer blog entry about this here:
The code can be found at: