Logging vs Tracing

Some notes on the differences between these two concepts.
  • Logs are read by administrators and maybe even users, while trace output is strictly for developers and maintainers. Thus localisation is relevant for logging but not for tracing.

  • For enterprise-class applications it is probably desirable to be able to change logging levels without forcing a restart. For tracing, on the other hand, being able to change the trace level with a mere configuration-file change and a restart is manna compared with the recompiles required by C/C++ etc.

  • Log events are relatively short notifications of an event. In tracing, however, it may be desirable to include a lot of information on the current state of some object(s). In my case this extra data can run to hundreds of lines for a single ‘event’. The result would be unreadable if every line carried the sort of prefix found in Java logging.

In both cases, if you have to minimise the cost of disabled output, you need to do something like

private static final boolean TRACE = ...;
...
if (TRACE)
   tracelog.println(...);

or (for logging)

if (logger.isLoggable(Level.FINE))
   logger.log(...);

Only in rare cases would the difference between these two be significant. The ability to remove the trace code altogether might allow the JIT to uncover further optimisations which would not be found in the logging case (where the code remains even if not executed). We have always had to do something to mark out our trace code in cases where performance matters, and the if tests here are similar in size to the code we have always used. We can even do something like this:


private static final boolean LOG_FINEST = Logger.getLogger("myname").isLoggable(Level.FINEST);

Any code which depends on LOG_FINEST will only be executed if that condition was true at the time the class was loaded, which is not entirely in the spirit of the logging API, but so what!
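As a minimal sketch of that load-time freezing (the logger name "myname" and the expensive helper are illustrative, not from any real API):

```java
import java.util.logging.Level;
import java.util.logging.Logger;

public class FrozenLevel {
    // Evaluated exactly once, when the JVM initialises this class.
    // With the default configuration the effective level is INFO, so this is false.
    private static final boolean LOG_FINEST =
            Logger.getLogger("myname").isLoggable(Level.FINEST);

    // Stand-in for an expensive state dump we only want at FINEST.
    private static int expensiveStateDump() {
        return 42;
    }

    static int detail() {
        int work = 0;
        if (LOG_FINEST)          // constant once the class is loaded; eligible for JIT removal
            work = expensiveStateDump();
        return work;
    }

    public static void main(String[] args) {
        // Raising the level now is too late: LOG_FINEST was frozen at class load.
        Logger.getLogger("myname").setLevel(Level.FINEST);
        System.out.println(detail());   // prints 0, not 42
    }
}
```

This is exactly the "not in the spirit of the API" behaviour: the configuration change is silently ignored for the lifetime of the class.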

I’ll add some more thoughts to this thread later.

This definitely works, does it? (on a JIT/hotspot recompile)

It certainly ought to, AFAICS, but I’ve learnt to assume that even the server VM doesn’t do the obvious best thing until concretely proved otherwise :). On the day that javac finally allows full constructor-to-constructor delegation I will start to gain some trust in Sun’s JVMs ;D…

[quote]Some notes on the differences between these two concepts.

  • Logs are read by administrators and maybe even users, while trace output is strictly for developers and maintainers. Thus localisation is relevant for logging but not for tracing.

  • For enterprise-class applications it is probably desirable to be able to change logging levels without forcing a restart. For tracing, on the other hand, being able to change the trace level with a mere configuration-file change and a restart is manna compared with the recompiles required by C/C++ etc.
[/quote]
Out of interest, with MMOG platform development, we’ve tended to see a different pattern:

  • the “users” and admins are hardcore developers already (unlike many J2EE developers, for instance), and so everyone (both end-users and the original developers / vendor support team) wants to see the same stuff, i.e. everything

  • very few things that happen are worth turning on logging for after the event. If logging wasn’t on when it happened, it usually isn’t going to happen again anyway, unless you are a developer working on a private build that hasn’t passed the test suite yet (in which case you often have recurring bugs). So it seems to work the other way around: customers are generally happy to edit log config files and restart, whilst core developers want mid-runtime reconfiguration.

(Although, of course, some customers end up doing extensive enough development that they are hugely customizing the enterprise platform, and so move towards the position of the core developers. Unlike with J2EE etc., customer customization of the platform is not only easy but actively encouraged.)

Whereas with a mail server, the event might be someone trying to hack into your system or an attempt at using your system as an open relay. In this case it will invariably happen again (usually quite soon), and it would be useful to turn on logging in order to figure out just what the evil one was attempting.

As for testing the effect of static final boolean values on the JIT, you just need a trivial test application and the JVM option which dumps the assembler of the generated code (e.g. -XX:+UnlockDiagnosticVMOptions -XX:+PrintAssembly, which requires the hsdis disassembler plugin).

Try the following code under both client and server. In both cases the time is the same (apart perhaps from the first iteration) regardless of whether the value of TRACE is computed at runtime or compile time. In the case of the server JVM, it is clearly able to eliminate the entire test.
Also try the code with ‘final’ removed; it is much slower.


public class TestFinalBoolean
{
    // Try each variant: with the compile-time constant, javac removes the
    // guarded code entirely; with the runtime value, the JIT must do it.
    //private static final boolean TRACE = false;
    private static final boolean TRACE = Boolean.getBoolean("TRACE");
    private static int z;

    public static void main(String[] args)
    {
        for (int i = 0; i < 10; i++)
        {
            long t = System.currentTimeMillis();
            test();
            t = System.currentTimeMillis() - t;
            System.out.println("t=" + t + ", z=" + z);
        }
    }

    private static void test()
    {
        for (int i = 0; i < 30000; i++)
            testInner();
    }

    private static void testInner()
    {
        // 900 million iterations per call to test(); the whole loop should
        // cost nothing if the JIT can prove TRACE is false.
        for (int i = 0; i < 30000; i++)
        {
            if (TRACE)
                z++;
        }
    }
}


When I first implemented tracing for Java, a big problem was the fragility of the JVM. Any pure-Java approach would either lose the vital information near the end of the log when the VM crashed, or, if the file was flushed after every event, would be unusably slow.
The solution I used was to use a separate process to collect the trace, and (via JNI) a 1MB shared memory area to provide the communication between the two processes. Then if the test process crashed, the trace receiver process could still recover the last of the trace information from the shared memory area. As a separate process, the trace receiver can write the trace data to file using large buffers and without regular flushing. This allows large amounts of trace to be written with minimum overhead.
At that time I also tried Sockets as a communication method, but they also lost any unflushed data when the process crashed.
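A modern approximation of that shared-memory trick, without JNI, is a memory-mapped file: bytes written through a MappedByteBuffer land in the OS page cache, so they survive a crash of the writing process and can be recovered by a separate reader process (or the next run). A minimal sketch, with an arbitrary file name and the same 1 MB size as above:

```java
import java.io.IOException;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.StandardOpenOption;

public class MappedTraceBuffer {
    private final MappedByteBuffer buf;

    MappedTraceBuffer(Path file, int size) throws IOException {
        FileChannel ch = FileChannel.open(file,
                StandardOpenOption.CREATE,
                StandardOpenOption.READ,
                StandardOpenOption.WRITE);
        // Mapping beyond the current end of file grows the file to 'size'.
        buf = ch.map(FileChannel.MapMode.READ_WRITE, 0, size);
        ch.close(); // the mapping remains valid after the channel is closed
    }

    void trace(String msg) {
        byte[] b = (msg + "\n").getBytes(StandardCharsets.UTF_8);
        if (buf.remaining() >= b.length)
            buf.put(b); // goes straight to the page cache; survives our own crash
    }

    public static void main(String[] args) throws IOException {
        Path p = Paths.get("trace.bin");
        MappedTraceBuffer t = new MappedTraceBuffer(p, 1 << 20);
        t.trace("event 1");
        t.trace("event 2");
        // Even without buf.force(), another process reading trace.bin
        // would see these bytes; the OS owns the dirty pages now.
        System.out.println(Files.size(p));
    }
}
```

Note the limits: this protects against the *process* dying, not the machine; an OS crash or power loss can still lose pages that were never forced to disk.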

Today, in my experience, entire JVM crashes are much rarer. Individual threads may deadlock or crash, but other (independent) threads can continue without problems, especially if they don’t allocate memory. It is now also possible to use shutdown hooks to ensure that the tail of the trace log isn’t lost when someone calls System.exit unexpectedly. Thus it should now be practical to move the trace receiver ‘in-process’, with flushing limited to, say, every 5 seconds. Then, provided you don’t externally terminate an errant process too quickly, all relevant trace should be available. Flushing every 5 seconds (or so) would not cause significant I/O overhead, and allows nice big buffers to be used for the trace I/O. If you want to send the trace data over sockets, you still need a receiver thread to buffer the data and insert regular flushes of the socket.
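A sketch of that in-process arrangement (the class name, file name, and 5-second period are mine): events normally cost only a buffer copy, a daemon thread flushes periodically, and a shutdown hook catches System.exit and normal termination — though not a hard JVM crash.

```java
import java.io.BufferedWriter;
import java.io.File;
import java.io.FileWriter;
import java.io.IOException;
import java.io.Writer;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

public class TraceWriter {
    private final Writer out;

    TraceWriter(File file) throws IOException {
        // Big buffer, no per-event flush.
        out = new BufferedWriter(new FileWriter(file), 1 << 16);

        // Daemon thread flushes every 5 seconds, so at most ~5s of trace
        // is at risk if the process is killed externally.
        ScheduledExecutorService flusher =
                Executors.newSingleThreadScheduledExecutor(r -> {
                    Thread t = new Thread(r, "trace-flusher");
                    t.setDaemon(true);
                    return t;
                });
        flusher.scheduleAtFixedRate(this::flushQuietly, 5, 5, TimeUnit.SECONDS);

        // Saves the tail of the log on System.exit or normal termination.
        Runtime.getRuntime().addShutdownHook(new Thread(this::flushQuietly));
    }

    synchronized void trace(String msg) {
        try {
            out.write(msg);
            out.write('\n');
        } catch (IOException ignored) {
        }
    }

    private synchronized void flushQuietly() {
        try {
            out.flush();
        } catch (IOException ignored) {
        }
    }

    public static void main(String[] args) throws IOException {
        TraceWriter w = new TraceWriter(new File("trace.log"));
        w.trace("event 1");
        System.out.println("queued");
        // On normal exit the shutdown hook flushes "event 1" to trace.log.
    }
}
```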

I notice that java.util.logging.FileHandler calls flush after each event is published. Unfortunately I don’t think this is sufficient to ensure data is not lost (at least on Windows). On Windows you have to ensure that metadata is flushed as well, by using either FileDescriptor.sync or FileChannel.force(true). Doing this directly in FileHandler would be very expensive, but if the actual writing took place in a separate thread, flushing could be less frequent.
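For comparison, fully forcing a write from a plain stream looks like this (sketch; the file name is arbitrary). This is the per-event cost that would make doing it directly in FileHandler so expensive:

```java
import java.io.FileOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;

public class DurableAppend {
    public static void main(String[] args) throws IOException {
        try (FileOutputStream fos = new FileOutputStream("audit.log", true)) {
            fos.write("event\n".getBytes(StandardCharsets.UTF_8));
            fos.flush();          // drains Java-side buffers only
            fos.getFD().sync();   // forces file data *and* metadata to the device
        }
        System.out.println("synced");
    }
}
```

FileChannel.force(true) on the stream’s channel is the NIO equivalent; force(false) would skip the metadata, which is exactly the gap described above on Windows.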