CPU counters via JNI

Sun’s OS, Solaris, has some nice built-in features. One of my favourites is the ability to get CPU performance counter information with built-in commands and libraries, which are also available to normal users.

Java doesn’t have a built-in way to access these, so I wrote a little C program which does, and a little Java class to access them via JNI.

Here are some background references:

Performance Analysis and Monitoring Using Hardware Counters:
http://developers.sun.com/solaris/articles/hardware_counters.html

Optimizing Applications with Large Working Sets:
http://developers.sun.com/solaris/articles/optimizing_apps.html

One thing I like about using these counters is that the values are very precise and very stable. I’m only using the number of CPU cycles and instructions executed, but there are many more counters available (for analysing floating-point or cache/memory usage), depending on what the CPU provides.

Here’s the output from analysing a simple program (adding two integer arrays), where samples were taken every iteration of a higher-level loop. The lower level loop does 2,000 iterations each time. These results are with Java 1.4.1 server on a 500MHz UltraSPARC IIe on Solaris 8. Support for a good variety of x86 CPU counters isn’t available until Solaris 10 though.

Loop  Cycles  Instrs  wall-clock (ns)
0     708823  491297  1344732
1     656932  476504  1204708
2     646695  476503  1185966
3     698243  513165  1286524
4     686206  500540  2025927
5     693815  500516  1272467
6     665485  500529  1220026
7     479821  222572  27391653
8     58270   48423   120922
9     32648   48419   69021
10    32360   48419   68480
11    32268   48419   68119
12    32270   48419   68300
13    32438   48419   68660
14    32272   48419   68119

You can see here that when the compiler kicks in at loop #7 it doesn’t affect the cycle and instruction counts that much, even though the wall-clock time for that loop jumps to about 27 ms. This is because the compiler is on a separate thread, and the counting is only being done for the user thread.
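To make the numbers a bit easier to compare, here is a rough sketch that turns a few rows of the results above into cycles-per-instruction and cycles-per-array-element figures. The class and method names (CounterMath, cyclesPerInstruction, cyclesPerElement) are made up for illustration and are not part of the CPC code; the 2,000 is the inner loop count mentioned above.

```java
final class CounterMath {
    /** Average cycles spent per executed instruction. */
    static double cyclesPerInstruction(long cycles, long instrs) {
        return (double) cycles / instrs;
    }

    /** Average cycles spent per inner-loop iteration (one array element). */
    static double cyclesPerElement(long cycles, int innerIterations) {
        return (double) cycles / innerIterations;
    }

    public static void main(String[] args) {
        // {loop, cycles, instrs} copied from the results above:
        // first loop, first loop after compilation, last loop.
        long[][] rows = {
            {0, 708823, 491297},
            {8, 58270, 48423},
            {14, 32272, 48419},
        };
        int innerIterations = 2000; // inner loop length from the post
        for (long[] r : rows) {
            System.out.printf("loop %d: CPI %.2f, %.1f cycles/element%n",
                    r[0],
                    cyclesPerInstruction(r[1], r[2]),
                    cyclesPerElement(r[1], innerIterations));
        }
    }
}
```

By this measure the CPI goes from roughly 1.44 in loop 0 to roughly 0.67 in loop 14, so the compiled code both executes far fewer instructions and gets through them faster.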

Anyway, this is in very early stages, but it works. First time I’ve done something with JNI or the CPU performance counter C-library.

If anyone is particularly interested I can post the code, but you’d need Solaris to run it. Last I heard, Linux doesn’t have an equivalent built-in, and I’m pretty sure Windows doesn’t. MacOS X comes with a program called “monster”, but it’s not installed as standard.

I’m doing this as part of an article analysing run-time optimisation aspects of JVMs (particularly Sun’s JVM).

[quote]Last I heard, Linux doesn’t have an equivalent built-in, and I’m pretty sure Windows doesn’t.
[/quote]
That’s a shame. I would need something similar for a future project of mine.

I’m going to make a programming game (Robot Battle, Robocode etc.), and I would need a way to monitor and control the CPU time that each thread uses. Would you know some way to do it?

Not in a cross-platform way (which is what you’re after). Most OSs simply don’t provide this kind of detail. Maybe the RT-Java JVMs would help you here, but they’d have rather specific requirements - eg Sun’s one requires Solaris (for its RT/fair-share scheduling).

If you want your robots to be able to run general-purpose Java bytecodes, I think you’re going to be stuck. You’ll have to restrict their environment somewhat. If you can restrict the environment in the right way, then you could let normal thread scheduling handle the problem.

I’ve been using some JNI code to get the Pentium clock cycle counter for years. For so long in fact that the code for Microsoft’s RNI is in there too!

Note that this is running under Windows. It doesn’t actually make any use of OS features — the assembler instruction is embedded directly.

I found a piece of C code for reading the counter from a Pentium running Linux: http://www.ussg.iu.edu/hypermail/linux/kernel/9804.3/0562.html

// Read the Pentium TSC.
static inline u_int64_t
rdtsc ()
{
  u_int64_t d;
  // Instruction is volatile because we don't want it to move
  // over an adjacent gettimeofday. That would ruin the timing
  // calibrations.
  __asm__ __volatile__ ("rdtsc" : "=&A" (d));
  return d;
}

rdtsc (Read Time Stamp Counter) returns the number of clock cycles since the CPU was powered up or reset.
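Since the counter is just a cycle count, the difference between two readings only becomes a time once you divide by the clock rate. A minimal sketch of that conversion on the Java side, assuming the counter advances at the CPU's core clock rate (as on the Pentium-era chips discussed here) and that the two readings come from the same CPU. TscMath and ticksToNanos are hypothetical names, and the readings in main are invented values, not measurements.

```java
final class TscMath {
    /**
     * Converts the difference between two counter readings to nanoseconds.
     * clockHz is the assumed counter frequency, e.g. 500e6 for a 500MHz CPU.
     */
    static double ticksToNanos(long startTicks, long endTicks, double clockHz) {
        // Two's-complement subtraction stays correct even if the raw
        // 64-bit readings look negative as signed Java longs.
        long delta = endTicks - startTicks;
        return delta * 1e9 / clockHz;
    }

    public static void main(String[] args) {
        long start = 1000000L;      // hypothetical first reading
        long end = start + 32272L;  // hypothetical second reading
        System.out.println(ticksToNanos(start, end, 500e6) + " ns");
    }
}
```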

How to get an accurate timer by using the CPU clock cycle counter:
http://www.javaworld.com/javaworld/javaqa/2003-01/01-qa-0110-timing.html

How to get the time used by the current thread (somewhat inaccurate):
http://www.javaworld.com/javaworld/javatips/jw-javatip92.html

Now I would just like to find a way to get the CPU time used by a thread other than the current one.

could you please post your solaris code? i would really appreciate.

thanks in advance
–mika

I found some more information.

The Java Virtual Machine Profiling Interface (JVMPI) is deprecated as of J2SE 5.0 and it is replaced by Java Virtual Machine Tool Interface (JVMTI).

JVMTI has a function GetThreadCpuTime, which returns the CPU time used by any thread (nanosecond precision, but not necessarily nanosecond accuracy).
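For what it's worth, J2SE 5.0 also exposes per-thread CPU time to plain Java code through java.lang.management.ThreadMXBean, which avoids writing a native agent. This isn't something discussed above, so treat the following as a sketch rather than a recommendation: ThreadCpuProbe, cpuNanos and the "robot" thread are illustrative names, the lambda needs Java 8 or later, and whether CPU time measurement is supported at all depends on the JVM and platform.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadMXBean;

final class ThreadCpuProbe {
    /** CPU time used by thread t in nanoseconds, or -1 if it is unavailable. */
    static long cpuNanos(Thread t) {
        ThreadMXBean bean = ManagementFactory.getThreadMXBean();
        if (!bean.isThreadCpuTimeSupported()) {
            return -1L; // this JVM cannot measure other threads' CPU time
        }
        if (!bean.isThreadCpuTimeEnabled()) {
            bean.setThreadCpuTimeEnabled(true);
        }
        // Also returns -1 if the thread is not alive.
        return bean.getThreadCpuTime(t.getId());
    }

    public static void main(String[] args) throws InterruptedException {
        // A stand-in for a robot's thread: burns CPU until interrupted.
        Thread robot = new Thread(() -> {
            long spin = 0;
            while (!Thread.currentThread().isInterrupted()) {
                spin++;
            }
        }, "robot");
        robot.setDaemon(true);
        robot.start();

        Thread.sleep(200); // let the robot run for a while
        System.out.println("robot CPU ns: " + cpuNanos(robot));

        robot.interrupt();
        robot.join();
    }
}
```

A supervising thread could poll cpuNanos for each robot and stop scheduling work for any that exceed their budget; as with the JVMTI call, the precision is nanoseconds but the accuracy may be much coarser.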

Looks like all of my questions just got answered. :) The single step events might also be useful for my purposes (if they do not slow down the program too much), but I’ll need to look more closely into that.

[quote]could you please post your solaris code? i would really appreciate.
[/quote]
You got Solaris, or want to adapt the code? btw, if you do have some non trivial suggestions/improvements, I’d appreciate you posting them back or something.

javacpc.c

#include <stdio.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/time.h>
#include <inttypes.h>
#include <libcpc.h>
#include <errno.h>
#include <string.h>   // strerror
#include <sys/lwp.h>  // _lwp_self

#include <jni.h>
#include "CPC.h"

// http://developers.sun.com/solaris/articles/optimizing_apps.html
// http://developers.sun.com/solaris/articles/hardware_counters.html

cpc_event_t *before, *after;
long long *timings;
hrtime_t start;

cpc_event_t event;
int cpuver;
const char *setting;

JNIEXPORT void JNICALL Java_CPC_initCPC(JNIEnv *env, jclass obj) {
  before = (cpc_event_t *)calloc(0, sizeof(cpc_event_t));
  after = (cpc_event_t *)calloc(0, sizeof(cpc_event_t));
  timings = (long long *)calloc(0, sizeof(long long));
}

JNIEXPORT jint JNICALL Java_CPC_setupCPC(JNIEnv *env, jclass obj, jstring events, jint size) {
  free(before);
  free(after);
  free(timings);
  before = (cpc_event_t *)calloc(size, sizeof(cpc_event_t));
  after = (cpc_event_t *)calloc(size, sizeof(cpc_event_t));
  timings = (long long *)calloc(size, sizeof(long long));

  setting = (*env)->GetStringUTFChars(env, events, 0);

  if ((cpuver = cpc_getcpuver()) == -1) {
    printf("no performance counters\n");
    return 1;
  }
  printf("hardware identifier %d\n", cpuver);

  if (cpc_strtoevent(cpuver, setting, &event) != 0) {
    printf("ev : cannot measure %s\n", setting);
    return 2;
  }

  setting = cpc_eventtostr(&event);
  if (cpc_bind_event(&event,0) == -1) {
    printf("cannot bind lwp %d %s\n", _lwp_self(),strerror(errno));
    return 3;
  }
  return 0;
}

JNIEXPORT void JNICALL Java_CPC_recordBefore(JNIEnv *env, jclass obj, jint index) {
  start = gethrtime();
  cpc_take_sample(&before[index]);
}

JNIEXPORT void JNICALL Java_CPC_recordAfter(JNIEnv *env, jclass obj, jint index) {
  cpc_take_sample(&after[index]);
  timings[index] = gethrtime() - start;
}

JNIEXPORT jlong JNICALL Java_CPC_getSample(JNIEnv *env, jclass obj, jint index, jint event) {
  return after[index].ce_pic[event] - before[index].ce_pic[event];
}

JNIEXPORT jlong JNICALL Java_CPC_getNanos(JNIEnv *env, jclass obj, jint index) {
 return timings[index];
}

CPC.java

public class CPC {
    private static String eventSpec = null;
    private static int samplePairSize = 0;
    private static boolean setupOkay = false;
    private static int sampleIndex = 0;

    static {
        try {
            System.loadLibrary("javacpc");
            initCPC();
        } catch (Exception E) {
            throw new Error(E);
        }
    }

    public static boolean setup(String events, int size) {
        eventSpec = events;
        samplePairSize = size;
        int okay = setupCPC(events, size);
        setupOkay = (okay == 0);
        return setupOkay;
    }

    public static void sampleBefore() {
        // Ignore calls once every sample slot has been filled.
        if (!setupOkay || sampleIndex >= samplePairSize) return;
        recordBefore(sampleIndex);
    }

    public static void sampleAfter() {
        if (!setupOkay || sampleIndex >= samplePairSize) return;
        recordAfter(sampleIndex);
        sampleIndex++;
    }

    public static long[][] getSamples() {
        // sampleIndex is the number of completed before/after pairs,
        // so only that many rows are returned.
        long[][] samples = new long[sampleIndex][3];
        for (int i = 0; i < sampleIndex; i++) {
            samples[i][0] = getNanos(i);
            samples[i][1] = getSample(i, 0);
            samples[i][2] = getSample(i, 1);
        }
        sampleIndex = 0;
        return samples;
    }

    private static native void initCPC();
    private static native int setupCPC(String events, int size);
    private static native void recordBefore(int index);
    private static native void recordAfter(int index);
    private static native long getSample(int index, int event);
    private static native long getNanos(int index);
}

compiling:

/usr/j2sdk1.4.2_01/bin/javah -jni CPC
gcc -O -fPIC -c javacpc.c -I/usr/j2sdk1.4.2_01/include/ -I/usr/j2sdk1.4.2_01/include/solaris/
ld -l c -l cpc -B direct -z defs -G javacpc.o -o libjavacpc.so

Example time.java

public class time {
    public static void main(String[] args) {
        int size = 15;
        boolean ok = CPC.setup("pic0=Cycle_cnt,pic1=Instr_cnt", size);
        if (!ok) return;

        int asize = 2000;
        int a[] = new int[asize], b[] = new int[asize];
        if (args.length == 0) {
            for (int i=0; i<size; i++) {
                CPC.sampleBefore();
                for (int j=0; j<a.length; j++) {
                    a[j] = add(a[j], b[j]);
                }
                CPC.sampleAfter();
            }
        } else {
            for (int i=0; i<size; i++) {
                CPC.sampleBefore();
                for (int j=0; j<a.length; j++) {
                    a[j] = a[j] + b[j];
                }
                CPC.sampleAfter();
            }
        }

        long[][] samps = CPC.getSamples();
        for (int i=0; i<samps.length; i++) {
            System.out.println(i+"\t"+samps[i][1]+"\t"+samps[i][2]+"\t"+samps[i][0]);
        }
    }
    private static int add(int a, int b) { return a+b; }
}

Hmm, the Java code just doesn’t seem to want to indent properly. Oh well.