Ok to assume Instrumentation#retransformClasses(Class....) is synchronous?

Abuse · May 17, 2015, 1:36pm

I’m using this method to read back the byte[] representation of loaded classes.
(for read-only instrumentation requirements, not to actually modify)

The reason I need the in-memory representations (rather than simply grabbing them from the classpath) is 2-fold:

some of them will have already been instrumented
some of them are defined at run-time, so do not exist in the classpath.

However the javadoc on the method is rather lacking on implementation details.
The Oracle VM certainly executes it synchronously, which means this, while conceptually horrible, works fine:

	private byte[] classBytes;

	/**
	 * Reads back from the JVM the byte[] representation of an already loaded class 
	 * 
	 * @param className
	 *            The name of the class to read back.
	 * @param cl
	 *            The class loader used to load the class.
	 * @return
	 */
	synchronized byte[] getClassBytes(String className, ClassLoader cl) {
		try {
			// trigger a retransformation (that will store the class's bytes in
			// classBytes[]
			inst.retransformClasses(cl.loadClass(className));
			byte[] classBytes = this.classBytes;
			this.classBytes = null;
			return classBytes;
		} catch (ClassNotFoundException | UnmodifiableClassException e) {
			throw new RuntimeException("Failed reading back class bytes from the JVM", e);
		}
	}

	@Override
	public byte[] transform(ClassLoader loader, String className, Class<?> classBeingRedefined, ProtectionDomain protectionDomain, byte[] classfileBuffer) throws IllegalClassFormatException {

		if (classBeingRedefined != null) {
			// this will only be reachable immediately after an invocation of
			// getClassBytes.
			// fingers crossed all implementations execute this synchronously.
			// Oracle's VM certainly does, but the method that causes the call
			// (Instrumentation#retransformClasses)
			// is not very explicit on exactly how it's implemented.
			System.out.println("byte[] read-back for " + className);
			classBytes = classfileBuffer.clone();
			return null;
		}
               //<snip>
	}

Abuse · May 17, 2015, 3:59pm

The alternative to this approach would be to store every “classfileBuffer” passed into transform(…).
However I’m loath to take this approach, because it might result in a massive increase memory usage. (potentially doubling the memory footprint of the program’s bytecode)

Riven · May 17, 2015, 5:48pm

A ‘massive’ increase in bytecode footprint sounds like a misnomer. It’s doubling a number that’s typically tiny compared to the application. There is also no need to keep it all in memory. Once the class has been transformed, you can remove it from your collection of bytecodes that are loaded-but-not-redefined yet. With this approach it would be surprising to have more than a dozen ‘pending’ classes at a time in memory, on even the worst behaving VMs. So either way it seems trvial to support this - were reduction of memory footprint seems just as trivial, yet highly likely premature optimisation.

Abuse · May 17, 2015, 6:30pm

For my use case the transformed bytecode needs to be kept around so that future instrumentations have access to it (i.e. they may need to interrogate the instrumented classes as part of their own transformations [Especially relevant due to Javassist’s tendency to check the validity of every type referenced in code fragments that it compiles]).

You’re right though, doubling the memory footprint of the code is probably not that enormous… though when it includes every class that the application touches within every library dependency too? (including the Java standard libraries!)
I can see that adding up to at least a few dozen MB’s.

Atm the discussion is moot though, as I’ve encountered a bug relating to this very piece of code… seems I’ve misunderstood precisely what retransform does.

:edit:

I thought the byte[] returned by transform(…) overwrote the VM’s internal classfileBuffer.
Apparently it doesn’t, it must be stored somewhere else alongside it.
So when retransformClasses(class) is called, the 2nd invocation of transform(…) is passed the original classfileBuffer. (not the transformed one that had been returned by the previous call to transform(…)).

So looks like I need to keep a copy of all transformed classes anyway, as this information cannot be read back from the VM.

Riven · May 17, 2015, 6:38pm

I was already a tad confused about your code snippet… I proposed the pending-list/map as your current approach was pending-item, and needed support for out-of-order invocations.

Either way, I’m still not sure a couple dozen MB warrants any optimisation. If it somehow does, however, GZ and/or serialize to disk. That should be enough of a scare to reconsider your tactics