JNI threading > JNIEnv memleaks

basil · June 29, 2015, 9:05pm

o/

i tried the googles but could not find information fitting my brain yet. maybe you guys can help me out.

until now the JNI adventures of mine were controlled by the host/java, so threads were first spawned in java and i never had to fetch the JNIEnv from the VM. the JNIEnv came with the native call. now i started playing around with binding fmod to java.

fmod spawns a few threads when used, 4-5 (mixer, studio-update, async, stream, etc.). these threads do not see a JNIEnv when they start. there is also no way for me to queue code onto them or hook into their “life-cycle”. fmod allows hooking callbacks into those threads for a few basic things, like a system-callback (sometimes called from the async c-thread) or dsp-processing-callbacks (called from the mixer c-thread). when i hook those up i run into something i cannot get right: either i end up with a (small) memory leak or i get spammed by java-threads :

a system callback looks like this :

FMOD_RESULT __stdcall callback([...])
{
  return FMOD_OK;
}

to access java i’m grabbing the JNIEnv from the VM, which i fetch when [icode]System.loadLibrary()[/icode] is called :

static JavaVM *jvm;

JNIEXPORT jint JNICALL JNI_OnLoad(JavaVM *vm, void *reserved)
{
  jvm = vm;
  return JNI_VERSION_1_6;
}

back to the callback :

FMOD_RESULT __stdcall callback([...])
{
  JNIEnv *jenv;
  (*jvm)->AttachCurrentThread(jvm, (void **)&jenv, NULL); // returns a jint status, fills in jenv

  // do things to java ...

  (*jvm)->DetachCurrentThread(jvm);

  return FMOD_OK;
}

i took a look into lwjgl 2.x code and found that this is similar to how the [icode]ARBDebugOutput[/icode] implementation works.

anyway, question is : does [icode]AttachCurrentThread[/icode] and [icode]DetachCurrentThread[/icode] use a reference counter ?

using this, assuming the JNIEnv reference count on that particular c-thread is zero, attaching creates a new java-thread (fair enough), and detaching terminates it (reference back to zero). calling that callback, say every frame in the game loop, creates a new java-thread every time it executes. which is … pretty bad.

so for now i simply do not detach, and live with a zombie java-thread after the sound-system shuts down. usually you do not create more than 2 sound systems in a game anyway, but it can pile up to a nasty pack of zombies after hot-swapping java code and restarting the game-loop :

FMOD_RESULT __stdcall callback([...])
{
  JNIEnv *jenv;
  (*jvm)->AttachCurrentThread(jvm, (void **)&jenv, NULL); // returns a jint status, fills in jenv

  // do things to java ...

  // (*jvm)->DetachCurrentThread(jvm); // memleak

  return FMOD_OK;
}

what is bugging me: when i track the java-threads used by [icode]ARBDebugOutput[/icode] callbacks i don’t see the same behaviour. is that due to the JNIEnv reference, which is not zero when entering the method ? are calls caused by ARBDebugOutput actually coming from the gl-thread (which is known and attached to java) ? afaik it is async, or is it just deferred ?

for now i end up testing if attaching would be required at all :

FMOD_RESULT __stdcall callback([...])
{
  int attached = 0; // must start at 0, it is read below even when no attach happened
  JNIEnv *jenv;
  int env_status = (*jvm)->GetEnv(jvm, (void **)&jenv, JNI_VERSION_1_6);

  if( env_status == JNI_EDETACHED )
  {
     (*jvm)->AttachCurrentThreadAsDaemon(jvm, (void **)&jenv, NULL);
     attached = 1;
  }

  // do things to java ...

  if ( attached )
  {
    // still nothing. :|
  }

  return FMOD_OK;
}

i tried getting rid of the zombie threads from within java, but could not find any way to do so. the attached java-thread is not willing to interrupt or anything else. that is still good enough tho’, zombies are not that dangerous and the callbacks work just fine.

is there a way to tell the jvm to “reuse” threads when attaching a zero-reference c-thread (detaching properly) ?
is there a way to tell the jvm to cleanup jni-attached java-threads ?
is this the right way to access java from c-threads ?
did i miss something ?

thanks

o/

trollwarrior1 · June 30, 2015, 5:47am

Are you sure that calling ->AttachCurrentThread spawns a new Java thread? I thought it only passes the JVM context to your current thread, so that JNI calls are possible from that thread. At my work every call from native to Java (on Android) tries to attach the thread if it is not yet attached. I think the code looks something like this:


JNIEnv* getJNIEnv()
{
	JNIEnv* pEnv = NULL;

	switch (jvm->GetEnv((void**)&pEnv, JNI_VERSION_1_6))
	{
		case JNI_OK:
			// Thread is already attached, nothing to do
		break;

		case JNI_EDETACHED:
			// Thread is detached, need to attach
			if (jvm->AttachCurrentThread((void**)&pEnv, NULL) != JNI_OK)
				pEnv = NULL; // attach failed
		break;

		default:
			// Any other status (e.g. JNI_EVERSION) means no usable env
			pEnv = NULL;
		break;
	}

	// NULL means the thread could not be attached
	return pEnv;
}

I didn’t write the original code and this is from memory, so double-check the exact parameters against your platform’s jni.h.

basil · June 30, 2015, 8:10am

aye, this is similar to how i use it now. in your android projects, do you use [icode]DetachCurrentThread[/icode] at some point ?

yes, i’m very sure attaching spawns a new java thread. it does not if i skip the detach at the end. seems like AttachCurrentThread is about the current c-thread, not the java one.

to be extra sure i track the thread id’s on both sides :

works for windows :

#ifdef WIN32
#include <windows.h>
unsigned long currentThreadID()
{
  return (unsigned long)GetCurrentThreadId();
}
#else
[...]

and then, setting the java thread name when attaching :

if( env_status == JNI_EDETACHED )
{
  JavaVMAttachArgs args;

  args.version = JNI_VERSION_1_6;
  args.group = NULL;

  char str[256];
  sprintf(str, "jni-attached-daemon-%lu", currentThreadID());

  args.name  = str;

  (*jvm)->AttachCurrentThreadAsDaemon(jvm, (void **)&env, &args);

   *attached = 1;
}

now on the java side i see lots of threads with the same name - matching what sprintf() does.
i use [icode]Thread.enumerate(Thread tarray[])[/icode] or the keys from [icode]Thread.getAllStackTraces()[/icode] to list the java threads.

Roquen · June 30, 2015, 8:45am

Aside: don’t know fmod, but it seems like the java side would only need to know about 1 thread…the one it interacts with.

basil · June 30, 2015, 9:15am

yes, java should not care about that at all. i do cos i’m running into [icode]OutOfMemoryError[/icode]s (after creating thousands of short-lived threads) and am trying to fix it.

again, detaching the JNIEnv disposes the java-thread, just as expected. not in the case of [icode]ARBDebugOutput[/icode] - where i do not know if its callbacks are really executed in an extra native thread. attaching multiple times increments a counter ?

in the case of fmod, the threads are just “normal” threads like pthread_t on Linux etc. or HANDLE’s on win32, nothing fancy. running in the background, calling functions (which eventually end up calling java methods).

nsigma · June 30, 2015, 9:52am

TL;DR - switch to using JNA

OK, I did a load of research to help fix this issue in JNA. This now uses thread-local storage (pthread_key_create or TlsAlloc) to cache a reference to the JNI info for an attaching thread, and makes use of the TLS destructor to detach the thread from the JVM just before the thread dies. This behaviour isn’t JNA default, but has to be requested - I did some work on it because of a similar issue in binding to a native audio library (JACK audio server <> JNAJack)

JNA code for this is included in here - https://github.com/twall/jna/blob/master/native/callback.c
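A minimal sketch of the TLS-destructor idea described above, for POSIX threads only (the Windows `TlsAlloc` side is not covered). This is not JNA's code. So that it runs without a JVM, `fake_attach` and `fake_detach` are hypothetical stand-ins for `(*jvm)->AttachCurrentThread(...)` and `(*jvm)->DetachCurrentThread(jvm)`; every other name is made up for the illustration too.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical stand-ins for the JVM calls, so the sketch runs without a JVM.
 * Real code would call AttachCurrentThread / DetachCurrentThread here. */
static atomic_int attach_calls;
static atomic_int detach_calls;

static void *fake_attach(void)
{
    static int dummy_env;               /* plays the role of the JNIEnv */
    atomic_fetch_add(&attach_calls, 1);
    return &dummy_env;
}

static void fake_detach(void)
{
    atomic_fetch_add(&detach_calls, 1);
}

static pthread_key_t  env_key;
static pthread_once_t env_key_once = PTHREAD_ONCE_INIT;

/* Runs automatically when a thread exits holding a non-NULL value for env_key. */
static void detach_on_thread_exit(void *env)
{
    (void)env;
    fake_detach();
}

static void make_env_key(void)
{
    int rc = pthread_key_create(&env_key, detach_on_thread_exit);
    assert(rc == 0);
    (void)rc;
}

/* Call at the top of every native callback: attaches at most once per thread. */
static void *env_for_current_thread(void)
{
    pthread_once(&env_key_once, make_env_key);
    void *env = pthread_getspecific(env_key);
    if (env == NULL) {
        env = fake_attach();
        pthread_setspecific(env_key, env);  /* arms the destructor for this thread */
    }
    return env;
}

/* Simulates a library thread (e.g. a mixer) firing the callback many times. */
static void *worker(void *arg)
{
    int calls = *(int *)arg;
    for (int i = 0; i < calls; i++)
        (void)env_for_current_thread();
    return NULL;                         /* thread exit triggers the destructor */
}

/* Spawns one worker thread and waits for it; returns 0 on success. */
int run_worker(int calls)
{
    pthread_t t;
    if (pthread_create(&t, NULL, worker, &calls) != 0)
        return -1;
    return pthread_join(t, NULL);
}
```

After `run_worker(1000)` the stubbed attach and detach have each run once: the thread attaches on its first callback, reuses the cached value afterwards, and the destructor detaches it when the thread ends, without the callback ever needing a life-cycle hook.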

Incidentally, the complexity of that code is one reason I’m quite in favour of projects like JNA - fix the problem in one cross-platform library rather than many!

@trollwarrior1 mentioned Android - if this is just for Android then it’s easier because you don’t have to handle the Windows side and pthread TLS is simpler. This is mentioned in the Android docs (last paragraph) - http://developer.android.com/training/articles/perf-jni.html#threads

btw - the Thread objects created are very lightweight. On desktop I did some benchmarking, but while there was a significant performance hit with audio in JNAJack, there was minimal with video in GStreamer-Java - but there’s the difference between 24fps and 600+!

Roquen · June 30, 2015, 9:58am

Forgive me if i seem like I’m being a pain. How are thousands of short-lived threads being created?

nsigma · June 30, 2015, 10:00am

For thread read Thread. This issue is caused by the creation of thousands of Thread objects that proxy the same (or a few) native threads.

basil · June 30, 2015, 12:11pm

yes, “proxy” java-Thread Object reflecting a single native thread.

attach -> create java Thread object -> access JNIEnv -> detach -> garbage java Thread object. while perfectly fine for a long-living native thread, doing that for something like DSP processing (running at ~60 hz) and not being able to hook into the life-cycle (and detach once finished) : not good.

i guess the overhead/weight of the java Thread object is not thaaaat bad, yet sort of unnecessary and desirable to avoid altogether. i should not run into [icode]OutOfMemoryError[/icode]s at all. that’s probably cos’ i hold on to those objects too much when observing them.

thanks for the heads up nsigma, will read into your work there. looks interesting!

Spasi · June 30, 2015, 12:22pm

ARB_debug_output and KHR_debug by default support asynchronous callback invocations. That is why LWJGL 2 uses Attach/Detach. You can also glEnable(GL_DEBUG_OUTPUT_SYNCHRONOUS) which will force debug output to run on the same thread that invoked the OpenGL function that caused the error (before that function returns). This obviously has a performance impact, but is probably acceptable for debug functionality.

LWJGL 3 uses the same code for all callbacks (via libffi). Since a callback can be either synchronous or asynchronous, GetEnv is first used, then AttachCurrentThreadAsDaemon. The result is stored in thread-local-storage, like JNA, and subsequent invocations use that. This avoids some minor overhead from calling into the JVM. DetachCurrentThread is never called; the expectation is that native threads will be few and as long-lived as the application.
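The lookup order described above (check first, attach as daemon only if needed, cache per thread, never detach) can be sketched like this. It is not LWJGL's code. To stay runnable without a JVM, `stub_get_env` and `stub_attach_as_daemon` are hypothetical stand-ins for `(*jvm)->GetEnv(...)` and `(*jvm)->AttachCurrentThreadAsDaemon(...)`, and `StubEnv` stands in for `JNIEnv`. It assumes a C11 compiler for `_Thread_local`.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins so the sketch compiles and runs without a JVM. */
typedef struct StubEnv { int unused; } StubEnv;

static StubEnv the_env;
static int daemon_attach_calls;              /* counts simulated attaches */
static _Thread_local int known_to_jvm;       /* 1 once the "JVM" knows this thread */

/* Plays the role of GetEnv: NULL corresponds to JNI_EDETACHED. */
static StubEnv *stub_get_env(void)
{
    return known_to_jvm ? &the_env : NULL;
}

/* Plays the role of AttachCurrentThreadAsDaemon. */
static StubEnv *stub_attach_as_daemon(void)
{
    daemon_attach_calls++;
    known_to_jvm = 1;
    return &the_env;
}

/* Per-thread cache: later callbacks skip the calls into the JVM entirely. */
static _Thread_local StubEnv *callback_env;

StubEnv *env_for_callback(void)
{
    if (callback_env == NULL) {
        /* Synchronous callback on a thread Java already knows? */
        callback_env = stub_get_env();
        if (callback_env == NULL) {
            /* Foreign native thread: attach once, as a daemon. */
            callback_env = stub_attach_as_daemon();
        }
    }
    assert(callback_env != NULL);
    return callback_env;                     /* never detached in this scheme */
}
```

The trade-off is the one stated above: nothing ever detaches, so each native thread that fires a callback keeps its Java-side Thread object until the process exits.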

AttachCurrentThread is expensive. If you call DetachCurrentThread on every invocation, you will pay that price every time (about 100 microseconds on my machine). The good news is that the JVM uses TLS caching too; multiple calls to AttachCurrentThread will be cheap. There is no reference counting; a single DetachCurrentThread will purge the cache.

basil · June 30, 2015, 12:41pm

thanks for clarifying that Spasi. happy to hear it’s implemented like that in lwjgl 3.x.

when i look at the old lwjgl 2.x code like https://github.com/LWJGL/lwjgl/blob/master/src/native/common/opengl/org_lwjgl_opengl_CallbackUtil.c#L71 … (sorry i still didn’t port to 3.x, i cant let the swing go >.<)

… shouldn’t that create lots of java-thread-objects too, if running async from a native thread which is unknown to java ? i’m a bit confused: my tests show that this method does not behave like that. but that is also the behaviour you get when the callback is executed from an “already attached” thread (not async). even if i detach an attached thread, it seems not to dispose the java thread object and still reuses it on the next attach, which should not be possible when detaching purges the cache. so confusing.

anyway. i will also stick to the expectation of having just a few native threads and not detach at all - for now. pthread_key_create/TlsAlloc seems to be what i was looking for.

thanks for the help

nsigma · June 30, 2015, 2:36pm

LWJGL 3 caches in TLS like JNA, but AFAIK without automatic thread detachment using TLS destructors, or at least it didn’t, which is a shame!

Expecting native threads to be few and long-lived is a bit of an arbitrary viewpoint, although it might be true for the majority of use cases with LWJGL. Using a native media library for video inter-titles is one obvious case where it might not hold though, as it’s likely to start a new media callback thread each time.

Spasi · June 30, 2015, 3:05pm

@basil: The debug output callbacks may be called asynchronously, but the spec does not enforce this as a requirement. For example, my AMD driver does not seem to support async callbacks; it behaves as if GL_DEBUG_OUTPUT_SYNCHRONOUS is always enabled. So the native thread is a Java thread, and Attach/Detach don’t really do anything. This may be what you’re seeing; try printing the thread id to verify.

@nsigma: Indeed, LWJGL 3 does not detach threads automatically.

@nsigma: Yes, the few-and-long-lived expectation is arbitrary and one of the pending issues for 3.0. Current testing (even tried thousands of OpenCL native kernels that spawn several threads in the driver) has shown no measurable leaks, though different use cases may prove otherwise. I have implemented JNA’s TLS cleanup, but I couldn’t get it to trigger before JVM exit with the current LWJGL bindings.

trollwarrior1 · July 1, 2015, 4:53am

Yes, DetachCurrentThread is being called after every call to Java.


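class JNIThread
{
	public:

	JNIThread() : m_pEnv(NULL), m_attachedHere(false) {}

	~JNIThread()
	{
		// Detach only if this object did the attaching; a thread that
		// was already attached (e.g. a Java thread) is left alone
		if (m_attachedHere)
			jvm->DetachCurrentThread();
	}

	bool Attach()
	{
		switch (jvm->GetEnv((void**)&m_pEnv, JNI_VERSION_1_6))
		{
		  case JNI_OK:
		     // Thread is ready to use, nothing to do
		  break;

		  case JNI_EDETACHED:
		     // Thread is detached, need to attach
		     if (jvm->AttachCurrentThread((void**)&m_pEnv, NULL) == JNI_OK)
		        m_attachedHere = true;
		     else
		        m_pEnv = NULL;
		  break;

		  default:
		     // Any other status means no usable env
		     m_pEnv = NULL;
		  break;
		}

		// If not null, means ready to use
		return m_pEnv != NULL;
	}

	JNIEnv* m_pEnv;

	private:

	bool m_attachedHere;
};



// Usage
void function()
{
	JNIThread thread;

	if (thread.Attach())
	{
		// do work
	}

	// thread is detached in destructor
}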

Spasi · July 3, 2015, 6:28pm

Worked a bit more on the TLS cleanup. Tested on an Nvidia GPU, still no luck. Funny how it spawns a silly amount of threads (with threaded optimization on in the driver settings) and never kills them. So I had to go to native code to manually spawn/attach/exit a native thread. You were right, I have now added automatic thread detaching to LWJGL.