Massive internal JEmalloc/Nvidia driver memory leak?

I think I just found a massive bug in JEmalloc that is fairly complicated but 100% consistently reproducible for me.

I’m allocating “pages” of vertex attribute data each frame and deallocating them again at the end of each frame. I have a little test program comparing my old and new rendering systems, and if I start off at a specific one and then switch to another one, memory usage quickly rises until I’m out of my 16 GB of RAM (it takes around 15 seconds). The Java heap memory is constant, as nothing is being allocated after the rendering loop has started. Since the memory usage goes far above my 2 GB Java heap size limit, it must be JEmalloc.
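
For context, here is a rough sketch of the allocation pattern being described, written against LWJGL’s JEmalloc binding. The class, page size and bookkeeping are made up for illustration; only the allocate-per-frame / free-at-end-of-frame pattern and the NULL check come from this thread, and exact binding signatures may differ between LWJGL versions.

    import org.lwjgl.system.MemoryUtil;
    import org.lwjgl.system.jemalloc.JEmalloc;

    import java.nio.ByteBuffer;
    import java.util.ArrayList;
    import java.util.List;

    // Hypothetical illustration of the per-frame "page" pattern described above.
    public class FramePageAllocator {

        private static final long PAGE_SIZE = 64 * 1024; // placeholder page size
        private final List<Long> pages = new ArrayList<>();

        // Allocate a page of vertex attribute data for the current frame.
        ByteBuffer allocatePage() {
            long address = JEmalloc.nje_malloc(PAGE_SIZE); // raw-pointer variant
            if (address == MemoryUtil.NULL) {
                // je_malloc() returning 0 is treated as out of memory.
                throw new OutOfMemoryError("je_malloc returned NULL for " + PAGE_SIZE + " bytes");
            }
            pages.add(address);
            return MemoryUtil.memByteBuffer(address, (int)PAGE_SIZE);
        }

        // Free every page again at the end of the frame.
        void endFrame() {
            for (long address : pages) {
                JEmalloc.nje_free(address);
            }
            pages.clear();
        }
    }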

I tried enabling the debug allocator which prints memory leaks on program exit. This is the result of a run where it runs until it crashes. Note that the OutOfMemoryError is thrown by my own code when je_malloc() returns 0.

In other words, the only memory I’m leaking is internal memory allocated by the GLFW callbacks (I didn’t clean up my windows and callbacks properly since it died from an exception escaping main()).

  • I can only reproduce it when I start the program at a specific render test and switch to another one. Both use JEmalloc in the exact same way, and I only get leaks when using them in this specific order.
  • If I start on a different renderer I can’t reproduce it.
  • If I start on the first renderer, switch to the second one, and then back to the first one again, the increase permanently stops, even if I then switch back to the one that triggered the leak.
  • No leaks are reported by the debug allocator.

I’ll do some more investigation and post the program soon so you guys can reproduce it.

I’ve eliminated all memory leaks (properly dispose window callbacks, error callbacks, etc). Memory usage is still rising.

New observations:

  • Memory usage slowly returns to normal after switching back to the first system.
  • After the leak starts, switching to good old glBegin()-glEnd() rendering makes the leak continue. This could be an Nvidia driver memory management bug. Could JEmalloc be interfering with it maybe?

I have also seen this memory usage behaviour with LWJGL3, but I thought it might be something in my program. In my case it went up by 1 MB per 30-50 seconds, and once the memory reached ~200 MB it dropped back down to 2 or 3; that might be the garbage collector doing its work.

Do you use jemalloc directly? The debug allocator only tracks memory allocated via MemoryUtil (memAlloc(), memFree(), etc). Another advantage of using MemoryUtil is that you can easily switch to a different allocator with the Configuration.MEMORY_ALLOCATOR option.
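
For example, a minimal sketch of that pattern (the allocation size here is arbitrary; the debug option only needs to be set before the first allocation):

    import org.lwjgl.system.Configuration;
    import org.lwjgl.system.MemoryUtil;

    import java.nio.ByteBuffer;

    public class MemoryUtilExample {
        public static void main(String[] args) {
            // Report any unfreed allocations on exit (tracks MemoryUtil allocations only).
            Configuration.DEBUG_MEMORY_ALLOCATOR.set(true);

            ByteBuffer page = MemoryUtil.memAlloc(64 * 1024); // placeholder size
            try {
                // ... fill the buffer with vertex attribute data ...
            } finally {
                MemoryUtil.memFree(page);
            }
        }
    }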

Could you please try this jemalloc build? It includes this fix, which sounds relevant to what you’re describing.

Ah, snap. I am using JEmalloc directly. Will try MemoryUtil.

Sadly there was a change of plans and I won’t have access to a computer for a few days. I feel a bit bad for crying wolf and then ditching you when you come to help… ._.

@SHC: In my case I’m seeing ~1GB of memory allocated off-heap per second, with the Java heap staying constant.

Double change of plans! I had a chance to test it today.

I replaced all JEmalloc calls with MemoryUtil. The debug allocator sees the memory allocated each frame, totalling around 20 MB at worst, but nothing else. If I switch to glBegin()-glEnd() it reports no memory leaks at all, while memory usage still goes up to 14 GB in a few seconds and then the program crashes.

EDIT: The updated version of JEmalloc changes nothing. The memory seems to be allocated by the Nvidia driver.

EDIT2: Here’s the hs_err log from when the out-of-memory process crash occurred during glBegin()-glEnd(). It seems like Java’s compiler thread crashed the program: http://www.java-gaming.org/?action=pastebin&id=1400

Stupid question: Do you actually call glEnd() or more importantly swap buffers?
I experienced something like this a while ago: When I did not swap buffers (but did proper glBegin/glEnd) then the driver would buffer up / delay rendering commands until it crashed. I do not know whether that was due to going out of memory, though.
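
For reference, the loop I mean looks roughly like this (window and context setup omitted; assumes static imports of org.lwjgl.glfw.GLFW.* and org.lwjgl.opengl.GL11.*):

    // Minimal sketch of a GLFW frame loop with an explicit buffer swap.
    while (!glfwWindowShouldClose(window)) {
        glClear(GL_COLOR_BUFFER_BIT | GL_DEPTH_BUFFER_BIT);

        // ... glBegin()/glEnd() or VBO rendering goes here ...

        glfwSwapBuffers(window); // without this the driver can keep queuing work
        glfwPollEvents();
    }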

No, it happens completely without glBegin()-glEnd() as well, but thanks for asking. Everything renders properly, but while memory is leaking, CPU performance suffers.

  • VRAM usage is constant.
  • No new buffers are created after the first 6 frames, and they’re permanently mapped as persistent VBOs (a sketch of that setup follows this list), so no memory is leaking there (confirmed with my leak detector as well).
  • The JIT compiler thread does not seem to do anything wrong when the crash occurs; it simply always ends up being the thread that triggers the out-of-memory crash for some reason.
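
Roughly, a persistently mapped VBO is set up like this (a simplified sketch; the size and flags here are illustrative, not the actual renderer code):

    import static org.lwjgl.opengl.GL15.*;
    import static org.lwjgl.opengl.GL30.*;
    import static org.lwjgl.opengl.GL44.*;

    import java.nio.ByteBuffer;

    // Sketch of a persistently mapped VBO (GL 4.4 / ARB_buffer_storage).
    public class PersistentVbo {

        static ByteBuffer createPersistentVbo(long sizeInBytes) {
            int vbo = glGenBuffers();
            glBindBuffer(GL_ARRAY_BUFFER, vbo);

            int flags = GL_MAP_WRITE_BIT | GL_MAP_PERSISTENT_BIT | GL_MAP_COHERENT_BIT;
            glBufferStorage(GL_ARRAY_BUFFER, sizeInBytes, flags); // immutable storage

            // The mapping stays valid for the lifetime of the buffer, so it is
            // created once and then written to every frame instead of re-mapping.
            return glMapBufferRange(GL_ARRAY_BUFFER, 0, sizeInBytes, flags);
        }
    }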

EDIT:

  • Has a tendency to permanently break Aero on Windows… -___-

This is most likely an Nvidia driver bug. I’m betting it’s some kind of heuristic going AWOL.

Did you try Configuration.MEMORY_ALLOCATOR.set("system") to make sure this has nothing to do with jemalloc?
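
For example, as early as possible in main(), before anything in LWJGL allocates memory:

    import org.lwjgl.system.Configuration;

    public class Main {
        public static void main(String[] args) {
            // Use the system allocator (malloc/free) instead of jemalloc.
            Configuration.MEMORY_ALLOCATOR.set("system");

            // ... rest of the program ...
        }
    }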

Does it happen if…

…you use only host memory? No mapped buffers at all, just passing the host pointers.

I think you’re right. To me it sounds like the driver is reallocating/moving/optimizing the shit out of the VBOs, which leaves pretty much everything open.

BTW, did you check ARBDebugOutput? At least for me it reports the optimisations (most of the time).
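
Something like this, assuming LWJGL’s GLUtil helper (which hooks up KHR_debug / ARB_debug_output / AMD_debug_output, whichever is available, and may return null if none is):

    import org.lwjgl.opengl.GL;
    import org.lwjgl.opengl.GLUtil;
    import org.lwjgl.system.Callback;

    // Sketch: with the context current, route driver debug messages to stderr.
    public class DebugOutput {
        static Callback install() {
            GL.createCapabilities();
            // Keep the returned callback and free it when the context is destroyed.
            return GLUtil.setupDebugMessageCallback();
        }
    }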

Have you tried rolling back the driver yet?

Confirmed, the leak still happens with this set as the first line in main().

@basil_
I actually had those debug messages disabled since they’re so annoying, but I reenabled them. The only thing I see when I create and map my buffers is:

Nothing unexpected is printed when the memory leak starts and nothing at all is printed when the leak ends.

@thedanisaur
I have tried updating to the latest Nvidia drivers, but the problem remains (I had the previous version). I’ll try to downgrade when I get the time.

Um… it’s been 2 days now.

Any source code? ^^

Or post a saved heap dump (or analyze the heap yourself with any Java profiler and look for strange numbers).

Try to localize the bug down to 1-2 “clean” source files that can be profiled and used to reproduce it.

P.S. A stack trace can’t help here; it shows the last thread position, not the call that causes the bug (the memory leak).

P.P.S. Sorry, I read the topic quickly and may have missed something; without source code this is like guessing what I can see out of my window right now.

Yeah, the rollback is going to be the important test for seeing whether it’s a driver bug, assuming you updated and then the bug appeared (and they didn’t notice it in the latest release).

Edit: I really don’t think this is a driver bug. If you don’t call glGenBuffers() every frame, there’s no way that memory usage should increase just from mapping existing buffers. Once they’re mapped, the graphics card doesn’t care what happens in RAM, so it wouldn’t hold onto it. It’s far more likely that something is being missed with jemalloc, or rather with how it’s being used.

Sorry for the long response time. Thanks for your responses.

@thedanisaur
The thing is that once the leak is triggered, memory usage continues to rise even after switching to simple glBegin()-glEnd() rendering. In that case memory usage increases to around 14 GB, at which point it stops, i.e. the driver “gracefully” handles the fact that it cannot allocate more memory; the next time the Java compiler thread tries to compile something, its memory allocation fails and crashes the process. I have confirmed that I do not use JEmalloc directly at any point in the program, so it can’t be JEmalloc, and since the debug allocator isn’t reporting any leaks, it’s probably not me either. It’s not OpenGL buffers: I have code that scans buffers with glIsBuffer() and queries each buffer’s size to calculate the total amount of memory allocated for buffers, and that total remains constant. Since the bug is kept alive by simply calling glBegin()-glVertex2f()-glEnd() a few thousand times per frame, followed by swapping buffers and handling GLFW events, the logical conclusion is that the driver is doing something.
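
Roughly, that buffer scan looks like this (a simplified sketch, not the exact code; the name range is a placeholder, since real code would track the names it actually created):

    import static org.lwjgl.opengl.GL15.*;

    // Walk a range of buffer names and sum the sizes of every live buffer object.
    public class BufferMemoryScan {

        static long totalBufferBytes(int maxName) {
            long total = 0;
            for (int name = 1; name <= maxName; name++) {
                if (glIsBuffer(name)) {
                    glBindBuffer(GL_ARRAY_BUFFER, name);
                    total += glGetBufferParameteri(GL_ARRAY_BUFFER, GL_BUFFER_SIZE);
                }
            }
            glBindBuffer(GL_ARRAY_BUFFER, 0); // restore a neutral binding
            return total;
        }
    }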

I’m using the debugger in Eclipse to step through this loop one iteration at a time:

    for (int i = 0; i < numPoints; i++) {
        Vector2f v = points[i];
        glBegin(GL_POINTS);
        glVertex2f(v.x, v.y);
        glEnd();
    }

And every 250 iterations or so, the memory usage seems to go up by ~64 KB. Here’s an outline:

Start the program at setting 0.
Switch to another renderer that writes a lot of uniform buffer data each frame (roughly as in the sketch after this outline). The bug is now activated.
At this point, every single OpenGL call seems to leak memory, no matter which renderer is used other than 0.
Switching back to 0 causes the memory leak to permanently stop until the program is restarted, and memory usage slowly falls back to the original value (at ~100 MB/sec until stabilizing at ~110 MB).
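
For reference, the kind of per-frame uniform upload involved looks roughly like this (a simplified sketch; the real renderer may write through a mapped buffer instead of glBufferSubData()):

    import static org.lwjgl.opengl.GL15.*;
    import static org.lwjgl.opengl.GL31.GL_UNIFORM_BUFFER;

    import java.nio.ByteBuffer;

    // Re-upload this frame's uniform data into an existing uniform buffer object.
    public class UboUpdate {
        static void uploadUniforms(int ubo, ByteBuffer data) {
            glBindBuffer(GL_UNIFORM_BUFFER, ubo);
            glBufferSubData(GL_UNIFORM_BUFFER, 0, data); // happens every frame
        }
    }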

  • I’ve closed all programs that I believe could interfere, still happens.
  • Once again confirmed my memory usage is constant and that buffers are being reused each frame.
  • Threaded Optimization does not matter.
  • Unsynchronized or persistently mapped buffers make no difference.
  • While the bug is in effect, FPS is reduced. When the bug is permanently ended by switching back to 0, FPS rises by around 40% for all renderers.

@Icecore and everyone else
I really don’t know where to proceed from here on. Is there any way to trace which library/.dll-file is allocating all this memory to confirm that it’s the driver? I’m not sure a driver rollback makes sense as this happened before I updated to the latest driver too (so confirmed for current and previous drivers). I am however gonna throw together an executable and go run it on my Intel GPU PC.

EDIT: CONFIRMED! DOES NOT HAPPEN ON AN INTEL GPU! Memory usage remained at a beautifully constant 65 MB even when I tried to trigger the bug. Also, Intel GPUs have a much better uniform buffer offset alignment of 16, compared to Nvidia’s and AMD’s 256, which means a lot less memory wasted on padding.
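
For reference, that alignment can be queried at runtime and used to pad buffer offsets; a quick sketch (the helper name is made up):

    import static org.lwjgl.opengl.GL11.glGetInteger;
    import static org.lwjgl.opengl.GL31.GL_UNIFORM_BUFFER_OFFSET_ALIGNMENT;

    // Round a uniform buffer offset up to the driver's required alignment
    // (e.g. 16 on the Intel GPU mentioned above, 256 on Nvidia/AMD).
    public class UboAlignment {

        static long alignOffset(long offset) {
            long alignment = glGetInteger(GL_UNIFORM_BUFFER_OFFSET_ALIGNMENT);
            return (offset + alignment - 1) / alignment * alignment;
        }
    }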

  • Create the smallest source file you can that reproduces the bug.
  • Post it and wait for other people to confirm the same result.
  • Send the source to the developers of the library causing the bug (LWJGL?) and wait for a response.
  • Send the source to the GPU driver vendor.

You can find the Java library’s source code and trace through it in Eclipse; for the DLL you can use Visual Studio (reversing the library source).

For tracing, try this in different places:

  • Add System.gc() before the frame and then wait a second, to give the VM time to clear memory (see the sketch below).
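
In code, that suggestion amounts to something like this (the one-second pause is arbitrary):

    public class GcProbe {
        // Force a GC before rendering the frame and give the VM a moment, so that
        // rising memory usage can't be blamed on garbage that hasn't been collected yet.
        static void forceGcBeforeFrame() throws InterruptedException {
            System.gc();
            Thread.sleep(1000);
        }
    }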

P.S. I have no idea why you do this ^^ It looks like a bug in itself (multiple render contexts seem unstable to me).

P.P.S. I think I understand what is happening: when you switch render contexts and then write data to the first context, the data is written to a buffer (which waits until you switch back before being sent to the GPU). Because you don’t switch back, the buffer grows.

If I’m right, the problem could start higher up: when creating the second render context or switching to it, an error could occur during that process that you may have forgotten to catch, and using the context afterwards then produces this unusual behavior (or maybe it’s the OpenGL wrapper library’s fault for not catching the error).

On Linux there’s Valgrind for debugging these sorts of problems, but assuming you still haven’t got a Linux box/VM up and running (and the bug might be platform-specific anyway), that’s not exactly helpful.

A quick search on the internet turns up https://github.com/dynamorio/drmemory, which at least covers the memory-leak checking aspect.