OpenGL ES glReadPixels() blocking on PBO on Android

DrHalfway · March 20, 2017, 11:33pm

Greetings JGO Community,

I’m stuck on a particular problem regarding glReadPixels and OpenGL ES.

I’m having an unusual problem while working on an OpenGL project. Essentially I require frame data in single-channel GRAYSCALE format for some CV stuff. I’m using a custom shader, an FBO and PBOs to get the task done. The data I’m rendering is the camera view from Android.

The flow of the program is as follows.

bind the generated FBO
draw() to the FBO
bind PBO and glReadPixels()
bind PBO from previous frame and glMapBufferRange()
process the provided pixel data from glMapBufferRange()
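
The double-buffering in the steps above can be sketched as a small index tracker. PingPongIndex and its method names are hypothetical and not taken from the project; the only point is that the slot being mapped is never the slot glReadPixels was just issued into.

```java
// Hypothetical helper: tracks which of two PBO slots is written and which is mapped.
final class PingPongIndex {
    private int writeIndex = 0; // slot that receives this frame's glReadPixels
    private int readIndex = 1;  // slot filled on the previous frame, safe to map

    int writeIndex() { return writeIndex; }

    int readIndex() { return readIndex; }

    // Call once per frame, after the read-back and the map have both been issued.
    void swap() {
        int tmp = writeIndex;
        writeIndex = readIndex;
        readIndex = tmp;
    }
}
```

On the very first frame the read slot has not been filled yet, so its contents should be ignored.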

I’d like to actually confirm that the process is working fine. What I’d like to know is whether there is anything that can be done to increase the performance. I’m going to post some of the code I’m using so we can all follow.

The PBO generator code

public void setupPBO() {
        final int[] pbuffers = new int[2];

        GLES30.glGenBuffers(2, pbuffers, 0);

        for (int i = 0; i < pbuffers.length; i++) {
            GLES30.glBindBuffer(GLES30.GL_PIXEL_PACK_BUFFER, pbuffers[i]);
            GLES30.glBufferData(GLES30.GL_PIXEL_PACK_BUFFER, width * height, null, GLES30.GL_DYNAMIC_READ);
            GLES30.glBindBuffer(GLES30.GL_PIXEL_PACK_BUFFER, 0);
        }

        pbo_id[PBO_PRIMARY_ID] = pbuffers[0];
        pbo_id[PBO_SECONDARY_ID] = pbuffers[1];
}
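
A side note on the buffer size: width * height is only exact when each row is a multiple of GL_PACK_ALIGNMENT bytes (the default is 4). A 480-pixel row qualifies, so the allocation above is fine for this resolution. A minimal sketch of the general calculation, with PackedSize as a hypothetical helper name:

```java
final class PackedSize {
    // Upper bound on the bytes glReadPixels writes, assuming every row is
    // padded up to a multiple of packAlignment (GL_PACK_ALIGNMENT, 4 by default).
    static int bytesFor(int width, int height, int bytesPerPixel, int packAlignment) {
        int rowBytes = width * bytesPerPixel;
        int stride = (rowBytes + packAlignment - 1) / packAlignment * packAlignment;
        return stride * height;
    }
}
```

For example, PackedSize.bytesFor(480, 360, 1, 4) gives 172800, the same as width * height, while a 482-pixel-wide single-channel frame would need a 484-byte stride.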

The PBO Bind/Read Code. Before this call is made, I bind the FBO which was rendered into from the previous frame.

public void bindReadSwapPBO() {
        GLES30.glBindBuffer(GLES30.GL_PIXEL_PACK_BUFFER, pbo_id[currentBuffer]);
        GLES30.glReadBuffer(GLES30.GL_COLOR_ATTACHMENT0);

        // glReadPixels is done from the JNI layer. Only read single channel GL_RED
        // This blocks for up to 50ms. Should be an Async call?
        JNI.glReadPixels(0, 0, width, height, GL_RED, GL_UNSIGNED_BYTE, 0);

        GLES30.glBindBuffer(GLES30.GL_PIXEL_PACK_BUFFER, 0);

        final int prevBuffer = previousBuffer;

        previousBuffer = currentBuffer;
        currentBuffer = prevBuffer;
}

This code is what handles grabbing the data from the PBO. Can confirm that this works properly and the call is virtually 0ms.

public void bindMapPBO() {
        GLES30.glBindBuffer(GLES30.GL_PIXEL_PACK_BUFFER, pbo_id[currentBuffer]);

        // read our data from the PBO.
        JNI.glMapBufferRange(GL_PIXEL_PACK_BUFFER, 0, width * height, GL_MAP_READ_BIT);

        GLES30.glUnmapBuffer(GLES30.GL_PIXEL_PACK_BUFFER);
        GLES30.glBindBuffer(GLES30.GL_PIXEL_PACK_BUFFER, 0);
}
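
The JNI wrapper is not shown here, so the following is only a sketch of what a Java-side equivalent might look like. The pointer returned by glMapBufferRange is no longer valid once glUnmapBuffer has been called, so the pixels have to be copied (or processed) before unmapping. MappedCopy is a hypothetical name, and the ByteBuffer parameter stands in for the mapped region.

```java
import java.nio.ByteBuffer;

final class MappedCopy {
    // Copies 'length' bytes out of a mapped region into a Java array.
    // 'mapped' stands in for the buffer returned by glMapBufferRange.
    static byte[] copyOut(ByteBuffer mapped, int length) {
        byte[] out = new byte[length];
        ByteBuffer view = mapped.duplicate(); // leaves the caller's position untouched
        view.clear();                         // position = 0, limit = capacity
        view.get(out, 0, length);
        return out;
    }

    // Grayscale bytes are unsigned; mask before doing arithmetic on them.
    static int luminanceAt(byte[] pixels, int width, int x, int y) {
        return pixels[y * width + x] & 0xFF;
    }
}
```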

And this is where the performance problem is coming from. Currently I’m reading back pixels which are 480 x 360 single channel grayscale (calculated from a shader). I’ve run some benchmarks and the results are below.

40-50ms -> JNI.glReadPixels(0, 0, width, height, GL_RED, GL_UNSIGNED_BYTE, 0);
0-1ms -> JNI.glMapBufferRange(GL_PIXEL_PACK_BUFFER, 0, width * height, GL_MAP_READ_BIT);
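
To put the first number in perspective, here is a quick back-of-the-envelope calculation of the effective transfer rate. ReadbackRate is a hypothetical helper; the inputs are the frame size and timings quoted above.

```java
final class ReadbackRate {
    // Effective throughput in MB/s (1 MB = 1,000,000 bytes).
    static double megabytesPerSecond(long bytes, double millis) {
        return (bytes / 1_000_000.0) / (millis / 1000.0);
    }

    public static void main(String[] args) {
        long frameBytes = 480L * 360L; // 172,800 bytes per single-channel frame
        System.out.println("at 40 ms: " + megabytesPerSecond(frameBytes, 40.0) + " MB/s");
        System.out.println("at 50 ms: " + megabytesPerSecond(frameBytes, 50.0) + " MB/s");
    }
}
```

That works out to roughly 3.5 to 4.3 MB/s for a 172,800-byte frame, which suggests the call is spending most of its time waiting on something rather than copying data.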

From what I understand, glReadPixels into a PBO is not meant to be a blocking call, but for whatever reason it’s blocking here (and performing far worse than just reading from the FBO directly). glMapBufferRange seems to be behaving as expected and returns the required data properly.

The only thing I can think of is that I’m using GL_RED and only reading back a single channel, but this still doesn’t explain why glReadPixels is blocking.
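
One thing that may be worth checking on the GL_RED suspicion: for a normalized fixed-point color buffer, OpenGL ES 3.0 only guarantees two format/type pairs for glReadPixels, namely GL_RGBA with GL_UNSIGNED_BYTE and whatever pair the driver reports for GL_IMPLEMENTATION_COLOR_READ_FORMAT and GL_IMPLEMENTATION_COLOR_READ_TYPE. Whether GL_RED falls off a fast path on these devices is an assumption, not something measured here. A minimal sketch of the comparison, with ReadFormatCheck as a hypothetical helper that takes the queried values as plain ints:

```java
final class ReadFormatCheck {
    // Enum values from the OpenGL ES headers.
    static final int GL_RED = 0x1903;
    static final int GL_RGBA = 0x1908;
    static final int GL_UNSIGNED_BYTE = 0x1401;

    // implFormat and implType are the values the driver reports for
    // GL_IMPLEMENTATION_COLOR_READ_FORMAT and GL_IMPLEMENTATION_COLOR_READ_TYPE
    // while the FBO is bound (queried elsewhere, e.g. via glGetIntegerv).
    static boolean isGuaranteedPair(int format, int type, int implFormat, int implType) {
        boolean rgba8 = format == GL_RGBA && type == GL_UNSIGNED_BYTE;
        boolean implPreferred = format == implFormat && type == implType;
        return rgba8 || implPreferred;
    }
}
```

If the driver reports GL_RGBA for the R8 attachment, reading RGBA into a PBO four times the size would be one experiment to try.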

Devices I’ve used for benchmarking (consistent behaviour).

HTC One M8s (40-50ms)
Nexus 5x (20-30ms)
Google Pixel (15-30ms)

The FBO setup code

public void setupFBO() {
        final int[] values = new int[1];
        GLES30.glGenTextures(1, values, 0);
        GLES30.glBindTexture(GLES30.GL_TEXTURE_2D, values[0]);

        // we only want GRAYSCALE / Single channel texture
        GLES30.glTexImage2D(GLES30.GL_TEXTURE_2D, 0, GLES30.GL_R8, texWidth, texHeight, 0, GLES30.GL_RED, GLES30.GL_UNSIGNED_BYTE, null);
        GLES30.glTexParameteri(GLES30.GL_TEXTURE_2D, GLES30.GL_TEXTURE_WRAP_S, GLES30.GL_CLAMP_TO_EDGE);
        GLES30.glTexParameteri(GLES30.GL_TEXTURE_2D, GLES30.GL_TEXTURE_WRAP_T, GLES30.GL_CLAMP_TO_EDGE);
        GLES30.glTexParameteri(GLES30.GL_TEXTURE_2D, GLES30.GL_TEXTURE_MIN_FILTER, GLES30.GL_NEAREST);
        GLES30.glTexParameteri(GLES30.GL_TEXTURE_2D, GLES30.GL_TEXTURE_MAG_FILTER, GLES30.GL_NEAREST);

        this.tex_id[0] = values[0];

        GLES30.glGenFramebuffers(1, values, 0);
        GLES30.glBindFramebuffer(GLES30.GL_FRAMEBUFFER, values[0]);

        this.fbo_id[0] = values[0];
        GLES30.glFramebufferTexture2D(GLES30.GL_FRAMEBUFFER, GLES30.GL_COLOR_ATTACHMENT0, GLES30.GL_TEXTURE_2D, this.tex_id[0], 0);

        final int status = GLES30.glCheckFramebufferStatus(GLES30.GL_FRAMEBUFFER);

        if (status != GLES30.GL_FRAMEBUFFER_COMPLETE) {
            Debug.LogError("Framebuffer incomplete. Status: " + status);
        }

        GLES30.glBindFramebuffer(GLES30.GL_FRAMEBUFFER, 0);
}

The full render code. I’ve deconstructed as much of the logic and flow as possible for clarity.


        // bind the offscreen FBO and render the current camera frame
        GLES30.glBindFramebuffer(GLES30.GL_FRAMEBUFFER, dualFBO.getID());
        camera.draw(ShaderType.GRAYSCALE);

        // ping-pong the FBO ID's
        dualFBO.swap();

        // dualFBO will now return the ID for last frame
        GLES30.glBindFramebuffer(GLES30.GL_FRAMEBUFFER, dualFBO.getID());

        // bind the current PBO and submit glReadPixels (meant to be async)
        GLES30.glBindBuffer(GLES30.GL_PIXEL_PACK_BUFFER, dualPBO.getID());
        GLES30.glReadBuffer(GLES30.GL_COLOR_ATTACHMENT0);

        // this call locks for 30-50ms... why? (meant to be async???)
        JNI.glReadPixels(0, 0, width, height, GL_RED, GL_UNSIGNED_BYTE, 0);

        GLES30.glBindBuffer(GLES30.GL_PIXEL_PACK_BUFFER, 0);

        // ping-pong the PBO ID's.
        dualPBO.swap();

        GLES30.glBindBuffer(GLES30.GL_PIXEL_PACK_BUFFER, dualPBO.getID());

        // this call is instant
        JNI.glMapBufferRange(GL_PIXEL_PACK_BUFFER, 0, width * height, GL_MAP_READ_BIT);
    
        GLES30.glUnmapBuffer(GLES30.GL_PIXEL_PACK_BUFFER);
        GLES30.glBindBuffer(GLES30.GL_PIXEL_PACK_BUFFER, 0);

        // the CV stuff, which now has data from the glMapBufferRange
        JNI.processCV();

Any help in this matter would be highly appreciated! I’ve never had to read data back from OpenGL every frame in real-time, so I’m at my wits’ end here. Below is some more code so you guys can get an idea on how the logic flows.

basil · March 24, 2017, 12:37am

i’m curious since i never ran opengl on such device. … reading pixels to a PBO is very slow on the desktops i’ve tested so far - as long as it is happening on the “rendering thread”. doing so on a different thread is very speedy.

using double-buffered PBOs is a good idea. what i found is … even with a single PBO, fetching the pixels on a different thread and locking the whole thing with a reentrant lock … is pretty speedy. at least significantly faster than single-thread-pixelread.

this is pointing at least to some driver/hardware special case or something like copy-engines kicking in.
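
The locking part of the single-buffer approach described above might look roughly like this. It is only a sketch of the hand-off between the fetching thread and the consumer; LockedFrame is a hypothetical name, and the GL side (a second thread would need its own shared context) is not shown.

```java
import java.util.concurrent.locks.ReentrantLock;

final class LockedFrame {
    private final ReentrantLock lock = new ReentrantLock();
    private final byte[] pixels;
    private long frameNumber = -1; // -1 until the first frame is published

    LockedFrame(int size) {
        pixels = new byte[size];
    }

    // Called by the thread that fetched the pixels.
    void publish(byte[] src, long number) {
        lock.lock();
        try {
            System.arraycopy(src, 0, pixels, 0, pixels.length);
            frameNumber = number;
        } finally {
            lock.unlock();
        }
    }

    // Called by the consumer; copies the latest frame into dst and returns
    // its frame number, or -1 if nothing has been published yet.
    long snapshot(byte[] dst) {
        lock.lock();
        try {
            System.arraycopy(pixels, 0, dst, 0, pixels.length);
            return frameNumber;
        } finally {
            lock.unlock();
        }
    }
}
```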

hint : read up

https://www.khronos.org/registry/OpenGL-Refpages/gl4/html/glFenceSync.xhtml

and

https://www.khronos.org/registry/OpenGL-Refpages/gl4/html/glWaitSync.xhtml

http://on-demand.gputechconf.com/gtc/2012/presentations/S0356-GTC2012-Texture-Transfers.pdf page 14+
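
As a rough illustration of how the sync objects from those links could be used (OpenGL ES 3.0 has glFenceSync and glClientWaitSync as well): put a fence after glReadPixels and only map the PBO once the fence has signaled. This does not make glReadPixels itself return any sooner; it only tells you when mapping should not block. SyncApi is a hypothetical wrapper so the logic can run off-device, and FencedReadback is a made-up name.

```java
final class FencedReadback {
    // Enum values from the OpenGL ES 3.0 headers.
    static final int GL_ALREADY_SIGNALED = 0x911A;
    static final int GL_CONDITION_SATISFIED = 0x911C;

    // Hypothetical thin wrapper. On Android the three methods would forward to
    // GLES30.glFenceSync(GLES30.GL_SYNC_GPU_COMMANDS_COMPLETE, 0),
    // GLES30.glClientWaitSync(sync, 0, timeoutNanos) and GLES30.glDeleteSync(sync).
    interface SyncApi {
        long fenceSync();
        int clientWaitSync(long sync, long timeoutNanos);
        void deleteSync(long sync);
    }

    private final SyncApi gl;
    private long pending = 0; // 0 means no fence outstanding

    FencedReadback(SyncApi gl) {
        this.gl = gl;
    }

    // Call right after glReadPixels into the PBO.
    void markReadIssued() {
        if (pending != 0) {
            gl.deleteSync(pending);
        }
        pending = gl.fenceSync();
    }

    // Call before glMapBufferRange; polls with a zero timeout so it never blocks.
    boolean readyToMap() {
        if (pending == 0) {
            return false;
        }
        int status = gl.clientWaitSync(pending, 0L);
        if (status == GL_ALREADY_SIGNALED || status == GL_CONDITION_SATISFIED) {
            gl.deleteSync(pending);
            pending = 0;
            return true;
        }
        return false; // not finished yet (or the wait failed); try again next frame
    }
}
```

If readyToMap() returns false, the frame can simply be skipped or retried next frame instead of stalling the render thread.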

DrHalfway · March 24, 2017, 2:24am

Thank you for the reply basil_

I was hoping someone would reply ;D

What I’m confused about is that glReadPixels from FBO -> PBO is meant to be an async call, but in all of these cases it actually blocks the render thread for a long time.
In my source I’m actually using double FBOs and each FBO has double PBOs, but this makes zero difference since glReadPixels still blocks.

And it’s interesting about other threads, I wasn’t aware that GL calls could be made in a non-GL thread on Android. I’ll give that a shot.

basil · March 26, 2017, 7:54pm

exactly, i don’t know if it’s possible. … or how well a shared gl context would work on mobile.

anyway. instead of glReadPixels you could try glGetTexImage/glGetTextureImage and deal with the textures instead of color_attachments.

https://www.khronos.org/registry/OpenGL-Refpages/gl4/html/glGetTexImage.xhtml

https://www.khronos.org/opengl/wiki/Pixel_Buffer_Object#Downloads

DrHalfway · March 27, 2017, 5:28am

Unfortunately glGetTexImage/glGetTextureImage are not available in OpenGL ES on Android. Not even via JNI.

And https://www.khronos.org/opengl/wiki/Pixel_Buffer_Object#Downloads states this in the Downloads section…

[quote]This is really where PBOs shine, performance-wise. The savings when using PBOs for downloads are substantial. Again, as long as you have something to do during that time.
[/quote]
That’s great and all, except the transfer is actually blocking in my code, making the whole thing obsolete. I might as well just glReadPixels() on the FBO directly.

basil · March 27, 2017, 10:58am

aah i see. my experience with ES is not … existing really.