Fast RGBA <-> ARGB conversion ?

As the topic says, I’m looking for a fast way to convert from an RGBA array to an ARGB one and vice-versa.

As far as I know this conversion is not supported by OpenGL, so it can only be done in software, but maybe someone here knows a nice way to make it very fast (e.g. by using MMX).

Suggestions welcome.

you could try (shift and rotate)

argb = rgba <<< 8

rgba = argb <<< 24

[quote]you could try (shift and rotate)

argb = rgba <<< 8

rgba = argb <<< 24
[/quote]
What programming language is this?

[quote]What programming language is this?
[/quote]
Umm… that would be Java. Although it’s odd, because the JLS shows the shift and rotate right operator, but no shift and rotate left. I’m pretty sure the latter works, however. Go figure.

In case you’ve never seen this sort of bitwise operation, it basically works like a shift. In a shift, the bits are moved the specified number of positions left or right. However, the bits at the extreme end of the number are lost when they overflow the number of bit positions. That’s why C programmers used to use “myvar <<= 32” as a really fast method for setting a variable to zero. The difference with shift and rotate is that the bits get wrapped back around to the other end of the variable. This makes it possible to reshuffle the exact bit positions without losing any information.
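To make the difference concrete, here is a minimal Java sketch. It uses `Integer.rotateLeft`, a library method (available since Java 5), not an operator; the class and method names are only for illustration.

```java
final class ShiftVsRotate {
    /** Plain left shift: bits pushed past bit 31 are discarded. */
    static int shiftLeft(int value, int bits) {
        return value << bits;
    }

    /** Rotate left: bits pushed past bit 31 re-enter at bit 0. */
    static int rotateLeft(int value, int bits) {
        return Integer.rotateLeft(value, bits);
    }

    public static void main(String[] args) {
        int value = 0x80000001;
        // The top bit falls off the end: prints 2
        System.out.println(Integer.toHexString(shiftLeft(value, 1)));
        // The top bit wraps around to bit 0: prints 3
        System.out.println(Integer.toHexString(rotateLeft(value, 1)));
    }
}
```

One Java-specific caveat: the shift distance of an `int` is masked to its low five bits, so `x << 32` leaves `x` unchanged instead of zeroing it.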

Shift and rotate is closely related to the assembly code concept of shift with carry, where the carry flag/register is set to the last bit value shifted out of the variable.

There is no rotate operator in Java. “>>” is signed shift right. “>>>” is unsigned shift right. The difference is whether the sign bit is kept with negative numbers.
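A quick sketch of the two right-shift operators, using the alpha byte of a packed pixel as the example. The class and method names are made up for illustration.

```java
final class RightShiftDemo {
    /** Signed (arithmetic) shift right: the sign bit is copied into the vacated bits. */
    static int signedShift(int value, int bits) {
        return value >> bits;
    }

    /** Unsigned (logical) shift right: the vacated bits are filled with zeros. */
    static int unsignedShift(int value, int bits) {
        return value >>> bits;
    }

    public static void main(String[] args) {
        int argb = 0xFF112233; // alpha = 0xFF, so the int is negative
        System.out.println(signedShift(argb, 24));   // prints -1
        System.out.println(unsignedShift(argb, 24)); // prints 255
    }
}
```

This is why pixel code normally uses `>>>` (or masks with `& 0xff`) when pulling a channel out of the top byte.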

[quote]There is no rotate operator in Java. “>>” is signed shift right. “>>>” is unsigned shift right. The difference is whether the sign bit is kept with negative numbers.
[/quote]
Doh! Seems you’re right. Oddly enough, I did a quick Google search before I posted, and got a result or two that said “<<<” was shift and rotate in Java. Guess it’s time to “forget” one more thing. :slight_smile:

This code should produce a reasonable facsimile of a shift and rotate:


rgba = (argb << 8) | (argb >>> (32-8));
argb = (rgba >>> 8) | (rgba << (32-8));

Special thanks to the Java Glossary for that bit of info.
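For reference, the two expressions can be wrapped in small helpers and checked against a known pixel. This is only a sketch: the names `argbToRgba` and `rgbaToArgb` are illustrative, and it assumes each pixel is a packed 32-bit int with 8 bits per channel.

```java
final class PixelSwizzle {
    /** 0xAARRGGBB -> 0xRRGGBBAA: rotate left by 8 bits. */
    static int argbToRgba(int argb) {
        return (argb << 8) | (argb >>> 24);
    }

    /** 0xRRGGBBAA -> 0xAARRGGBB: rotate right by 8 bits. */
    static int rgbaToArgb(int rgba) {
        return (rgba >>> 8) | (rgba << 24);
    }

    public static void main(String[] args) {
        int argb = 0xAA112233; // A=AA R=11 G=22 B=33
        int rgba = argbToRgba(argb);
        System.out.println(Integer.toHexString(rgba));             // prints 112233aa
        System.out.println(Integer.toHexString(rgbaToArgb(rgba))); // prints aa112233
        // Integer.rotateLeft (Java 5+) computes the same thing as the shift/or pair.
        System.out.println(Integer.rotateLeft(argb, 8) == rgba);   // prints true
    }
}
```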

Edit: Here’s the varmints who should be hanged by their pinky toes! From their page:

[quote]bitwise operators - Operators used on the byte data type to modify bit values. These include the bitwise and (&), or (|), not (!), exclusive or (^), shift left (<<), shift right (>>), and shift and rotate (<<<) operators.
[/quote]

Shifting is not rotating, and operators like ROR.L and ROL.L (68k asm) are not in the JLS.

An operation like

argb = (rgba >>> 8) | (rgba << (32-8));

of course takes a lot more than a single bitwise rotation (which is a common CPU operation).

Sadly I’m not fluent in x86 asm. Maybe someone here can write a simple loop that makes the conversion.

It would be nice to have it in the LWJGL BufferUtils:

BufferUtils.toARGB(intarray);
BufferUtils.toRGBA(intarray);

these functions would be widely used to create OGL textures etc.

Mik

This conversion is automatically performed by GL drivers. There is no need to convert a texture yourself. The drivers probably have a much faster implementation than Java can manage too.

Cas :slight_smile:

[quote]…automatically performed…
[/quote]
Do you mean that glReadPixels from a Pbuffer can give me ARGB instead of RGBA at hardware speed ? Would be fantastic…

Sorry guys I didn’t mean to send anyone in the wrong direction, I think I was looking at the same page jbanes was.

Bah! 2 shifts and an or shouldn’t strain the CPU in the slightest. In fact, each one should cost no more than 1 cycle apiece. This makes the cost of converting a 256x256 image 256x256x3 = 196,608 cycles. Do that 60 times a second, and you’ve got 11,796,480 cycles per second. That works out to requiring roughly a 12MHz processor. Of course, one would hope that you don’t actually need to do that 60 times a second. :slight_smile:
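The back-of-the-envelope estimate multiplies out as follows. This is only the arithmetic, under the same optimistic assumption of one cycle per operation; it ignores loads, stores and loop overhead. The class and method names are illustrative.

```java
final class CycleEstimate {
    /** Cycles per second if every per-pixel operation costs exactly one cycle. */
    static long cyclesPerSecond(int width, int height, int opsPerPixel, int framesPerSecond) {
        return (long) width * height * opsPerPixel * framesPerSecond;
    }

    public static void main(String[] args) {
        // 256x256 pixels, 3 operations each (two shifts and an or), 60 frames per second
        System.out.println(cyclesPerSecond(256, 256, 3, 1));  // prints 196608
        System.out.println(cyclesPerSecond(256, 256, 3, 60)); // prints 11796480
    }
}
```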

FWIW, the assembly instructions for shift and rotate are ROL (left) and ROR (right). These come at a cost of 1 to 3 cycles per execution. I don’t have enough documentation in front of me to determine the situation when it takes one or the other.

Well, I’m sure you’re not talking of java bytecode.

Did you measure a Java for() loop doing the above conversion of an int array? I think it should be several times slower than the x86 asm counterpart.

From the CPU clock cycles side, I remember that the 68020 and above had “barrel shifters” (1 clk for sure, but on the 040 you can have some “free instructions” depending on the execution order: 1 clk -> several instructions) and surely modern CPUs would do even better. So how can you be so sure about the MHz you need?

[quote]Did you measure a Java for() loop doing the above conversion of an int array? I think it should be several times slower than the x86 asm counterpart.
[/quote]
I wouldn’t be so certain about that. The bounds checks would certainly suck, yes, and there is overhead for the for loop. However, if this code executes often enough, HotSpot will take notice and optimize it. You could easily expect that HotSpot would eliminate the bounds checking, unroll the loop a bit, and do a few other optimizations that would make it blaze. With any luck, it should be very close in performance to an assembly version.
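For anyone who wants to measure rather than guess, a rough sketch of such a loop with a naive timing harness follows. The class name, warm-up count and run count are arbitrary choices, and a simple `System.nanoTime` loop like this gives only a ballpark figure, not a rigorous benchmark.

```java
import java.util.Arrays;

final class ArrayConversion {
    /** Converts every pixel in place from 0xRRGGBBAA to 0xAARRGGBB. */
    static void rgbaToArgbInPlace(int[] pixels) {
        for (int i = 0; i < pixels.length; i++) {
            int p = pixels[i];
            pixels[i] = (p >>> 8) | (p << 24);
        }
    }

    public static void main(String[] args) {
        int[] pixels = new int[256 * 256];
        Arrays.fill(pixels, 0x112233AA);

        // Warm-up so HotSpot has a chance to compile the loop before we time it.
        for (int i = 0; i < 2_000; i++) {
            rgbaToArgbInPlace(pixels);
        }

        int runs = 1_000;
        long start = System.nanoTime();
        for (int i = 0; i < runs; i++) {
            rgbaToArgbInPlace(pixels);
        }
        long elapsed = System.nanoTime() - start;
        System.out.println("avg ns per 256x256 image: " + (elapsed / runs));
    }
}
```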

[quote]From the CPU clock cycles side, I remember that the 68020 and above had “barrel shifters” (1 clk for sure, but on the 040 you can have some “free instructions” depending on the execution order: 1 clk -> several instructions) and surely modern CPUs would do even better. So how can you be so sure about the MHz you need?
[/quote]
I think what you’re referring to is Out Of Order instructions and SuperScalar execution. (The two are closely related.) And yes, it would be great if the processor gave better than 1 cycle per instruction. However, it’s always best to plan for worst case rather than best case. :slight_smile:

The Pentium 4 no longer has a barrel shifter :confused:

Cas :slight_smile:

P4 implementation aside (no barrel shifter? fantastic :-/), I’d like us all to get back to the main topic, the RGBA-ARGB conversion.
Don’t you think that such a feature would be nice inside LWJGL? It could easily be included in the native code, and people would use it instead of their own implementation. The LWJGL implementors could also provide processor-optimized versions of the functions. Consider this a kind of RFE.

FWIW, I don’t actually see any reason whatsoever for implementing this in LWJGL… Why would any game have to convert between ARGB and RGBA? One must assume that all content is in the same order.

For people like me ? :wink:

More seriously:

I develop Java2D desktop video applications full time. I’ve recently discovered LWJGL and I’m developing a Java2D-like library that will allow me to easily move from Java2D to a faster implementation.
After playing with LWJGL for about five days I already have a working implementation (call it a proof of concept) that can handle AffineTransforms, images, composites and some primitives. More to come. Through the use of Pbuffers I can create several contexts (i.e. Graphics2D) and draw my offscreen graphics there.

Of course I must integrate the LWJGL-based rendering with the Java2D counterpart, so I think that a pair of nicely optimized methods like BufferUtils.toARGB(fb) and BufferUtils.toRGBA(fb) would be a “standard” and fast bridge between OpenGL and Java2D pixel formats.
Maybe I’m wrong, but I don’t think I’m the only one that uses LWJGL in this way.

Cheers,

Mik


      public BufferedImage screenShot()
      {
            BufferedImage image = null;
            
            // allocate space for RGBA pixels
            ByteBuffer fb = BufferUtils.createByteBuffer(width * height * 4);
            int[] pixels = new int[width * height];
            int bindex;

            GL11.glReadBuffer(GL11.GL_FRONT);

            // grab a copy of the current frame contents as RGBA
            GL11.glReadPixels(0, 0, width, height, GL11.GL_RGBA, GL11.GL_UNSIGNED_BYTE, fb);

            // convert RGBA data in ByteBuffer to integer array
            for (int i = 0; i < pixels.length; i++)
            {
                  bindex = i * 4;
                  int a = fb.get(bindex + 3) & 0xff;
                  int r = fb.get(bindex + 0) & 0xff;
                  int g = fb.get(bindex + 1) & 0xff;
                  int b = fb.get(bindex + 2) & 0xff;
                  pixels[i] = (a << 24) | (r << 16) | (g << 8) | (b);
            }

            // Create a BufferedImage from the converted ARGB pixels
            try
            {
                  image = new BufferedImage(width, height, BufferedImage.TYPE_INT_ARGB);
                  image.setRGB(0, 0, width, height, pixels, 0, width);
            }
            catch (Exception e)
            {
                  System.out.println("ScreenShot() exception: " + e);
            }
            return image;
      }

would read as:


      public BufferedImage screenShot()
      {
            BufferedImage image = null;
            
            // allocate space for RGBA pixels
            ByteBuffer fb = BufferUtils.createByteBuffer(width * height * 4);
            int[] pixels = new int[width * height];
            int bindex;

            GL11.glReadBuffer(GL11.GL_FRONT);

            // grab a copy of the current frame contents as RGBA
            GL11.glReadPixels(0, 0, width, height, GL11.GL_RGBA, GL11.GL_UNSIGNED_BYTE, fb);

            // convert (BufferUtils.toARGB is the proposed method; it does not exist in LWJGL)
            BufferUtils.toARGB(fb);

            // Create a BufferedImage with the RGBA pixels
            try
            {
                  image = new BufferedImage(width, height, BufferedImage.TYPE_INT_ARGB);
                  image.setRGB(0, 0, width, height, pixels, 0, width);
            }
            catch (Exception e)
            {
                  System.out.println("ScreenShot() exception: " + e);
            }
            return image;
      }
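To make the proposal concrete, here is a pure-Java sketch of what such a helper might do. It is a hypothetical stand-in, not an LWJGL method: the name `toArgb`, its signature, and the choice to return a new `int[]` are all assumptions. It assumes the buffer holds tightly packed R,G,B,A bytes, as produced by `glReadPixels` with `GL_RGBA` and `GL_UNSIGNED_BYTE`.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.IntBuffer;

final class RgbaBufferToArgb {
    /**
     * Hypothetical helper: reads pixelCount pixels stored as R,G,B,A bytes,
     * starting at the buffer's current position, and returns 0xAARRGGBB ints.
     * The caller's buffer position is left untouched.
     */
    static int[] toArgb(ByteBuffer rgbaBytes, int pixelCount) {
        // A big-endian view makes each int read as 0xRRGGBBAA on any platform.
        IntBuffer view = rgbaBytes.duplicate().order(ByteOrder.BIG_ENDIAN).asIntBuffer();
        int[] pixels = new int[pixelCount];
        for (int i = 0; i < pixelCount; i++) {
            // Rotating right by 8 moves alpha from the low byte to the high byte.
            pixels[i] = Integer.rotateRight(view.get(i), 8);
        }
        return pixels;
    }

    public static void main(String[] args) {
        ByteBuffer fb = ByteBuffer.wrap(new byte[] {0x11, 0x22, 0x33, (byte) 0xAA});
        System.out.println(Integer.toHexString(toArgb(fb, 1)[0])); // prints aa112233
    }
}
```

The resulting array could then be handed to `BufferedImage.setRGB` with `TYPE_INT_ARGB`, as in the screenshot method above.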

As I say, this is built into all OpenGL drivers. It’s not a function that will ever make it into LWJGL.

Cas :slight_smile:

I’m sorry for my ignorance, but… can you give an example, a how-to that shows how to do RGBA-ARGB with OpenGL?

Of course don’t assume we’re onscreen. Just think you want to get a BufferedImage from the pixels you read with glReadPixels().

Simply pass the required format into glReadPixels()… e.g. GL_BGRA (from the GL_EXT_bgra extension) or GL_ABGR_EXT (from GL_EXT_abgr). Make sure the extension is supported first. Nearly all drivers support this conversion.
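Why BGRA helps: on a little-endian machine (x86), four bytes stored as B,G,R,A read back as a single int are exactly 0xAARRGGBB, which is what `BufferedImage.TYPE_INT_ARGB` expects. The sketch below only simulates that byte layout in plain Java, since the real call needs a live GL context; with LWJGL the format constant would be `GL12.GL_BGRA`. The class and method names are illustrative.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

final class BgraAsArgb {
    /** Reads the pixel at pixelIndex from B,G,R,A bytes as one little-endian int. */
    static int readPixel(ByteBuffer bgraBytes, int pixelIndex) {
        return bgraBytes.duplicate().order(ByteOrder.LITTLE_ENDIAN).getInt(pixelIndex * 4);
    }

    public static void main(String[] args) {
        // Stand-in for what glReadPixels(..., GL_BGRA, GL_UNSIGNED_BYTE, fb) would write:
        // B=33 G=22 R=11 A=AA
        ByteBuffer fb = ByteBuffer.wrap(new byte[] {0x33, 0x22, 0x11, (byte) 0xAA});
        System.out.println(Integer.toHexString(readPixel(fb, 0))); // prints aa112233
    }
}
```

On a big-endian platform the native int view would see the bytes in the opposite order, so the byte order has to be set explicitly as done here.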

Cas :slight_smile: