RGB vs RGBA texture upload

I recently benchmarked OpenGL texture upload speed.

Of course I didn’t generate mipmaps, since that’s done on the CPU.

For a 1024x1024 texture:
- LUMI: 3.4ms
- RGB: 220.8ms :o
- RGBA: 12.2ms
- DXT3: 5.6ms
- DXT5: 6.3ms
Pixel type: GL_UNSIGNED_BYTE

I can understand that uploading RGB could be somewhat slower, because the GPU might have to do some conversions, but ~20x slower… doesn’t make much sense.
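One guess (not confirmed anywhere in this thread) is that the affected drivers keep GL_RGB textures at 4 bytes per texel on the card and repack the 3-byte rows on a slow path. If that is what happens, padding the data yourself and uploading as GL_RGBA should sidestep it. A minimal sketch of that padding using only java.nio; the class and method names are made up for illustration:

```java
import java.nio.ByteBuffer;

// Hypothetical helper: pads tightly packed RGB bytes to RGBA with opaque alpha.
final class RgbToRgba {
    static ByteBuffer expand(ByteBuffer rgb, int pixelCount) {
        if (rgb.remaining() < (long) pixelCount * 3) {
            throw new IllegalArgumentException("RGB buffer too small");
        }
        // Direct buffer, like the ones LWJGL's BufferUtils.createByteBuffer returns.
        ByteBuffer rgba = ByteBuffer.allocateDirect(pixelCount * 4);
        int base = rgb.position();
        for (int i = 0; i < pixelCount; i++) {
            int src = base + i * 3;          // absolute reads; rgb's position is untouched
            rgba.put(rgb.get(src));          // R
            rgba.put(rgb.get(src + 1));      // G
            rgba.put(rgb.get(src + 2));      // B
            rgba.put((byte) 0xFF);           // A = fully opaque
        }
        rgba.flip();                         // ready to be read from the start
        return rgba;
    }

    public static void main(String[] args) {
        ByteBuffer rgb = ByteBuffer.wrap(new byte[] {10, 20, 30, 40, 50, 60});
        ByteBuffer rgba = expand(rgb, 2);
        while (rgba.hasRemaining()) {
            System.out.print((rgba.get() & 0xFF) + " ");
        }
        System.out.println();
    }
}
```

The padded buffer would then go to glTexImage2D with GL_RGBA as both internal format and format, like the RGBA case in the benchmark. The trade-off is an extra CPU-side copy and 33% more data to transfer, so whether it helps depends on the driver.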

Update:
The benchmark just sends an ‘empty’ (zero-filled) direct ByteBuffer of the appropriate length to the GPU.
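Because the formats move different amounts of data, it can help to turn the ms-per-texture figures from the first post into throughput. This sketch assumes 1, 3 and 4 bytes per texel for LUMI, RGB and RGBA (as in the benchmark code) and 1 byte per texel for DXT3/DXT5 (16-byte blocks covering 4x4 texels); the DXT upload code isn’t posted, so those sizes are an assumption:

```java
import java.util.Locale;

// Converts ms-per-texture results into MiB/s for a dim x dim level-0 image.
final class UploadThroughput {
    static long bytesFor(String format, int dim) {
        long texels = (long) dim * dim;
        switch (format) {
            case "LUMI": return texels;      // GL_LUMINANCE, 1 byte per texel
            case "RGB":  return texels * 3;  // GL_RGB, 3 bytes per texel
            case "RGBA": return texels * 4;  // GL_RGBA, 4 bytes per texel
            case "DXT3":
            case "DXT5": return texels;      // 16 bytes per 4x4 block (dim must be a multiple of 4)
            default: throw new IllegalArgumentException("unknown format: " + format);
        }
    }

    // Mebibytes per second for the given byte count and elapsed milliseconds.
    static double mibPerSecond(long bytes, double millis) {
        return (bytes / (1024.0 * 1024.0)) / (millis / 1000.0);
    }

    public static void main(String[] args) {
        String[] formats = {"LUMI", "RGB", "RGBA", "DXT3", "DXT5"};
        double[] millis = {3.4, 220.8, 12.2, 5.6, 6.3}; // 1024x1024 results from the first post
        for (int i = 0; i < formats.length; i++) {
            long bytes = bytesFor(formats[i], 1024);
            System.out.printf(Locale.ROOT, "%-5s %8d bytes %7.1f MiB/s%n",
                    formats[i], bytes, mibPerSecond(bytes, millis[i]));
        }
    }
}
```

With those numbers, RGB works out to roughly 13.6 MiB/s against roughly 328 MiB/s for RGBA, so the gap is not explained by the extra bytes alone.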

Over how many runs are these results averaged? Was there much variance in the runs, or were all results for a particular test pretty much the same?

16 runs, with the same performance on each run.
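To put a number on the variance question, the per-run times (for example one value per call to benchmarkUpload) could be summarized as a mean and a sample standard deviation. A small sketch; RunStats is a hypothetical helper, not part of the posted benchmark:

```java
import java.util.Locale;

// Mean and sample standard deviation of per-run timings in milliseconds.
final class RunStats {
    static double mean(double[] xs) {
        if (xs.length == 0) throw new IllegalArgumentException("no samples");
        double sum = 0;
        for (double x : xs) sum += x;
        return sum / xs.length;
    }

    // Uses n - 1 in the denominator (sample, not population, standard deviation).
    static double sampleStdDev(double[] xs) {
        if (xs.length < 2) throw new IllegalArgumentException("need at least two samples");
        double m = mean(xs);
        double sq = 0;
        for (double x : xs) sq += (x - m) * (x - m);
        return Math.sqrt(sq / (xs.length - 1));
    }

    public static void main(String[] args) {
        if (args.length < 2) {
            System.out.println("usage: java RunStats <ms> <ms> [<ms> ...]");
            return;
        }
        double[] xs = new double[args.length];
        for (int i = 0; i < args.length; i++) xs[i] = Double.parseDouble(args[i]);
        System.out.printf(Locale.ROOT, "mean %.2f ms, stddev %.2f ms%n", mean(xs), sampleStdDev(xs));
    }
}
```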

How interesting! Could you post your benchmark code so we can try it on a variety of hardware, please?

import java.nio.IntBuffer;
import java.nio.ByteBuffer;
import org.lwjgl.BufferUtils;
import org.lwjgl.opengl.Display;
import org.lwjgl.opengl.GL11;

public class TextureUploadBenchmark
{
   public static void main(String[] args) throws Exception
   {
      Display.setTitle("Texture Upload Benchmark");
      Display.setFullscreen(false);

      Display.create();

      GL11.glClearColor(0, 0, 1, 1);
      GL11.glClear(GL11.GL_COLOR_BUFFER_BIT);

      Display.update();

      GL11.glEnable(GL11.GL_TEXTURE_2D);

      int dim = 1024;
      int count = 16;

      System.out.println("Texture-Upload Benchmark");
      System.out.println();
      System.out.println("   Texture dimension: " + dim);
      System.out.println("   Number of textures per run: " + count);
      System.out.println();
      System.out.println("   Results:");
      System.out.println("    - LUMI: " + benchmarkUpload(dim, GL11.GL_LUMINANCE, count) + "ms / texture");
      System.out.println("    - RGB:  " + benchmarkUpload(dim, GL11.GL_RGB, count) + "ms / texture");
      System.out.println("    - RGBA: " + benchmarkUpload(dim, GL11.GL_RGBA, count) + "ms / texture");
      System.out.println();
      System.out.println("Done.");

      Display.destroy();
   }



   private static final float benchmarkUpload(int dim, int format, int count)
   {
      int bytesPerPixel = -1;

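      switch (format)
      {
         case GL11.GL_LUMINANCE:
            bytesPerPixel = 1;
            break;
         case GL11.GL_RGB:
            bytesPerPixel = 3;
            break;
         case GL11.GL_RGBA:
            bytesPerPixel = 4;
            break;
         default:
            // fail fast instead of trying to allocate a buffer with a negative size
            throw new IllegalArgumentException("unsupported format: " + format);
      }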

      ByteBuffer buf = BufferUtils.createByteBuffer(dim * dim * bytesPerPixel);
      IntBuffer idBuf = BufferUtils.createIntBuffer(count);
      GL11.glGenTextures(idBuf);

      long t0 = System.currentTimeMillis();
      for (int i = 0; i < count; i++)
      {
         GL11.glBindTexture(GL11.GL_TEXTURE_2D, idBuf.get(i));

         GL11.glTexImage2D(GL11.GL_TEXTURE_2D, 0, format, dim, dim, 0, format, GL11.GL_UNSIGNED_BYTE, buf);
         GL11.glTexParameteri(GL11.GL_TEXTURE_2D, GL11.GL_TEXTURE_MIN_FILTER, GL11.GL_LINEAR);
         GL11.glTexParameteri(GL11.GL_TEXTURE_2D, GL11.GL_TEXTURE_MAG_FILTER, GL11.GL_LINEAR);
         GL11.glTexParameteri(GL11.GL_TEXTURE_2D, GL11.GL_TEXTURE_WRAP_S, GL11.GL_REPEAT);
         GL11.glTexParameteri(GL11.GL_TEXTURE_2D, GL11.GL_TEXTURE_WRAP_T, GL11.GL_REPEAT);
      }
      long t1 = System.currentTimeMillis();

      GL11.glDeleteTextures(idBuf);

      float took = (t1 - t0) / (float) count;

      // format to 1 decimal
      return ((int) (took * 10)) / 10.0F;
   }
}
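Two things about the timing in this benchmark may be worth tightening. System.currentTimeMillis() can have a resolution of roughly 15.6 ms on Windows, and several posted values (7.8, 9.7, 15.6, 17.5, 29.3) are consistent with whole multiples of that tick divided by 16 textures. Also, OpenGL may queue work, so calling glFinish() before stopping the clock makes sure whatever the driver deferred is counted (glTexImage2D itself must be done reading the client buffer when it returns). A sketch of a reusable timer under those assumptions; UploadTimer is hypothetical and needs Java 8 or later for the lambdas:

```java
import java.util.Arrays;
import java.util.Locale;
import java.util.function.IntConsumer;

// Times `count` calls of uploadOne plus one final sync step, using System.nanoTime().
final class UploadTimer {
    static double averageMillis(int count, IntConsumer uploadOne, Runnable sync) {
        if (count <= 0) throw new IllegalArgumentException("count must be positive");
        long t0 = System.nanoTime();
        for (int i = 0; i < count; i++) {
            uploadOne.accept(i);   // e.g. bind texture i and call glTexImage2D
        }
        sync.run();                // e.g. GL11::glFinish, so queued GL work is included
        long t1 = System.nanoTime();
        return (t1 - t0) / 1_000_000.0 / count;
    }

    public static void main(String[] args) {
        // Stand-in workload (filling a 1 MiB array) so this runs without a GL context.
        byte[] scratch = new byte[1 << 20];
        double ms = averageMillis(16, i -> Arrays.fill(scratch, (byte) i), () -> { });
        System.out.printf(Locale.ROOT, "%.3f ms per call%n", ms);
    }
}
```

Inside benchmarkUpload, the upload step could be a lambda that binds `idBuf.get(i)` and calls `glTexImage2D`, with `GL11::glFinish` as the sync step. Numbers measured this way would not be directly comparable with the ones posted in this thread.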

Output on an ATi Radeon 9700 PRO:

Texture-Upload Benchmark

   Texture dimension: 1024
   Number of textures per run: 16

   Results:
    - LUMI: 3.8ms / texture
    - RGB:  229.5ms / texture
    - RGBA: 15.6ms / texture

Done.

   Texture dimension: 1024
   Number of textures per run: 16

   Results:
    - LUMI: 4.8ms / texture
    - RGB:  17.5ms / texture
    - RGBA: 17.5ms / texture

Done.

GeForce 6600GT

   Texture dimension: 1024
   Number of textures per run: 16

   Results:
    - LUMI: 4.9ms / texture
    - RGB:  15.6ms / texture
    - RGBA: 17.5ms / texture

Done.

-----------------------

   Texture dimension: 2048
   Number of textures per run: 16

   Results:
    - LUMI: 17.5ms / texture
    - RGB:  46.8ms / texture
    - RGBA: 77.1ms / texture

Done.

GeForce FX 5900, ForceWare 76.91

Thanks.

Anyone with an ATi card, please?

Radeon 8500 (I needed to remove multisampling to get it to run):

Texture-Upload Benchmark

Texture dimension: 1024
Number of textures per run: 16

Results:
- LUMI: 10.7ms / texture
- RGB: 344.7ms / texture
- RGBA: 29.3ms / texture

Done.

ATI, we have a problem…

Texture-Upload Benchmark

   Texture dimension: 1024
   Number of textures per run: 16

   Results:
    - LUMI: 12.6ms / texture
    - RGB:  29.3ms / texture
    - RGBA: 30.3ms / texture

Done.

GF FX 5200.

Radeon 9500 Pro

- LUMI: 9.7ms / texture
- RGB:  225.5ms / texture
- RGBA: 29.3ms / texture

So now we know it’s ATi’s issue, what’s the next step? :-/

Texture dimension: 1024
Number of textures per run: 16

Results:
- LUMI: 16.8ms / texture
- RGB: 65.0ms / texture
- RGBA: 75.6ms / texture

GeForce2 MX (classic/400), Detonator 30.82.

what’s the next step?

a) ignore it
b) say “ati sucks” and never buy an ati card again
c) contact them… if they don’t do something about it, fall back to a) or b)

[quote]So now we know it’s ATi’s issue, what’s the next step? :-/
[/quote]
Pointing, and laughing. ;D

ATI X800 XT PE, Catalyst 5.4:

Texture dimension: 1024

Results:

- LUMI: 2.0ms / texture
- RGB:  7.8ms / texture
- RGBA: 9.7ms / texture

ATI Mobility 9200 (in a notebook)

[quote] Results:
- LUMI: 7.8ms / texture
- RGB: 140.6ms / texture
- RGBA: 13.6ms / texture
[/quote]

Intel 915 integrated graphics:

Results:

- LUMI: 0.9ms / texture
- RGB:  9.7ms / texture
- RGBA: 11.6ms / texture

lol, scary that the Intel chipset is that fast :)

Maybe it uses system RAM; that would explain it.

Or it’s running a software renderer :D

[quote]Maybe it uses system RAM; that would explain it.

Or it’s running a software renderer :D
[/quote]
Of course it uses system RAM… and no, it doesn’t do software rendering. LWJGL stuff works just fine on that machine, just not very fast. ;)