Does it make sense to store pixels in 3 separate channels during rendering?

After spending quite a few hours optimizing a software renderer, I realized that if you have to do multiple color manipulations on a pixel, it might just be faster to represent it as 3 separate channels instead of in rrggbb format throughout the rendering process.

Imagine we have the following tasks:

  1. change the brightness level of a pixel,
  2. then alpha blend it with another pixel,
  3. then average it with neighboring pixels.

Each of these tasks requires us to
A. extract the r, g, b channels from the pixel,
B. apply changes to each channel, and
C. convert it back to rrggbb format.

As a result, the total amount of work is 3 x A + 3 x B + 3 x C. However, if pixels are represented as separate r, g, b channels, we no longer need steps A and C. That reduces the total work to 3 x B + C (we still need a single step C at the end, since only a pixel in rrggbb format can be written to the screen buffer).
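Here’s a rough sketch in C of what I mean (the function names and the 0..256 brightness factor are just made up for illustration, not real code from my renderer):

[code]
#include <stdint.h>

/* Packed workflow: every operation pays step A (extract) and
   step C (repack). "factor" is a 0..256 brightness scale. */
static uint32_t scale_packed(uint32_t rrggbb, uint32_t factor)
{
    uint32_t r = (rrggbb >> 16) & 0xFF;  /* step A: extract */
    uint32_t g = (rrggbb >> 8)  & 0xFF;
    uint32_t b =  rrggbb        & 0xFF;
    r = (r * factor) >> 8;               /* step B: per-channel math */
    g = (g * factor) >> 8;
    b = (b * factor) >> 8;
    return (r << 16) | (g << 8) | b;     /* step C: repack */
}

/* Planar workflow: channels live in separate arrays, so the
   inner operations are pure step B. */
static void scale_planar(uint8_t *r, uint8_t *g, uint8_t *b,
                         int count, uint32_t factor)
{
    for (int i = 0; i < count; i++) {
        r[i] = (uint8_t)((r[i] * factor) >> 8);
        g[i] = (uint8_t)((g[i] * factor) >> 8);
        b[i] = (uint8_t)((b[i] * factor) >> 8);
    }
    /* Only one final repack (step C) is needed when the result
       is written out to the screen buffer. */
}
[/code]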

That’s just my theory…

  1. Around here we recommend OpenGL, which handles most of this in hardware anyway.
  2. The amount of performance increase would be pathetic. Your time would be better spent optimising something else.
  3. It makes code messy.
  4. If it were a good idea, it would already be done.

Simply, if you are at the stage where you need to optimise colour operations, you need to be using OpenGL.

Bandwidth is king, not ALU operations.
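To put rough numbers on it: a 1920x1080 framebuffer at 32 bpp is about 8 MB, so a single full-screen pass at 60 fps already streams roughly 500 MB/s. Every extra array you read or write per pixel adds another stream like that, while the shifts and masks you would save are nearly free.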

Also, note that in my previous post I meant that IF it actually does give a performance increase, then those are the problems.

It may make performance worse, if anything.

I totally agree with what you have said, except I believe the performance gain could be huge in some scenarios. In my swimming pool project, there is a place where the camera can look at the bottom of the pool through a foggy window. To render each pixel on the bottom of the pool, I literally have to unpack and repack the color data 4 times. That’s definitely a lot of overhead there :-\

This…very much this.

Besides, if you have separate channels, then you have to fold them back together at the end. Depending on what blending operations you’re performing, you don’t have to completely separate the values: an AARRGGBB pixel can be masked into AA00GG00 and 00RR00BB halves. There should be plenty of resources on doing this.
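For example, a 50/50 average of two AARRGGBB pixels can be done two channels at a time with those masks; a rough, untested sketch:

[code]
#include <stdint.h>

/* Average two AARRGGBB pixels without fully unpacking them.
   Each 00XX00YY half keeps 8 bits of headroom between its two
   channels, so the per-channel sums can't spill across lanes. */
static uint32_t avg_pixels(uint32_t a, uint32_t b)
{
    uint32_t ag = ((((a >> 8) & 0x00FF00FF)           /* 00AA00GG */
                  + ((b >> 8) & 0x00FF00FF)) >> 1) & 0x00FF00FF;
    uint32_t rb = (((a & 0x00FF00FF)                  /* 00RR00BB */
                  + (b & 0x00FF00FF)) >> 1) & 0x00FF00FF;
    return (ag << 8) | rb;
}
[/code]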

After re-examining my color math code, I realized that using 3 separate channels won’t increase performance. The overhead introduced by the extra array reads will probably overshadow the gain from having fewer fold/unfold operations on the color channels.

[quote]Bandwidth is king, not ALU operations.
[/quote]
Do you mean I should try to include more parallelism in the rendering pipeline instead of optimizing ALU operations?

I tried both methods when I was experimenting with animating a Perlin-noise-based graphic. For the limited amount of graphics I was doing, the three-array case timed minutely better, but it was so much less of a factor than the time spent in the Perlin function that I just left things in the integer rrggbb form. It didn’t seem worth the trouble to go back to managing the multiple arrays.