Is my 2D rasterizer too slow?

Hi all! I’m implementing my rasterizer in pure Java (1.4) and would like to know whether its performance is way off where it should be for a Java software renderer.

When I use the 2D draw triangle function, it draws 4000 triangles each frame on my Athlon XP1700 at about 18fps, where each triangle covers half the pixels of a 50x50 pixel square.

Is that too slow? Does it need to be optimized before I go any further with implementing a software renderer?

Here’s some more stats:
1000 tris./frame = 55fps
2000 tris./frame = 32fps
4000 tris./frame = 18fps

Off the top of my head that sounds slow; ISTR doing 25 fps with 4k tris on a p3-450 ish. But maybe I’m being generous to myself and it was only 400 tris :P.

But…it depends what you’re rendering. Gouraud? Perspective-correct-textures? Both?

Obviously, if your rendering speed is that low for just plain-old-flat-shading then you’re in serious trouble :).

PS is this just for academic interest? Because with 1.4 there’s little need for a software rasterizer (if you’re requiring 1.4 then the players will already be able to use OpenGL etc; sure, a few people with buggy OGL cards might love you ;D, but…).

Thanks for replying.
I’m doing this simply to satisfy my academic interests :).
You mentioned ISTR, is that a Java software renderer? If yes, can you tell me what size the triangles were, as this has a huge impact on my 2D rasterizer’s draw triangle routine.

If I use smaller triangles 10x10 instead of 50x50 then I can draw 16,000 triangles / frame @ 30fps.

BTW my triangles are 2D & flat shaded ONLY! … then perhaps it’s still hopeless performance.
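
One way to compare the figures posted so far is to convert them into filled pixels per second. This is only a rough sketch: it assumes every triangle covers exactly half of its bounding square (1250 pixels for 50x50, 50 pixels for 10x10), and the class and method names are made up for illustration.

```java
final class FillRate {
    // Pixels filled per second, assuming each triangle covers half of a size x size square.
    static long pixelsPerSecond(int trianglesPerFrame, int size, int fps) {
        long pixelsPerTriangle = (long) size * size / 2;
        return trianglesPerFrame * pixelsPerTriangle * fps;
    }

    public static void main(String[] args) {
        System.out.println(pixelsPerSecond(1000, 50, 55));   // 68750000
        System.out.println(pixelsPerSecond(2000, 50, 32));   // 80000000
        System.out.println(pixelsPerSecond(4000, 50, 18));   // 90000000
        System.out.println(pixelsPerSecond(16000, 10, 30));  // 24000000
    }
}
```

Under those assumptions the 50x50 runs fill somewhere between about 69 and 90 million pixels per second, while the 10x10 run fills about 24 million. That would suggest per-triangle setup, rather than the inner fill loop, costs more as the triangles get smaller.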

I didn’t know wtf istr was either.

Sounds like he’s using Graphics.drawPolygon(). I’d say doing your own polygon rasterization would always be faster. But I never stress tested my software renderer before stopping work on it. So… maybe egonolsen will see this thread.

edit;

Also, something I’ve been researching lately is hardware rendering pipelines. I’m thinking copying gpu hardware might lead to greater performance even in java.

Old school T&L (Transform and Lighting) hardware required some tricks that might be optimized by the JIT really well, i.e. do all your vertex transforms, normal transforms, and lighting calculations in one shot. Keep state changes to a minimum. Make everything nice and predictable for the JIT compiler and CPU.
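
A minimal sketch of what "in one shot" could look like in Java. It assumes positions and normals are packed as x,y,z triples in flat float arrays, a row-major 3x4 matrix whose 3x3 part is a pure rotation (so normals can reuse it), and a single directional light. All names here are hypothetical and not taken from anyone's renderer in this thread.

```java
final class BatchTransform {
    /**
     * Transforms every position and normal and computes a diffuse term in a single pass.
     * m is a row-major 3x4 matrix (rotation plus translation); light is a unit direction.
     */
    static void transformAndLight(float[] m, float[] light,
                                  float[] pos, float[] nrm,
                                  float[] outPos, float[] outShade) {
        // Copy the matrix and light into locals so the loop touches as few arrays as possible.
        final float m00 = m[0], m01 = m[1], m02 = m[2],  m03 = m[3];
        final float m10 = m[4], m11 = m[5], m12 = m[6],  m13 = m[7];
        final float m20 = m[8], m21 = m[9], m22 = m[10], m23 = m[11];
        final float lx = light[0], ly = light[1], lz = light[2];

        final int count = pos.length / 3;
        for (int i = 0, p = 0; i < count; i++, p += 3) {
            final float x = pos[p], y = pos[p + 1], z = pos[p + 2];
            outPos[p]     = m00 * x + m01 * y + m02 * z + m03;
            outPos[p + 1] = m10 * x + m11 * y + m12 * z + m13;
            outPos[p + 2] = m20 * x + m21 * y + m22 * z + m23;

            // Normals are rotated but not translated.
            final float nx = nrm[p], ny = nrm[p + 1], nz = nrm[p + 2];
            final float tx = m00 * nx + m01 * ny + m02 * nz;
            final float ty = m10 * nx + m11 * ny + m12 * nz;
            final float tz = m20 * nx + m21 * ny + m22 * nz;

            // Clamped diffuse term: one value per vertex.
            final float d = tx * lx + ty * ly + tz * lz;
            outShade[i] = d > 0f ? d : 0f;
        }
    }
}
```

Whether the JIT actually rewards this layout is something to measure rather than assume.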

Then there are gpu’s like the powervr that used some advanced techniques to avoid over draw thus improving fill rate.

For Java, keep array access at a minimum, especially when rasterizing the actual polygons.

After all, when you’re doing software rendering the CPU is the ultimate programmable GPU.

That is bad, I’m afraid :(

Could definitely get thousands of tris @ decent interactive framerates (20-30fps) on 500MHz machines…

How about you post your tri-rendering code? I’m not guaranteeing I’ll look at it (tri-rendering code is often butt-ugly and painful to read through!) but I’ll glance at it and see how brave I’m feeling…

(nb: there is of course a small chance I’m talking BS; it’s been so long since I did a serious attempt at thousands of tris I may be mis-remembering everything. I’m pretty confident, but not 100% ;D)

You asked for it blahblahblah!
My triangle code is quite simple :)

I’m concerned about the code that draws scan lines since this is where the worst performance hit is.

Here’s my scan line drawing code …


public void setScanLine(int x1, int x2, int y, int color)
{
  // beginning and end of the array indexes that are to be set
  int start, end;
  start = width*((height-1)-y);
  end = start;
  start += x1;
  end += x2;

  for(int i = start; i <= end; i++)
    colorBuffer[i] = color;
}

And here’s the triangle code …


public void drawTriangle(int x1, int y1, int x2, int y2, int x3, int y3, int color)
{
    // primary and secondary lines
    RasterLine pl, sl;
    // setup a raster triangle
    triangle.set(x1, y1, x2, y2, x3, y3);
    // perform sorting of points wrt y components
    triangle.sortPoinsByY();
    // update lines of the triangle
    triangle.updateLines();

    // now we can begin working out the scan line intersections

    // first get the primary and secondary lines
    pl = triangle.l1;
    sl = triangle.l2;
    // check if the primary line is horizontal
    // (remember that the primary line has the greatest dy)
    if(pl.getType() == RasterLine.HORIZONTAL ||
        pl.getType() == RasterLine.POINT)
    {
        // then the whole triangle is a horizontal line
        drawScanLine(pl.x1, pl.x2, pl.y1, color);
    }
    else // we have a non-degenerate (normal) triangle
    {
        // now traverse the y component of the longest line wrt y-axis
        for(int y = triangle.l1.y1; y <= triangle.l1.y2; y++)
        {
            // see if the secondary line is horizontal or if its end was reached
            if(sl.getType() == RasterLine.HORIZONTAL ||
                ( y == triangle.l2.y2 && triangle.l3.getType() != RasterLine.HORIZONTAL ))
                sl = triangle.l3;

            // get the x intersection from the primary and secondary lines
            drawScanLine(pl.getX(y), sl.getX(y), y, color);
        }
    }
}

Note that I don’t create any new objects in my triangle code.
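
For comparison, here is a sketch of a scan-line fill that orders and clips its endpoints and uses `java.util.Arrays.fill` instead of a hand-written loop. The class name, the endpoint swap, and the clipping are assumptions added for illustration and are not the poster's code; the row flip mirrors the `width*((height-1)-y)` indexing above, and the method is named `drawScanLine` to match the call in `drawTriangle`.

```java
import java.util.Arrays;

final class ScanLineBuffer {
    final int width, height;
    final int[] colorBuffer;

    ScanLineBuffer(int width, int height) {
        this.width = width;
        this.height = height;
        this.colorBuffer = new int[width * height];
    }

    // Fills pixels x1..x2 (inclusive, in either order) on row y, with y = 0 at the bottom.
    void drawScanLine(int x1, int x2, int y, int color) {
        if (y < 0 || y >= height) return;               // row is off screen
        if (x1 > x2) { int t = x1; x1 = x2; x2 = t; }   // the two edges may cross over
        if (x2 < 0 || x1 >= width) return;              // span is entirely off screen
        if (x1 < 0) x1 = 0;
        if (x2 >= width) x2 = width - 1;

        int rowStart = width * ((height - 1) - y);
        // The toIndex argument of Arrays.fill is exclusive, hence the + 1.
        Arrays.fill(colorBuffer, rowStart + x1, rowStart + x2 + 1, color);
    }
}
```

The swap matters because `pl.getX(y)` and `sl.getX(y)` can arrive in either order depending on which side of the triangle the primary line is on; a loop of the form `i = start; i <= end` draws nothing when the start is the larger value. Whether `Arrays.fill` beats the plain loop on a given JVM is worth measuring rather than assuming.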

Ah, you’re using direct render to buffer. That’s probably why it’s so slow.

Check / fix your ColorModel so that it matches the screenmode perfectly. Get this right and you get decent speed; get this wrong and you (historically) get huge performance penalties. e.g. if screen is 5550 and you are on 8888 etc. I would have thought this was “fixed” now in the JVM, but…

Try changing your code so that it just does a “drawline” for the scanline, and see what difference it makes; you said you’re not doing per-pixel effects, so the render should be identical.

How exactly do they have access to GL without other downloads?

Anyway, I’ve never done a straight triangle test on my renderer, but I can fill a 500x500 applet with a multitextured scene at about 60-70 fps on my P4 2.4. That jumps to well over 100 fps if I just flat shade it. Those numbers drop slightly if I use purely 1.1. I’m not sure if that helps you or not.

blahblahblahh:
I did a test where I replaced my own draw triangle with what Java provides ( fillPolygon() ), and my fps jumped up from 18fps to 28fps when drawing 4000 triangles per frame … that means I’m not way off with my own method :)

I’ll check out my color model now …

Absolution:
When you draw your scene with flat shaded triangles only, how many triangles do you have in your scene?

[quote] Try changing your code so that it just does a “drawline” for the scanline, and see what difference it makes; you said you’re not doing per-pixel effects, so the render should be identical.
[/quote]
It’s better to use fillRect for this, it’s faster than drawLine.
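
A small sketch of the two Java2D calls being compared, drawn into a `BufferedImage` so the results can be checked pixel by pixel. It only shows that both cover the same span for a horizontal line (default stroke, no antialiasing); it says nothing about which one is faster. The class and method names are illustrative.

```java
import java.awt.Color;
import java.awt.Graphics2D;
import java.awt.image.BufferedImage;

final class ScanLineCalls {
    // Draws the span x1..x2 on row y as a horizontal line.
    static BufferedImage withDrawLine(int w, int h, int x1, int x2, int y, Color c) {
        BufferedImage img = new BufferedImage(w, h, BufferedImage.TYPE_INT_RGB);
        Graphics2D g = img.createGraphics();
        g.setColor(c);
        g.drawLine(x1, y, x2, y);
        g.dispose();
        return img;
    }

    // Draws the same span as a rectangle one pixel high; the width is inclusive of both ends.
    static BufferedImage withFillRect(int w, int h, int x1, int x2, int y, Color c) {
        BufferedImage img = new BufferedImage(w, h, BufferedImage.TYPE_INT_RGB);
        Graphics2D g = img.createGraphics();
        g.setColor(c);
        g.fillRect(x1, y, x2 - x1 + 1, 1);
        g.dispose();
        return img;
    }

    // True when both images have the same size and identical pixel values.
    static boolean samePixels(BufferedImage a, BufferedImage b) {
        if (a.getWidth() != b.getWidth() || a.getHeight() != b.getHeight()) return false;
        for (int y = 0; y < a.getHeight(); y++)
            for (int x = 0; x < a.getWidth(); x++)
                if (a.getRGB(x, y) != b.getRGB(x, y)) return false;
        return true;
    }
}
```

Note the `+ 1` in the `fillRect` width: `drawLine` paints both endpoints, so a span from `x1` to `x2` is `x2 - x1 + 1` pixels wide.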

fillRect should not be faster than drawLine. They should run at exactly the same speed for horizontal and vertical lines. If not then perhaps that’s a tweak you could pop in to the next release…

Cas :)

I think Java2D is already using that optimisation Cas ('cos 1.4 had a bug where only vertical/horizontal lines would render if alpha acceleration was enabled)
drawLine still has to determine if the line is horizontal/vertical before it can fall back onto the faster fillRect algorithm. Therefore, calling fillRect directly should still be quicker O_o

Pff, yeah, by a cycle or two :)

Cas :)

Man I miss the jacc; we should organize an applet rendering challenge or something…

Is the performance too slow… for what?

As long as your scenes render > 20fps on the target spec, then it’s probably fast enough :)

[quote]Is the performance too slow… for what?

As long as your scenes render > 20fps on the target spec, then it’s probably fast enough :)
[/quote]
Good point!
Perhaps I’m wasting my time optimizing when I don’t even have a scene yet.

[quote] fillRect should not be faster than drawLine. They should run at exactly the same speed for horizontal and vertical lines. If not then perhaps that’s a tweak you could pop in to the next release…
[/quote]
The difference is at least two checks.

If HotSpot’s doing its job correctly it should inline the call and then completely eliminate the checks if the values are the same or constant, shouldn’t it…?

Cas :P

FWLIW there was no noticeable difference on 1.2.x when I was profiling over the course of one hundred frames or so, the last time I bothered to measure the difference. Maybe the difference is simply so small that it disappears on such a short test? I don’t think I ever bothered micro-benchmarking.

IIRC it also optimized repeated drawlines with a single pixel into a line (ISTR reading about this some years later - the fact that the Graphics does coalescing and/or re-ordering of some calls in order to do batch processing and avoid unnecessarily invalidating the pipeline?) - but it was a long time ago, so maybe my memory’s rusty :).

That would be a very silly optimisation indeed.

Cas :)