Realtime raytracing experiments in pure Java

I assume it ran into a deadlock. Till today I didn’t know that multiple notify() calls can cause only one wait() to wake up if they happen at exactly the same time (and it happened …)

I once more changed the threading concept. I hope it now works without deadlocks. It starts 8 rendering worker threads and one program thread, and is capped at 60 FPS for those with really powerful machines. I can’t test the capping though, I don’t reach that many FPS with my hardware :wink:

I want to call this the “Turn up your fans, maximum core!” edition ;D
http://www.java-gaming.org/user-generated-content/members/132627/simpleray-r4.jar

3 mirror balls, one infinite plane and a sky sphere, give me ~30 FPS. I’m somewhat certain that I can’t achieve big speedups anymore, but I’ll see what I can do with it. It’s a nice toy for sure :slight_smile:

@Roquen: I’ve got one volatile variable which is increased by the worker threads to count how many workers are done yet. Now that you mention it, I must say that I’m not quite sure what happens with the frame buffer, which is written to by all worker threads.

@alesky: Thanks :smiley:

@ra4king: Wow, that is some gaming equipment you have there! Raytracing is a simple concept, but vector math … well you know how to use OpenGL, so I assume you have already learned most of it :slight_smile:

A volatile variable is probably not a good idea. Increment is a get and set operation which is not thread safe. How about AtomicInteger?

Given that threading is hard, and prone to issues, how about using some of the concurrency utils built in to the JDK? You could use an ExecutorService to manage the worker threads, and have your render thread wait on each Future.get() in turn.
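A minimal sketch of the ExecutorService idea, assuming the frame is split into horizontal stripes. The class name `StripeRenderDemo`, the method `renderStripe` and the fill colour are made up for illustration and stand in for the real tracing work; they are not from simpleray.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical demo class, not part of simpleray.
class StripeRenderDemo {

    static final int FILL_COLOR = 0x336699; // placeholder for a traced colour

    // Stand-in for the real tracing work: fills rows [yStart, yEnd).
    static void renderStripe(int[] pixels, int width, int yStart, int yEnd) {
        for (int y = yStart; y < yEnd; y++) {
            for (int x = 0; x < width; x++) {
                pixels[y * width + x] = FILL_COLOR;
            }
        }
    }

    // Submits one task per stripe and blocks until all of them are finished.
    static int[] renderFrame(ExecutorService pool, int width, int height, int stripes) {
        final int[] pixels = new int[width * height];
        final int stripeHeight = (height + stripes - 1) / stripes; // round up
        List<Future<?>> pending = new ArrayList<>();

        for (int i = 0; i < stripes; i++) {
            final int yStart = Math.min(height, i * stripeHeight);
            final int yEnd = Math.min(height, yStart + stripeHeight);
            pending.add(pool.submit(() -> renderStripe(pixels, width, yStart, yEnd)));
        }

        for (Future<?> stripe : pending) {
            try {
                stripe.get(); // waits for this stripe and makes its writes visible here
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            } catch (ExecutionException e) {
                throw new IllegalStateException(e.getCause());
            }
        }
        return pixels;
    }

    public static void main(String[] args) {
        ExecutorService pool = Executors.newFixedThreadPool(8);
        try {
            int[] frame = renderFrame(pool, 640, 480, 8);
            System.out.println("rendered " + frame.length + " pixels");
        } finally {
            pool.shutdown();
        }
    }
}
```

The pool is created once and reused for every frame, so no threads are started inside the render loop. Waiting on each Future in turn also takes care of the frame buffer visibility question, since the results of a finished task are guaranteed to be visible after `get()` returns.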

What does that mean exactly? Are you working directly with an int[] of pixels? Where are you getting them from - a BufferedImage? If so, you could also build an image from your array rather than trying to get the array from the image - this code might help (PixelData is just a wrapper to an int[] and dimensions, etc.)

Maybe I was wrong again. I thought an x++ is an atomic operation in Java. I didn’t look into the new concurrency classes yet (I’ve been working with Java 1.4 tooo long …), but I heard they are useful. AtomicInteger is new to me, but seems to be what I should have used there.

I guess it would be a good idea ;D

BufferedImage -> WritableRaster -> DataBuffer -> int[]

I’ve been surprised that it was actually accessible. But I must admit my last try to dig into the graphics internals was done with Java 1.2, and my memory says that the structures were not so easily accessible back then.

That’s about what I have been doing, just that they build a BufferedImage from the data, while I wanted to write into the buffer of an already existing BufferedImage. But thanks, it’s good to know that the method is used, and that it’s not as hacky as I had assumed :slight_smile:
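For reference, a small sketch of getting at the backing array of an existing image. The class and method names (`FrameBufferDemo`, `pixelsOf`) are made up for this example; the cast assumes an int-based image type such as TYPE_INT_RGB.

```java
import java.awt.image.BufferedImage;
import java.awt.image.DataBufferInt;

// Hypothetical demo class, not part of simpleray.
class FrameBufferDemo {

    // Returns the int[] that backs the image; writing to it changes the image.
    static int[] pixelsOf(BufferedImage image) {
        // Assumption: the image was created with an int-based type (e.g. TYPE_INT_RGB),
        // otherwise this cast fails with a ClassCastException.
        return ((DataBufferInt) image.getRaster().getDataBuffer()).getData();
    }

    public static void main(String[] args) {
        BufferedImage image = new BufferedImage(320, 200, BufferedImage.TYPE_INT_RGB);
        int[] pixels = pixelsOf(image);

        // One int per pixel, rows stored one after another: index = y * width + x.
        pixels[10 * image.getWidth() + 5] = 0x00FF00; // green pixel at x=5, y=10

        System.out.println(Integer.toHexString(image.getRGB(5, 10)));
    }
}
```

One caveat from the Javadoc of `getData()`: grabbing the array may make the buffer incompatible with some implementation-side optimizations, such as caching the image in video memory.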

At least it opened a lot of options for me, since I was/am used to working with frame buffer devices and I can now use a lot of tricks that I was used to from my older projects. Writing all pixels in a loop gave me 144 FPS on my PC, using one thread. That seemed to have plenty of reserves for application code, and was much faster than my former tries to display graphics with Java2D, which were more like giving 15-20 FPS. My bigger game projects don’t need it right now, but it’s good to know that one can use Java this way nowadays :slight_smile:

The only question that really puzzles me is why drawing operations including alpha blending are so slow for volatile images on some graphics adapters. On my PC drawing semi-transparent buffered images to a volatile image resulted in like 400 FPS and on my laptop about 20 FPS. This is too much difference to be easily understood by my poor brain :persecutioncomplex:

Thanks for the hints with the new concurrency classes!

No, it isn’t. See eg. http://jeremymanson.blogspot.co.uk/2007/08/volatile-does-not-mean-atomic.html

Nothing (relevant) has changed as far as I can see. You just need to be careful where you’re getting the BufferedImage from and what format it’s in. Any TYPE_INT_* should give you an int[] though, so might not be the cause of Julien’s issue. It’s not hacky either - it’s what BufferedImages were designed for! The code I pointed to is mine btw - feel free to use anything from that class without worrying about the license.

Slight bit of pedantry, but I’d say the title of this thread isn’t entirely accurate - if you’re doing all the raytracing through direct array manipulation it’s not really in Java2D.

The deadlock in r3 was caused by something else. I don’t think anyone has given feedback on r4 yet (the one with the volatile int for the workers done count). But just to be safe, I’ve changed that to an AtomicInteger:

http://www.java-gaming.org/user-generated-content/members/132627/simpleray-r5.jar

There was also a bug in r4 that disabled the 60FPS cap. I still couldn’t test it, but in r5 it’s at least enabled.
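Since the cap is hard to test on slower hardware, here is one way a cap could be written so that the arithmetic can be checked without a fast machine. This is only a sketch under my own assumptions, not the code from r5; `FrameCapDemo`, `remainingNanos` and `capFrame` are made-up names.

```java
// Hypothetical demo class, not part of simpleray.
class FrameCapDemo {

    static final long NANOS_PER_SECOND = 1_000_000_000L;
    static final long NANOS_PER_MILLI = 1_000_000L;

    // How long to sleep so that a frame which took elapsedNanos
    // lasts at least 1/targetFps seconds. Never negative.
    static long remainingNanos(long elapsedNanos, int targetFps) {
        long frameBudget = NANOS_PER_SECOND / targetFps;
        return Math.max(0L, frameBudget - elapsedNanos);
    }

    // Call at the end of each frame with the System.nanoTime() value
    // taken at the start of that frame.
    static void capFrame(long frameStartNanos, int targetFps) {
        long remaining = remainingNanos(System.nanoTime() - frameStartNanos, targetFps);
        if (remaining > 0) {
            try {
                Thread.sleep(remaining / NANOS_PER_MILLI, (int) (remaining % NANOS_PER_MILLI));
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt(); // keep the interrupt flag set
            }
        }
    }

    public static void main(String[] args) {
        long start = System.nanoTime();
        // ... render one frame here ...
        capFrame(start, 60);
    }
}
```

Keeping the calculation in a pure function means it can be verified with fixed numbers, even if the machine never reaches 60 FPS. Sleep granularity depends on the OS, so the real frame rate will sit a little under the cap.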

Regarding the buffered images, maybe I just didn’t know the right way when I tried it first. Well, at least I know it now and I’m happy that it works :slight_smile:

And yes, I’ve taken care to create a BufferedImage of TYPE_INT_RGB, in the hope that those will always get an IntDataBuffer. Maybe Julien just ran into an immediate deadlock, I dunno. If the r5 works, I’ll blame my buggy threading.

I’ll adjust the thread title.

Shameless plug: http://code.google.com/p/small-java-threading-library/.

I’d say do it like this:

  1. Start X threads when the program starts.
  2. When you want to render, signal the threads to start in some way.
  3. Then each thread gets and increases an AtomicInteger to get a pixel index. Each thread then processes for example 1000 pixels or so, basically something like this in a loop:

int pixelIndex = atomicInteger.getAndIncrement();
//Add code to terminate if done.
int start = pixelIndex * PIXELS_PER_BATCH;
int end = Math.min((pixelIndex + 1) * PIXELS_PER_BATCH, width*height);
for(int i = start; i < end; i++){
    int pixelX = i % screenWidth;
    int pixelY = i / screenWidth; // row index: divide by the row length, not the height
    processPixel(pixelX, pixelY);
}

The point of using the PIXELS_PER_BATCH constant is to reduce the amount of synchronization so that you don’t have to synchronize per pixel. However, we still want to have a pretty big number of subtasks since each subtask can take a very different amount of time depending on what geometry it hits.
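A self-contained sketch of the batching loop described above. To keep it short it starts its threads inside `render` instead of keeping them alive between frames as step 1 suggests, and `shade` is a made-up stand-in for the real per-pixel work; the class name `BatchedPixelDemo` is also just for this example.

```java
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical demo class, not part of simpleray or the threading library.
class BatchedPixelDemo {

    static final int PIXELS_PER_BATCH = 1000;

    // Stand-in for tracing one pixel; always returns a non-zero value.
    static int shade(int x, int y) {
        return 0xFF000000 | ((x * 31 + y * 17) & 0xFFFFFF);
    }

    static int[] render(final int width, final int height, int threadCount) {
        final int total = width * height;
        final int[] pixels = new int[total];
        final AtomicInteger nextBatch = new AtomicInteger(0);

        Runnable worker = () -> {
            while (true) {
                // Each call hands out a batch number exactly once across all threads.
                int batch = nextBatch.getAndIncrement();
                int start = batch * PIXELS_PER_BATCH;
                if (start >= total) {
                    return; // no work left
                }
                int end = Math.min(start + PIXELS_PER_BATCH, total);
                for (int i = start; i < end; i++) {
                    pixels[i] = shade(i % width, i / width);
                }
            }
        };

        Thread[] threads = new Thread[threadCount];
        for (int t = 0; t < threadCount; t++) {
            threads[t] = new Thread(worker);
            threads[t].start();
        }
        for (Thread thread : threads) {
            try {
                thread.join(); // after join, the thread's writes are visible here
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            }
        }
        return pixels;
    }

    public static void main(String[] args) {
        int[] frame = render(640, 480, Runtime.getRuntime().availableProcessors());
        System.out.println("rendered " + frame.length + " pixels");
    }
}
```

Because threads pull batches on demand, a thread that lands on cheap sky pixels simply fetches more batches than one stuck on mirror balls, which is the load-balancing benefit over fixed stripes.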

If you were to use my threading library, here’s the code for it:



//Constant:
private static final int PIXELS_PER_BATCH = 1000;


//Setup code:
TaskTreeBuilder builder = new TaskTreeBuilder();
SplitTask task = new SplitTask(0, 0, 1) {
	
	@Override
	protected void runSubtask(int subtask) {
		int start = subtask * PIXELS_PER_BATCH;
		int end = Math.min((subtask + 1) * PIXELS_PER_BATCH, width*height); // clamp the last batch to the pixel count
		for(int i = start; i < end; i++){
		    int pixelX = i % screenWidth;
		    int pixelY = i / screenWidth; // row index: divide by the row length
		    processPixel(pixelX, pixelY);
		}
	}
	
	@Override
	public void finish() {}
};

builder.addTask(task);

TaskTree tree = builder.build();
GameExecutor executor = new MultithreadedExecutor(Runtime.getRuntime().availableProcessors());


//Game loop code:
task.setSubtasks((width * height + PIXELS_PER_BATCH - 1) / PIXELS_PER_BATCH); //Round up division
executor.run(tree);

Hey, you could try Aparapi for this. I’d give it a try myself if I weren’t so tired atm.

When using volatile variables, you have a guarantee that a read sees the most recent value written by any thread, rather than a stale copy cached by the processor. This does not mean you can increment it reliably from more than one thread, as an increment is basically:


volatile int counter;

{
    int n = counter;
    counter = n + 1;
}

You can see how the read and the write are two instructions, so multiple threads can end up interleaving these instructions, causing lost increments.

Just using AtomicInteger doesn’t solve it, as it’s just a volatile integer underneath. You must use counter.incrementAndGet() or counter.getAndIncrement() as they have the required functionality (compare-and-swap) to guarantee increments from multiple threads.
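A small sketch to try this out: several threads hammer one AtomicInteger with incrementAndGet(), and the total comes out exact. The class and method names (`AtomicCounterDemo`, `countAtomically`) are made up for the example.

```java
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical demo class, not part of simpleray.
class AtomicCounterDemo {

    // Starts threadCount threads that each increment a shared counter
    // incrementsPerThread times, then returns the final value.
    static int countAtomically(int threadCount, final int incrementsPerThread) {
        final AtomicInteger counter = new AtomicInteger(0);
        Thread[] threads = new Thread[threadCount];

        for (int t = 0; t < threadCount; t++) {
            threads[t] = new Thread(() -> {
                for (int n = 0; n < incrementsPerThread; n++) {
                    counter.incrementAndGet(); // one indivisible read-modify-write
                }
            });
            threads[t].start();
        }

        for (Thread thread : threads) {
            try {
                thread.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            }
        }
        return counter.get();
    }

    public static void main(String[] args) {
        System.out.println(countAtomically(8, 100_000));
    }
}
```

With 8 threads and 100,000 increments each the result is always 800000. Swapping the AtomicInteger for a volatile int with ++ would usually give a smaller number on a multi-core machine, although how many increments get lost varies from run to run.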

I’ll try to post the relevant code parts to give an idea what I’m doing. The code is from r5.

Tracer setup and thread management:

    public Tracer(DisplayPanel panel)
    {
        this.displayPanel = panel;
        this.objects = new ArrayList<SceneObject>();
        this.workers = new ArrayList<WorkerThread>();
        this.doneCount = new AtomicInteger(0);
        
        int count = 8;
        
        for(int i=0; i<count; i++)
        {
            workers.add(new WorkerThread(this, i));
        }


    private void setup()
    {
        camera = new V3(2, -10, 7);
        lookAt = new V3(0, 1, 0);

[snipsnip]
        
        for(WorkerThread worker : workers)
        {
            worker.start();
        }

Tracer doing thread workload distribution:

    void calculateScene()
    {
        final int height = displayPanel.getHeight();
        final int width = displayPanel.getWidth();
        final int hh = height >> 1;
        
        final int stripe = height / workers.size() + 1;
        
        for(int i=0; i<workers.size(); i++)
        {
            final int yStart = -hh + i * stripe;
            final int yEnd = Math.min(hh, yStart + stripe);
            
            WorkerThread worker = workers.get(i);
            worker.startRendering(yStart, yEnd, width);

            synchronized(worker)
            {
                worker.notify();
            }
        }
    }

Worker callback when done:

    public synchronized void workerDone()
    {
        int count = doneCount.incrementAndGet();
        
        if(count == workers.size()) 
        {
            notify();
        }
    }

Tracer sync’ing on all workers done:


    private synchronized void waitForSceneFinish()
    {
        try
        {
            // System.err.println("waiting for workers on frame=" + frame);
            
            wait();
            
            doneCount.set(0);
        }
        catch (InterruptedException ex)
        {
            Logger.getLogger(Tracer.class.getName()).log(Level.SEVERE, null, ex);
        }
    }

Worker thread core method:

    private synchronized void calculate()
    {
        while(true)
        {
            try
            {
                wait();
                tracer.calculateScene(yStart, yEnd, v, p, lineV, linepix);            
                tracer.workerDone();
            }
            catch (InterruptedException ex)
            {
                Logger.getLogger(WorkerThread.class.getName()).log(Level.SEVERE, null, ex);
            }
        }
    }


Tracer scene building:

    synchronized void nextFrame(Graphics gr)
    {
        calculateScene();
        displayPanel.paint(gr);
        waitForSceneFinish();
    }
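As an alternative to counting finished workers by hand with wait()/notify(), the JDK's CountDownLatch does the same job in a few lines. This is only a sketch under my own assumptions: `LatchFrameDemo` and the white fill are placeholders, and it spawns threads per frame for brevity, which the real tracer avoids.

```java
import java.util.concurrent.CountDownLatch;

// Hypothetical demo class, not part of simpleray.
class LatchFrameDemo {

    static final int FILL_COLOR = 0xFFFFFF; // placeholder for a traced colour

    static int[] renderFrame(final int width, final int height, final int workerCount) {
        final int[] pixels = new int[width * height];
        final CountDownLatch done = new CountDownLatch(workerCount);
        final int stripe = (height + workerCount - 1) / workerCount; // round up

        for (int i = 0; i < workerCount; i++) {
            final int yStart = Math.min(height, i * stripe);
            final int yEnd = Math.min(height, yStart + stripe);
            new Thread(() -> {
                try {
                    for (int y = yStart; y < yEnd; y++) {
                        for (int x = 0; x < width; x++) {
                            pixels[y * width + x] = FILL_COLOR;
                        }
                    }
                } finally {
                    done.countDown(); // always signal, even if rendering throws
                }
            }).start();
        }

        try {
            done.await(); // returns once every worker has counted down
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return pixels;
    }

    public static void main(String[] args) {
        int[] frame = renderFrame(640, 480, 8);
        System.out.println("rendered " + frame.length + " pixels");
    }
}
```

The latch cannot miss a signal: if all workers finish before await() is called, await() simply returns immediately. It also is not affected by spurious wakeups, which a bare wait() without a surrounding condition loop can be. A latch is single-use, so a new one is created per frame here.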

Just to say, since posting earlier I’ve given this a go (r5). Good work! I get a steady 30fps with OpenJDK 1.6 on Ubuntu 12.04 - Intel® Core™ i5 CPU M 430 @ 2.27GHz × 4. Top shows it running at ~300% CPU (100% per virtual core).

Thanks :slight_smile: Good to know that it runs with OpenJDK too.

I meanwhile found out why the volatile integer worked. The workerDone callback is synchronized altogether.


    public synchronized void workerDone()
    {
        int count = doneCount.incrementAndGet();
        
        if(count == workers.size()) 
        {
            notify();
        }
    }

Instead of doneCount.incrementAndGet() the r4 had a doneCount++ there. But since the whole method is synchronized, there was no race condition there, and it was sufficient to have the variable volatile, to make sure all cores have the same value.
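To illustrate the point, here is a stripped-down sketch of a counter whose ++ is protected only by a synchronized method. `SynchronizedCounterDemo` and its methods are made-up names that mimic the workerDone callback without the rest of the tracer.

```java
// Hypothetical demo class, not part of simpleray.
class SynchronizedCounterDemo {

    private int doneCount; // plain int: every access below holds this object's lock

    synchronized int workerDone() {
        doneCount++; // safe only because the whole method is synchronized
        return doneCount;
    }

    synchronized int current() {
        return doneCount;
    }

    // Calls workerDone() from several threads and returns the final count.
    static int run(int threadCount, final int callsPerThread) {
        final SynchronizedCounterDemo counter = new SynchronizedCounterDemo();
        Thread[] threads = new Thread[threadCount];

        for (int t = 0; t < threadCount; t++) {
            threads[t] = new Thread(() -> {
                for (int n = 0; n < callsPerThread; n++) {
                    counter.workerDone();
                }
            });
            threads[t].start();
        }

        for (Thread thread : threads) {
            try {
                thread.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            }
        }
        return counter.current();
    }

    public static void main(String[] args) {
        System.out.println(run(8, 50_000));
    }
}
```

As long as every read and write of the field happens while holding the same lock, the lock already provides the visibility between cores, so the field does not need to be volatile at all. The volatile keyword only starts to matter when some code touches the field outside the synchronized methods.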

Learned something again. I must say in all the years of using Java I didn’t learn as much about concurrency as in this small project. It’s been good to try :slight_smile:

The latest version (-r4) works, I get 11 FPS on my Intel Inside Pentium D. When I maximize the window, it becomes really slow.

I am an expert in vector math and linear algebra in general. My only problem is laziness haha :smiley:

I’ve added shadows. The impact on performance seems to be about 30%, that is harsh, but it still works quite fine and the shadows add a lot to the impression.

The latest demo with shadows added:

http://www.java-gaming.org/user-generated-content/members/132627/simpleray-r6.jar

would you mind putting up your code?
want to try if it would run well on the GPU (your code)

I haven’t made up my mind yet if I want to publish this as open source.

why not?

Doooo ittttt…I want to learn how in the world you do stuff like this :S

I get 5 FPS… My computer certainly isn’t the most powerful thing out there, that’s for sure.

this is encouraging. i’m gonna try my own idea for realtime raytracing.