AMD Open Sources Aparapi!

nsigma · September 16, 2011, 8:43pm

I wasn’t suggesting that you specifically had to work on it!

As for JOCL solving some of the performance problems, aren’t you missing the point?! Maybe you should start coding in C or assembler because it solves some of the performance problems of Java?

I wonder if a JOCL backing to the compiler would be possible though?

gouessej · September 16, 2011, 8:55pm

Like princec, I really wonder whether people actually read the FAQ:
http://code.google.com/p/aparapi/wiki/FrequentlyAskedQuestions#Why_does_Aparapi_seems_to_be_copying_data_unnecessarily_back_and
JOCL works with NVIDIA and ATI graphics cards. I have seen a lot of projects duplicating objects in memory because they do not use NIO buffers; I’m not a newbie. This is the case, for example, in the Java version of VSG OpenInventor. I remind you that I have strong experience in C: I wrote a simulator of heavy processes, and I took on some C assignments at the beginning of my career. Maybe you were joking, but I did not find it funny.

nsigma · September 16, 2011, 9:07pm

Huh? It was me that made that point to princec (in jest I hasten to add!)

Yes, I was joking, but you still seem to be misunderstanding what the point of Aparapi is - to run Java code on the GPU using OpenCL. It converts Java bytecode to OpenCL at runtime. You swap some performance for convenience, hence my comment about assembler over Java.

kappa · September 16, 2011, 9:13pm

Yup, agreed. OpenCL and even OpenGL shaders are a pain for Java developers to pick up, no matter how nice the connecting API to C is; something like the above project is a pretty welcome idea/attempt IMO.

The objections that you shouldn’t support the project because it is AMD-only or slow in some way are invalid atm. It’s still very early days for the project, and they’ve made it clear they want to eventually support all OpenCL hardware (and even asked for help regarding this). It’s pretty rare for a big vendor like AMD to spend time and resources on the small Java gaming/graphics/computing community, and it’s even open source under the really nice BSD license.

princec · September 16, 2011, 10:33pm

Well said.

The FAQ’s a bit of a big 'un though, had a bit of a tl;dr moment with it to be honest. But I was wondering why it only worked with ATI if it was purportedly built on OpenCL, which is supposedly supported by both ATI/AMD and Nvidia. (What about Intel? I suppose that’s asking a bit much seeing as they can’t even get their drivers to switch to bloody fullscreen without crashing)

Cas

gouessej · September 17, 2011, 9:47am

I’m OK with the principle, not with its implementation. AMD could have used Javassist or something like that to manipulate the bytecode, and any Java binding of OpenCL, instead of writing some more JNI code (which complicates interoperability with Java bindings for the OpenGL API).

gfrost · October 1, 2011, 7:21pm

Aparapi will probably work fine (with a one-line hack) with any OpenCL 1.1 compliant runtime. I am the Aparapi team lead and architect, and we deliberately reduced our testing burden by testing only on devices supported by the AMD OpenCL runtime. We also built the dev scripts with the AMD APP SDK in mind.

Don’t tell my employer, but I have run it with NVidia’s runtime; I have not tried Intel’s or Apple’s. Initially we had issues with the NVidia runtime because support for some features we wanted required OpenCL 1.1 compliance, and we could not get the 1.1 SDK without registering with NVidia (something I was not too motivated to do ;)). Now they are 1.1 compliant, and I suspect that the line of code which restricts the search to AMD’s OpenCL runtime can (and should) be removed.

Take a look at line 417 in aparapi.cpp (I am a noobie on here and want to avoid posting links).

Of course, the build.properties for the JNI code will need to be tweaked to point to NVidia’s include/lib files if someone needs to rebuild the code. My guess is someone with NVidia’s SDK could build an NVidia-compatible version in 20 minutes. Then I would recommend raising an issue and posting a patch (probably 3 lines of code). I will happily commit it.

Someone asked later why we did not use Javassist (or BCEL) to do the bytecode analysis to create OpenCL. In truth we could have (I am a huge Javassist fan and presented a session at JavaOne a few years back using it), but we wanted 0 dependencies on external libraries to simplify the open source process.

I have exchanged emails with Michael Bien and Marco Hutter (the two JOCL guys) and would like to look at using JOCL for adding library extensions to Aparapi. I still prefer the idea of not forcing Java developers to have to learn OpenCL. However, for those that are prepared to do this, I think that Aparapi + JOCL would prove a nice mix.

I was pleased to discover this discussion and join this site.

Gary

CommanderKeith · October 4, 2011, 3:24pm

Hi Gary, that’s great news. You’re amazingly open with the whole project. Very impressive work.

I’m still learning OpenGL/OpenCL so this is quite over my head, but this project is particularly interesting for Java heads like us.

This is quite unrelated, but may I ask, why do AMD chips generally have poorer OpenGL drivers compared to NVidia chips? If all the devs at AMD are anything like you, then I would have thought that the AMD graphics chips would have Rolls-Royce OpenGL drivers.

Keith

gfrost · October 13, 2011, 1:37am

Aparapi should now work with any OpenCL 1.1 compliant runtime.

Witold Bolt kindly supplied a patch to support Mac OS and as part of that patch I removed the AMD only restriction.

Now Aparapi will iterate through available 1.1 platforms and locate the first GPU device.

I must confess non-AMD device testing is less than thorough but the chains are off.

Give it a whirl.

Gary

zammbi · October 13, 2011, 6:50am

[quote]Now Aparapi will iterate through available 1.1 platforms and locate the first GPU device.
[/quote]
Shouldn’t it locate the best GPU device?
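
To make the difference between "first" and "best" concrete, here is a minimal sketch in plain Java. It does not use Aparapi or any OpenCL binding: `Device`, `Type`, `firstGpu` and `bestGpu` are hypothetical names, and ranking by compute-unit count is only one assumed definition of "best" (clock speed or memory size could be argued for just as well).

```java
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;
import java.util.Optional;

// Sketch only: Device is a hypothetical stand-in, not an Aparapi or OpenCL type.
final class DeviceSelectionSketch {

    enum Type { CPU, GPU }

    static final class Device {
        final String name;
        final Type type;
        final int computeUnits; // assumed ranking criterion for "best"

        Device(String name, Type type, int computeUnits) {
            this.name = name;
            this.type = type;
            this.computeUnits = computeUnits;
        }
    }

    // "First GPU" policy: take the first GPU in enumeration order.
    static Optional<Device> firstGpu(List<Device> devices) {
        return devices.stream().filter(d -> d.type == Type.GPU).findFirst();
    }

    // "Best GPU" policy: take the GPU with the most compute units.
    static Optional<Device> bestGpu(List<Device> devices) {
        return devices.stream()
                .filter(d -> d.type == Type.GPU)
                .max(Comparator.comparingInt((Device d) -> d.computeUnits));
    }

    public static void main(String[] args) {
        List<Device> devices = Arrays.asList(
                new Device("cpu", Type.CPU, 8),
                new Device("small-gpu", Type.GPU, 2),
                new Device("big-gpu", Type.GPU, 16));
        System.out.println("first: " + firstGpu(devices).map(d -> d.name).orElse("none"));
        System.out.println("best:  " + bestGpu(devices).map(d -> d.name).orElse("none"));
    }
}
```

On a machine with an integrated and a discrete GPU, the two policies can pick different devices, which is the case the question is about.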

Does it support Intel inbuilt graphics?

theagentd · October 13, 2011, 9:26am

Wha…? Sure, by 2020, but the driver support is going to be so buggy that it’ll crash on 99% of all OpenCL programs and fall back to software for the last 1%. >_>

gouessej · October 14, 2011, 1:54pm

That is good news. When I suggested using JOCL, it was mainly to allow interoperability with JOGL, not to force programmers to learn OpenCL. Best regards.

xinaesthetic · October 15, 2011, 9:38am

I’m just making a little hello world (I probably won’t have a chance to do much more than that in the next couple of weeks).

I notice that Kernel has various scalar maths methods implemented… I wonder how non-scalar OpenCL primitives and operations could be used. It seems performance will be compromised for things like graphics and physics if these are inaccessible, am I right?

tom · October 17, 2011, 1:33pm

Fantastic! It now works on my NVIDIA GeForce 9600 GT after updating to the latest drivers.

I’ve tested it with some box intersection code that I had ported to OpenCL to test performance. Here are some benchmarking results:
Original Java code: 21206 ms
OpenCL (using LWJGL): 1059 ms
Aparapi GPU: 1592 ms
Aparapi SEQ: 12482 ms
Aparapi JTP: 6628 ms (2 cores)

The benchmark tests 100000 segments against 1000 boxes.

The code uses 6 dot products to transform segment endpoints into box local space. In OpenCL these are built in; in Aparapi I had to unroll them. This might explain the difference between OpenCL and Aparapi.
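
For readers wondering what "unrolling" a dot product looks like, here is a small plain-Java sketch using only scalar arithmetic, which is all a kernel restricted to scalar maths has available. It is not the benchmark code: `dot3` and `toLocal` are hypothetical names, and the box is assumed to be described by a centre and three orthonormal axes, so one endpoint costs three dot products and a segment costs six.

```java
// Sketch only: scalar ("unrolled") dot products, no vector types.
final class UnrolledDotSketch {

    // Dot product of two 3-vectors written out component by component.
    static float dot3(float ax, float ay, float az, float bx, float by, float bz) {
        return ax * bx + ay * by + az * bz;
    }

    // Transform a point into box-local space: subtract the box centre, then
    // project onto each box axis (assumed orthonormal). Three dot products.
    static float[] toLocal(float px, float py, float pz,
                           float[] center, float[] axisX, float[] axisY, float[] axisZ) {
        float dx = px - center[0];
        float dy = py - center[1];
        float dz = pz - center[2];
        return new float[] {
            dot3(dx, dy, dz, axisX[0], axisX[1], axisX[2]),
            dot3(dx, dy, dz, axisY[0], axisY[1], axisY[2]),
            dot3(dx, dy, dz, axisZ[0], axisZ[1], axisZ[2])
        };
    }

    public static void main(String[] args) {
        float[] center = {1f, 1f, 1f};
        float[] x = {1f, 0f, 0f};
        float[] y = {0f, 1f, 0f};
        float[] z = {0f, 0f, 1f};
        float[] local = toLocal(2f, 3f, 4f, center, x, y, z);
        System.out.println(local[0] + ", " + local[1] + ", " + local[2]);
    }
}
```

In OpenCL C the body of `dot3` would be a single call to the built-in `dot` on vector types, which the compiler can map to the hardware more directly.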

javazoid · December 1, 2011, 2:27pm

Is there a way to write a simple 3x3 or 5x5 convolution with Aparapi?

gouessej · December 1, 2011, 3:17pm

Have you tested with the latest version of JOCL?

gfrost · December 1, 2011, 4:34pm

I uploaded a ‘Conway Game Of Life’ demo a few weeks back. It is essentially a 3x3 convolution + some algorithmic processing in the Kernel. The Aparapi code for this sample should provide the basis for a good solution.

The sample also shows a few performance tricks for applying multiple generations of a convolution that are probably less relevant to a simple convolution, but are useful for other algorithms.

I have a straight convolution example somewhere, maybe I will add it to the list.

Gary
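
As a rough illustration of the per-pixel work a straight 3x3 convolution involves, here is a plain-Java sketch that runs without Aparapi. It is not the example mentioned above: `convolveAt` and `convolve` are hypothetical names, the image is assumed to be a flat row-major `float[]`, and border pixels are simply copied. The body of `convolveAt` is the part that would move into a kernel's `run()` method, with `gid` coming from `getGlobalId()`.

```java
// Sketch only: CPU reference for a 3x3 convolution over a flat, row-major image.
final class Convolution3x3Sketch {

    // Output value for the pixel at flat index gid. The kernel is applied
    // without flipping (strictly a correlation; identical for symmetric kernels).
    static float convolveAt(float[] in, int width, int height, float[] kernel, int gid) {
        int x = gid % width;
        int y = gid / width;
        if (x == 0 || x == width - 1 || y == 0 || y == height - 1) {
            return in[gid]; // border pixel: keep the existing value
        }
        float sum = 0f;
        for (int dy = -1; dy <= 1; dy++) {
            for (int dx = -1; dx <= 1; dx++) {
                sum += in[gid + dy * width + dx] * kernel[(dy + 1) * 3 + (dx + 1)];
            }
        }
        return sum;
    }

    // Sequential driver; a data-parallel runtime would run one work item per gid.
    static float[] convolve(float[] in, int width, int height, float[] kernel) {
        float[] out = new float[in.length];
        for (int gid = 0; gid < width * height; gid++) {
            out[gid] = convolveAt(in, width, height, kernel, gid);
        }
        return out;
    }

    public static void main(String[] args) {
        float[] image = {1, 2, 3, 4, 5, 6, 7, 8, 9};
        float[] box = {1, 1, 1, 1, 1, 1, 1, 1, 1};
        float[] out = convolve(image, 3, 3, box);
        System.out.println("centre = " + out[4]);
    }
}
```

Writing to a separate output array avoids reading pixels that have already been overwritten, which is the same reason the Game of Life sample keeps two halves of one buffer.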

javazoid · December 1, 2011, 4:41pm

Why don’t you post the code here ?

gfrost · December 1, 2011, 7:56pm

Here you go.

```java
import java.awt.Dimension;
import java.awt.Graphics;
import java.awt.image.BufferedImage;
import java.awt.image.DataBufferInt;

import javax.swing.JComponent;
import javax.swing.JFrame;
import javax.swing.WindowConstants;

import com.amd.aparapi.Kernel;

/**
 * An example Aparapi application which demonstrates Conway's 'Game Of Life'.
 *
 * Original code from Witold Bolt's site https://github.com/houp/aparapi/tree/master/samples/gameoflife.
 *
 * Converted to use int buffer and some performance tweaks by Gary Frost
 *
 * @author Witold Bolt
 * @author Gary Frost
 */
public class Main{

   /**
    * LifeKernel represents the data parallel algorithm described by Conway's game of life.
    *
    * http://en.wikipedia.org/wiki/Conway's_Game_of_Life
    *
    * We examine the state of each pixel and its 8 neighbors and apply the following rules.
    *
    * if pixel is dead (off) and number of neighbors == 3 {
    *       pixel is turned on
    * } else if pixel is alive (on) and number of neighbors is neither 2 nor 3 {
    *       pixel is turned off
    * }
    *
    * We use an image buffer of 2*width*height pixels (twice the size of the screen) and we use fromBase and
    * toBase to track which half of the buffer is being mutated for each pass. We basically
    * copy from getGlobalId()+fromBase to getGlobalId()+toBase;
    *
    * Prior to each pass the values of fromBase and toBase are swapped.
    */

   public static class LifeKernel extends Kernel{

      private static final int ALIVE = 0xffffff;

      private static final int DEAD = 0;

      private final int[] imageData;

      private final int width;

      private final int height;

      private int fromBase;

      private int toBase;

      public LifeKernel(int _width, int _height, BufferedImage _image) {
         imageData = ((DataBufferInt) _image.getRaster().getDataBuffer()).getData();
         width = _width;
         height = _height;
         fromBase = height * width;
         toBase = 0;
         setExplicit(true); // This gives us a performance boost

         /** draw a line across the image **/
         for (int i = width * (height / 2) + width / 10; i < width * (height / 2 + 1) - width / 10; i++) {
            imageData[i] = LifeKernel.ALIVE;
         }

         put(imageData); // Because we are using explicit buffer management we must put the imageData array
      }

      @Override public void run() {
         int gid = getGlobalId();
         int to = gid + toBase;
         int from = gid + fromBase;
         int x = gid % width;
         int y = gid / width;

         if ((x == 0 || x == width - 1 || y == 0 || y == height - 1)) {
            // This pixel is on the border of the view, just keep existing value
            imageData[to] = imageData[from];
         } else {
            // Count the number of neighbors.  We use (value & 1) to turn pixel value into either 0 or 1
            int neighbors = (imageData[from - 1] & 1) + // WEST
                  (imageData[from + 1] & 1) + // EAST
                  (imageData[from - width - 1] & 1) + // NORTHWEST
                  (imageData[from - width] & 1) + // NORTH
                  (imageData[from - width + 1] & 1) + // NORTHEAST
                  (imageData[from + width - 1] & 1) + // SOUTHWEST
                  (imageData[from + width] & 1) + // SOUTH
                  (imageData[from + width + 1] & 1); // SOUTHEAST

            // The game of life logic
            if (neighbors == 3 || (neighbors == 2 && imageData[from] == ALIVE)) {
               imageData[to] = ALIVE;
            } else {
               imageData[to] = DEAD;
            }
         }
      }

      public void nextGeneration() {
         // swap fromBase and toBase
         int swap = fromBase;
         fromBase = toBase;
         toBase = swap;

         execute(width * height);
      }

   }

   public static void main(String[] _args) {

      JFrame frame = new JFrame("Game of Life");
      final int width = Integer.getInteger("width", 1024 + 512);

      final int height = Integer.getInteger("height", 768);

      // Buffer is twice the size of the screen.  We will alternate between mutating data from top to bottom
      // and bottom to top in alternate generation passes. The LifeKernel will track which pass is which
      final BufferedImage image = new BufferedImage(width, height * 2, BufferedImage.TYPE_INT_RGB);

      final LifeKernel lifeKernel = new LifeKernel(width, height, image);

      // Create a component for viewing the offscreen image
      @SuppressWarnings("serial") JComponent viewer = new JComponent(){
         @Override public void paintComponent(Graphics g) {
            if (lifeKernel.isExplicit()) {
               lifeKernel.get(lifeKernel.imageData); // We only pull the imageData when we intend to use it.
            }
            // We copy one half of the offscreen buffer to the viewer, we copy the half that we just mutated.
            if (lifeKernel.fromBase == 0) {
               g.drawImage(image, 0, 0, width, height, 0, 0, width, height, this);
            } else {
               g.drawImage(image, 0, 0, width, height, 0, height, width, 2 * height, this);
            }
         }
      };

      // Set the default size and add to the frame's content pane
      viewer.setPreferredSize(new Dimension(width, height));
      frame.getContentPane().add(viewer);

      // Swing housekeeping
      frame.pack();
      frame.setVisible(true);
      frame.setDefaultCloseOperation(WindowConstants.EXIT_ON_CLOSE);

      long start = System.currentTimeMillis();
      long generations = 0;
      while (true) {
         lifeKernel.nextGeneration();  // Work is performed here
         viewer.repaint();             // Request a repaint of the viewer (paintComponent(Graphics) is called later, not synchronously)
         generations++;
         long now = System.currentTimeMillis();
         if (now - start > 1000) {
            frame.setTitle(lifeKernel.getExecutionMode() + " generations per second: " + (generations * 1000.0) / (now - start));
            start = now;
            generations = 0;
         }
      }

   }
}
```

xinaesthetic · December 2, 2011, 10:26am

I thought this may as well have some code tags:

gfrost:

Here you go.


/**
 * An example Aparapi application which demonstrates Conways 'Game Of Life'.
 * 
 * Original code from Witold Bolt's site https://github.com/houp/aparapi/tree/master/samples/gameoflife.
 * 
 * Converted to use int buffer and some performance tweaks by Gary Frost
 * 
 * @author Witold Bolt
 * @author Gary Frost
 */
public class Main{

   /**
    * LifeKernel represents the data parallel algorithm described by Conway's game of life.
    * 
    * http://en.wikipedia.org/wiki/Conway's_Game_of_Life
    * 
    * We examine the state of each pixel and its 8 neighbors and apply the following rules. 
    * 
    * if pixel is dead (off) and number of neighbors == 3 {
    *       pixel is turned on
    * } else if pixel is alive (on) and number of neighbors is neither 2 nor 3 {
    *       pixel is turned off
    * }
    * 
    * We use an image buffer of 2*width*height pixels (twice the size of the screen) and we use fromBase and toBase to track which half of the buffer is being mutated for each pass. We basically 
    * copy from getGlobalId()+fromBase to getGlobalId()+toBase;
    * 
    * 
    * Prior to each pass the values of fromBase and toBase are swapped.
    *
    */

   public static class LifeKernel extends Kernel{

      private static final int ALIVE = 0xffffff;

      private static final int DEAD = 0;

      private final int[] imageData;

      private final int width;

      private final int height;

      private int fromBase;

      private int toBase;

      public LifeKernel(int _width, int _height, BufferedImage _image) {
         imageData = ((DataBufferInt) _image.getRaster().getDataBuffer()).getData();
         width = _width;
         height = _height;
         fromBase = height * width;
         toBase = 0;
         setExplicit(true); // This gives us a performance boost
         
         /** draw a line across the image **/
         for (int i = width * (height / 2) + width / 10; i < width * (height / 2 + 1) - width / 10; i++) {
            imageData[i] = LifeKernel.ALIVE;
         }
         
         put(imageData); // Because we are using explicit buffer management we must put the imageData array

      }

      @Override public void run() {
         int gid = getGlobalId();
         int to = gid + toBase;
         int from = gid + fromBase;
         int x = gid % width;
         int y = gid / width;

         if ((x == 0 || x == width - 1 || y == 0 || y == height - 1)) {
            // This pixel is on the border of the view, just keep existing value
            imageData[to] = imageData[from];
         } else {
            // Count the number of neighbors.  We use (value & 1) to turn pixel value into either 0 or 1
            int neighbors = (imageData[from - 1] & 1) + // WEST
                  (imageData[from + 1] & 1) + // EAST
                  (imageData[from - width - 1] & 1) + // NORTHWEST
                  (imageData[from - width] & 1) + // NORTH
                  (imageData[from - width + 1] & 1) + // NORTHEAST
                  (imageData[from + width - 1] & 1) + // SOUTHWEST
                  (imageData[from + width] & 1) + // SOUTH
                  (imageData[from + width + 1] & 1); // SOUTHEAST

            // The game of life logic
            if (neighbors == 3 || (neighbors == 2 && imageData[from] == ALIVE)) {
               imageData[to] = ALIVE;
            } else {
               imageData[to] = DEAD;
            }

         }

      }

      public void nextGeneration() {
         // swap fromBase and toBase
         int swap = fromBase;
         fromBase = toBase;
         toBase = swap;

         execute(width * height);
      }

   }

   public static void main(String[] _args) {

      JFrame frame = new JFrame("Game of Life");
      final int width = Integer.getInteger("width", 1024 + 512);

      final int height = Integer.getInteger("height", 768);

      // Buffer is twice the size of the screen.  We will alternate between mutating data from top to bottom
      // and bottom to top in alternate generation passes. The LifeKernel will track which pass is which
      final BufferedImage image = new BufferedImage(width, height * 2, BufferedImage.TYPE_INT_RGB);

      final LifeKernel lifeKernel = new LifeKernel(width, height, image);

      // Create a component for viewing the offscreen image
      @SuppressWarnings("serial") JComponent viewer = new JComponent(){
         @Override public void paintComponent(Graphics g) {
            if (lifeKernel.isExplicit()) {
               lifeKernel.get(lifeKernel.imageData); // We only pull the imageData when we intend to use it.
            }
            // We copy one half of the offscreen buffer to the viewer, we copy the half that we just mutated.
            if (lifeKernel.fromBase == 0) {
               g.drawImage(image, 0, 0, width, height, 0, 0, width, height, this);
            } else {
               g.drawImage(image, 0, 0, width, height, 0, height, width, 2 * height, this);
            }
         }
      };

      // Set the default size and add to the frames content pane
      viewer.setPreferredSize(new Dimension(width, height));
      frame.getContentPane().add(viewer);
      
      // Swing housekeeping
      frame.pack();
      frame.setVisible(true);
      frame.setDefaultCloseOperation(WindowConstants.EXIT_ON_CLOSE);

      long start = System.currentTimeMillis();
      long generations = 0;
      while (true) {
         lifeKernel.nextGeneration();  // Work is performed here
         viewer.repaint();             // Request a repaint of the viewer (paintComponent(Graphics) is called later, not synchronously)
         generations++;
         long now = System.currentTimeMillis();
         if (now - start > 1000) {
            frame.setTitle(lifeKernel.getExecutionMode() + " generations per second: " + (generations * 1000.0) / (now - start));
            start = now;
            generations = 0;
         }
      }

   }
}
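
To check the `fromBase`/`toBase` double-buffering logic without a GPU or the Aparapi jar, the same indexing can be run as an ordinary loop. The sketch below is a hedged CPU reference, not part of the sample: `LifeReferenceSketch` and `step` are hypothetical names, and the loop over `gid` stands in for what `execute(width * height)` dispatches.

```java
// Sketch only: sequential reference for one Game of Life pass over a
// double-height buffer, using the same indexing as LifeKernel.run().
final class LifeReferenceSketch {

    static final int ALIVE = 0xffffff;
    static final int DEAD = 0;

    // Reads the half starting at fromBase, writes the half starting at toBase.
    static void step(int[] data, int width, int height, int fromBase, int toBase) {
        for (int gid = 0; gid < width * height; gid++) {
            int to = gid + toBase;
            int from = gid + fromBase;
            int x = gid % width;
            int y = gid / width;
            if (x == 0 || x == width - 1 || y == 0 || y == height - 1) {
                data[to] = data[from]; // border: keep existing value
            } else {
                int neighbors = (data[from - 1] & 1) + (data[from + 1] & 1)
                        + (data[from - width - 1] & 1) + (data[from - width] & 1)
                        + (data[from - width + 1] & 1) + (data[from + width - 1] & 1)
                        + (data[from + width] & 1) + (data[from + width + 1] & 1);
                boolean alive = neighbors == 3 || (neighbors == 2 && data[from] == ALIVE);
                data[to] = alive ? ALIVE : DEAD;
            }
        }
    }

    public static void main(String[] args) {
        int width = 5, height = 5;
        int[] data = new int[2 * width * height];
        // Horizontal blinker in the middle row of the first half.
        data[11] = data[12] = data[13] = ALIVE;
        step(data, width, height, 0, width * height);
        // Print the second half: the blinker should now be vertical.
        for (int y = 0; y < height; y++) {
            StringBuilder row = new StringBuilder();
            for (int x = 0; x < width; x++) {
                row.append(data[width * height + y * width + x] == ALIVE ? '#' : '.');
            }
            System.out.println(row);
        }
    }
}
```

Calling `step` again with the two base offsets swapped flips the blinker back, which mirrors the swap at the top of `nextGeneration()`.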