Vector math slowdown

I have been implementing a ray tracing engine for the space sim community project, but have noticed what seems to me an unwarranted, significant performance issue.

My computer is an Athlon XP 2800.
I am able to get the engine to iterate over the pixels of a 640x480 display at over 250fps.

When I add an array lookup, it slows down to 190fps.

When I add some quite simple vector math to set up a ray for tracing, it slows down to 15fps, which I believe is excessive.

I believe I have isolated the problem somewhat to the following statement (which is in a code fragment listed later in the post):

 tempPixel.position.copy(tempVec3D_3); 

By commenting out the above statement my frame rate doubles! This seems quite odd as tempPixel.position is a Vec3D object and the copy method (listed further in the post) simply copies the contents of the given vector!

Can anyone give me some pointers as to why that task is so processor intensive?

This is the code in question. Note that the actual ray tracing call has been commented out.

tile.height and tile.width are 480 and 640 respectively.

The pixels array is an array of Pixel objects, which have attributes useful for ray tracing, i.e. position, intersection point, display co-ordinates etc.

tempVec3D_3 is an instance of the Vec3D class listed further in the post.



			// iterate over the tile's rows
			for (intTemp1=0;intTemp1<tile.height;++intTemp1)
			{
				// iterate over the tile's columns
				
				for (intTemp2=0;intTemp2<tile.width;++intTemp2)
				{
					// obtain the pixel at the current row and column 
					tempPixel=camera.pixels[intTemp3++]; // with this alone i get ~190 fps
					

						// calculate the pixel's 3D coordinates
						tempVec3D_3.copy(tempVec);
						tempVec3D_3.scaleAdd(camera.stepUp,intTemp1);
						tempVec3D_3.scaleAdd(camera.stepRight,intTemp2);
						tempPixel.position.copy(tempVec3D_3);
						
						// calculate the primary ray's direction vector
						tempVec3D_3.subtract(camera.position);
						tempVec3D_3.unit();
						
						// trace the primary ray
						//world.primaryRayTrace(primaryRay,tempPixel,camera);
				}
				++intTemp3;
			}

I have the following class, which represents a 3D vector and contains its vector operations.


package org.moogiesoft.JRTRT;

/**
 * Three dimensional Vector Class and utilities
 * @author nklaebe
 *
 */
final public class Vec3D {
	
	/** The x, y and z components of the vector */
	public double x,y,z;
	
	/**
	 * Copy constructor. Returns a 3D vector initialised to the same value as the passed in 3D vector
	 * @param other The vector to copy
	 */
	public Vec3D (final Vec3D other)
	{
		x=other.x;
		y=other.y;
		z=other.z;
	}
	
	/**
	 * Constructor.
	 * Vector components initialised to 0.
	 */
	public Vec3D ()
	{
		
	}
	
	/**
	 * Constructor.
	 * Vector components initialised with the passed in values.
	 * @param x value to initialise the x component with
	 * @param y value to initialise the y component with
	 * @param z value to initialise the z component with
	 */
	public Vec3D (final double x,final double y,final double z)
	{
		this.x=x;
		this.y=y;
		this.z=z;
	}
	
	/**
	 * re-initialises the x, y and z components to 0
	 *
	 */
	final public void zeroise()
	{
		x=y=z=0.0;
	}
	
	final public String toString()
	{
		return "x= "+x+" y= "+y+" z= "+z;
	}

	/**
	 * Scales the vector by the given multiplier
	 * @param scaleMuliplier the value to scale the vector by
	 */
	final public void scale(final double scaleMuliplier)
	{
        x=x * scaleMuliplier;
        y=y * scaleMuliplier; 
        z=z * scaleMuliplier;
	}

	/**
	 * Adds the given vector scaled by the given multiplier
	 * @param other the vector to add
	 * @param scale the value to multiply the other vector by
	 */
	final public void scaleAdd(final Vec3D other,final double scale) 
	{
		x+=scale*other.x;
		y+=scale*other.y;
		z+=scale*other.z;
	}
	
	/**
	 * returns the dot product of this vector and the given other vector
	 * @param other the other vector
	 * @return the dot product of this vector and the given other vector
	 */
	final public double dot(final Vec3D other )
	{
		return x*other.x+
		   y*other.y+
		   z*other.z;
	}
	
	/**
	 * Assigns this vector to one vector minus another vector
	 * @param vec the vector to subtract from
	 * @param other the vector to subtract from vec
	 */
	final public void copySubtract(final Vec3D vec, final Vec3D other )
	{
		x=vec.x-other.x;
		y=vec.y-other.y;
		z=vec.z-other.z;
	}
	
	/**
	 * Subtracts the given vector from this vector
	 * @param other the vector to subtract from this vector
	 */
	final public void subtract(final Vec3D other )
	{
		x-=other.x;
		y-=other.y;
		z-=other.z;
	}
	
	/**
	 * Assigns this vector to one vector plus another vector
	 * @param vec the first vector of the sum
	 * @param other the vector to add to vec
	 */
	final public void copyAdd(final Vec3D vec, final Vec3D other )
	{
		x=vec.x+other.x;
		y=vec.y+other.y;
		z=vec.z+other.z;
	}
	
	/**
	 * Adds the given vector to this vector
	 * @param other the vector to add to this vector
	 */
	final public void add(final Vec3D other )
	{
		x+=other.x;
		y+=other.y;
		z+=other.z;
	}
	
	/**
	 * converts this vector into a unit vector
	 *
	 */
	final public void unit()
	{
		final double val= 1.0/ Math.sqrt(x*x+y*y+z*z);
		x*=val;
        y*=val;
        z*=val;

	}
	
	/**
	 * assigns this vector's components to that of the passed vector
	 * @param other the vector from which this vector will copy the components. 
	 */
	final public void copy(final Vec3D other)
	{
		x=other.x;
		y=other.y;
		z=other.z;
	}
	
	/**
	 * returns the length of the vector squared
	 * @return returns the length of the vector squared
	 */
	final public double lengthSquared()
	{
		return x*x+
	   	   y*y+
	   	   z*z;
	}
	
	/**
	 * caps each component of the vector to 1.0 if over 1.0
	 *
	 */
	final public void cap()
	{
		x=x>1.0?1:x;
		y=y>1.0?1:y;
		z=z>1.0?1:z;
	}
	
	/**
	 * assigns this vector to the cross product of this vector and the passed in vector
	 * @param other the other vector to use in the cross product
	 */
	final public void crossProduct(final Vec3D other)
	{
		// save the old components: x and y are overwritten before z is computed
		final double oldx=x;
		final double oldy=y;
		x=oldy * other.z - z * other.y;
		y=z * other.x - oldx * other.z;
		z=oldx * other.y - oldy * other.x;
	}
	
	/**
	 * Performs component-wise multiplication of this vector by the passed vector
	 * @param other the vector to multiply this vector by, component by component
	 */
	final public void multiply(final Vec3D other)
	{
		x *= other.x;
        y *= other.y;
        z *= other.z;
	}
	
	/**
	 * Assigns this vector to the vector scaled by the multiplier
	 * @param other the vector to assign this vector to
	 * @param scale multiplier to scale the other vector by
	 */
	final public void copyScale(final Vec3D other,final double scale)
	{
		x=other.x*scale;
		y=other.y*scale;
		z=other.z*scale;
	}

}

tempPixel.position.copy(tempVec3D_3);

This line seems to be the only one in that loop that accesses a huge memory structure.

I think you’re suffering from cache misses.

From the package name, I assume you’re trying to write a real-time ray tracing engine, in which case you really need to think about memory access speeds. Plain Java with lots of objects scattered over lots of native pages doesn’t cut it, performance-wise.
If I were you, I’d start converting my data structure to float[]s, accessed by ‘sliding window’ objects, to keep it all manageable. They’d be like FloatBuffers, but without the performance quirks. You could of course go even nastier, with sun.misc.Unsafe and raw pointers, removing the null pointer and array bounds checks.


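public class FloatData
{
   // backing array, shared by every window onto the same data
   final float[] arr;
   // offset of element 0, number of elements, and floats per element
   final int base, len, stride;
   // offset of the currently selected element within arr
   int off;

   public FloatData(float[] arr, int off, int len, int stride)
   {
      this.arr = arr;
      this.base = off;
      this.len = len;
      this.stride = stride;
      this.off = off;
   }

   // slide the window to the element at the given index
   public void select(int index)
   {
      off = base + index * stride;
   }

   // dst = op1 + op2, on the currently selected 3-float elements
   public static void add3(FloatData op1, FloatData op2, FloatData dst)
   {
       int off1 = op1.off;
       int off2 = op2.off;
       int off3 = dst.off;

       dst.arr[off3+0] = op1.arr[off1+0] + op2.arr[off2+0];
       dst.arr[off3+1] = op1.arr[off1+1] + op2.arr[off2+1];
       dst.arr[off3+2] = op1.arr[off1+2] + op2.arr[off2+2];
   }
}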
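
As a rough sketch of what the flat-array idea could look like for the per-pixel position setup in the question: every position goes into one contiguous float[], three floats per pixel, so the inner loop writes memory sequentially instead of chasing a separate object per pixel. The class name FlatPositions, its fill method and the {x, y, z} array arguments are made up for illustration and are not part of the engine; whether this actually helps would need measuring.

```java
// Sketch only: FlatPositions and fill are hypothetical names, not engine code.
final class FlatPositions {
    /** Number of floats stored per pixel (x, y, z). */
    static final int STRIDE = 3;

    /**
     * Fills one contiguous array with the 3D position of every pixel:
     * origin + row * stepUp + col * stepRight.
     * origin, stepUp and stepRight are 3-element {x, y, z} arrays.
     */
    static float[] fill(int width, int height,
                        float[] origin, float[] stepUp, float[] stepRight) {
        float[] pos = new float[width * height * STRIDE];
        int off = 0; // write offset, advances by STRIDE per pixel
        for (int row = 0; row < height; row++) {
            // start of the row: origin + row * stepUp
            float rx = origin[0] + row * stepUp[0];
            float ry = origin[1] + row * stepUp[1];
            float rz = origin[2] + row * stepUp[2];
            for (int col = 0; col < width; col++) {
                pos[off]     = rx + col * stepRight[0];
                pos[off + 1] = ry + col * stepRight[1];
                pos[off + 2] = rz + col * stepRight[2];
                off += STRIDE;
            }
        }
        return pos;
    }

    public static void main(String[] args) {
        float[] pos = fill(640, 480,
                new float[] {0f, 0f, 1f},   // assumed origin of the image plane
                new float[] {0f, 1f, 0f},   // assumed step per row
                new float[] {1f, 0f, 0f});  // assumed step per column
        int i = (2 * 640 + 5) * STRIDE;     // pixel at row 2, column 5
        System.out.println(pos[i] + " " + pos[i + 1] + " " + pos[i + 2]);
    }
}
```

The pixel at (row, col) lives at index (row * width + col) * STRIDE, so no per-pixel object is needed just to hold a position.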

15 fps sounds about what you’d expect from tracing 300k rays. I get similar performance from a software ray-caster when working with around that many rays.
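
For reference, the 300k figure is just the pixel count of the 640x480 display, assuming one primary ray per pixel. A quick sketch of the arithmetic (RayBudget and its method names are illustrative only):

```java
// Sketch only: RayBudget is a hypothetical helper for the arithmetic.
final class RayBudget {
    /** One primary ray per pixel (an assumption; no supersampling). */
    static long raysPerFrame(int width, int height) {
        return (long) width * height;
    }

    /** Primary rays that must be set up per second at the given frame rate. */
    static long raysPerSecond(int width, int height, int fps) {
        return raysPerFrame(width, height) * fps;
    }

    public static void main(String[] args) {
        System.out.println(raysPerFrame(640, 480));      // 307200
        System.out.println(raysPerSecond(640, 480, 15)); // 4608000
    }
}
```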

It might just be that when the only line in the loop is

tempPixel=camera.pixels[intTemp3++];

the compiler optimises out the array lookup (since the value is never used) and is actually just doing an increment.

Can you do something trivial with tempPixel and see if you still get 190 fps?
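
Something along these lines would do. It is only a sketch: LookupProbe, the stand-in Pixel class and its hits field are invented here and are not the engine's real classes. The point is that each looked-up pixel is written to and feeds a checksum that gets printed, so the lookup has an observable effect and cannot simply be dropped.

```java
// Sketch only: LookupProbe and this Pixel stand-in are hypothetical.
final class LookupProbe {
    /** Stand-in for the engine's Pixel class; only the field used here. */
    static final class Pixel {
        int hits;
    }

    /** Touches every pixel and returns a checksum so the loop has a visible result. */
    static long touchAll(Pixel[] pixels) {
        long checksum = 0;
        for (int i = 0; i < pixels.length; i++) {
            Pixel p = pixels[i];  // the array lookup being measured
            p.hits++;             // trivial write through the reference
            checksum += p.hits;   // the value is used, so the load must happen
        }
        return checksum;
    }

    public static void main(String[] args) {
        Pixel[] pixels = new Pixel[640 * 480];
        for (int i = 0; i < pixels.length; i++) {
            pixels[i] = new Pixel();
        }
        int frames = 100;
        long sum = 0;
        long start = System.nanoTime();
        for (int frame = 0; frame < frames; frame++) {
            sum += touchAll(pixels);
        }
        long elapsed = System.nanoTime() - start;
        // printing the checksum keeps the whole computation live
        System.out.println("checksum=" + sum + " ms/frame=" + elapsed / 1e6 / frames);
    }
}
```

If the frame rate with a loop like this is far below the 190 fps figure, the earlier number was probably measuring an empty loop rather than the lookup.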

Jono: I think you may be correct… I put in a simple post-increment of a member variable of tempPixel and the frame rate drops to ~50fps. The 15 fps I quoted was without performing the actual ray tracing, purely creating the ray for each pixel, which is quite bad really.

riven: I was afraid you might say that… I had my suspicions about cache misses… bugger… it just made my existing framework pretty much useless… ah well. I will do as you suggest and get rid of my objects and use primitives as much as possible.

Thanks for the heads up!