What I did today

I recently found System.arraycopy() in my quest to make my level editor faster (I just implemented batch rendering, and building VBOs with 1.5 billion floats takes a while…) and I was going to tell you that I increased my build method’s speed by 26x (~175,500ms to ~6,600ms).

That is still slow, but pretty good nonetheless. Then I realized that I was reading my profiler wrong and had actually cut the method time to ~460ms!!! A 381x increase in performance! It’s pretty basic stuff, but I find it pretty cool, so I figured that I’d share more.

Here’s what I was doing initially:

		for(int i = 0; i < m_new_terrain_objects.size(); i++)
		{
			//Get the object
			TerrainObject object = m_new_terrain_objects.get(i);
			//Set the vertex offset for this object
			object.setBatchVertexOffset(m_vertex_array.length);
			//copy the vertex array from the object
			float[] src_vertices = createInterleavedArray(object.getTransform(), object.getMesh().getVertices());
			//build a temp array of the current vertex size plus the new vertex array size,
			//so that the new info can be added
			float[] temp_array = new float[m_vertex_array.length + src_vertices.length];
			//now copy the current vertex data into the temp array
			System.arraycopy(m_vertex_array, 0, temp_array, 0, m_vertex_array.length);
			//now copy the new object's data into the temp array 
			//at the proper offset (the length of the current vertex array)
			System.arraycopy(src_vertices, 0, temp_array, m_vertex_array.length, src_vertices.length);
			//now update the current vertex array to reflect the changes
			m_vertex_array = temp_array;
		}

That meant temporary array copies, building a new array on every iteration, and basically utter nonsense. I thought a little more about it and refactored:

		//First pass: total up how many floats the new objects will add
		int size = 0;
		for(int i = 0; i < m_new_terrain_objects.size(); i++)
		{
			size += m_new_terrain_objects.get(i).getMesh().getVertices().length * (Vertex.SIZE - 4);
		}

		//Allocate the final array once and copy the current vertex data into it
		float[] temp_array = new float[m_vertex_array.length + size];
		System.arraycopy(m_vertex_array, 0, temp_array, 0, m_vertex_array.length);
		//Second pass: write each object's data at the running offset
		int current_index = m_vertex_array.length;
		for(int i = 0; i < m_new_terrain_objects.size(); i++)
		{
			TerrainObject object = m_new_terrain_objects.get(i);
			//Set the vertex offset for this object
			object.setBatchVertexOffset(current_index);
			float[] src_vertices = createInterleavedArray(object.getTransform(), object.getMesh().getVertices());
			System.arraycopy(src_vertices, 0, temp_array, current_index, src_vertices.length);
			current_index += src_vertices.length;
		}
		//now update the current vertex array to reflect the changes
		m_vertex_array = temp_array;

There was no real way for me to get around figuring out the final array size so I have to touch every new object and find how many vertices there will be. Then I build one temp array to hold the final data and proceed to fill it up with all them floats.

Now I do have another function (createInterleavedArray()) that still builds a new array of the mesh data that I could just store, but I prefer it in a more readable format, the batched objects are only built in the level editor, and I already have good enough performance in this area.

What about [icode]m_vertex_array = Arrays.copyOf(m_vertex_array, m_vertex_array.length + size);[/icode] and then ditch temp_array?
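For anyone following along, here is a rough, self-contained sketch of that suggestion. The class and method names (BatchAppend, appendAll) are made up for illustration, and plain float[] chunks stand in for whatever createInterleavedArray() returns, so treat it as an assumption-laden example rather than the editor's actual code.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical helper, not from the original post.
final class BatchAppend {
    // Appends every chunk to base using a single reallocation.
    static float[] appendAll(float[] base, List<float[]> chunks) {
        // First pass: total number of floats to add.
        int extra = 0;
        for (float[] chunk : chunks) {
            extra += chunk.length;
        }
        // Arrays.copyOf allocates the larger array and copies the old contents,
        // replacing the separate "new float[...]" plus System.arraycopy step.
        float[] result = Arrays.copyOf(base, base.length + extra);
        // Second pass: write each chunk at the running offset.
        int index = base.length;
        for (float[] chunk : chunks) {
            System.arraycopy(chunk, 0, result, index, chunk.length);
            index += chunk.length;
        }
        return result;
    }

    public static void main(String[] args) {
        float[] merged = appendAll(new float[] {1f, 2f},
                Arrays.asList(new float[] {3f}, new float[] {4f, 5f}));
        System.out.println(Arrays.toString(merged)); // [1.0, 2.0, 3.0, 4.0, 5.0]
    }
}
```

The running index is also the value you would hand to setBatchVertexOffset() for each object.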

Started a new job doing fairly repetitive work, but making very nice money and it’s not even that hard! I’ve learned to always have connections: my girlfriend’s step-father offered me this job and now I’m doing really well.

Also started on an Android app, having lots of fun in XML!

@opiop65 Pssst… Android Studio :point:

@BurntPizza Good thinking.

I’m already using it :wink: The preview renderer doesn’t play nice with custom views though, so I can’t do much with the drag and drop editor :confused:

The rest of it is very cool though! Love the Gradle integration.

I am working on a prototype for a game:

vClIgQkx_sI

And started creating one of the three main characters:

New solve times for the 3x3 cube… I’m getting slightly better, haven’t really worked on it recently. My personal best is now 23 seconds :slight_smile: so close to sub 20!

http://puu.sh/hqv6m/fc29b65e5b.png

Related to programming, I’m going to states with my game in about a week… I really need to work on it some more.

Hey, as I feel like I have achieved something today, I’m going to get in on this post!

Today I have made a cracking start on a game (no, really). I call it Lost Luggage.

I can’t do graphics so it’s all textual right now, here’s a screenie:

The aim is to get luggage on the left to the correct aircraft on the right :wink: The part I was most pleased about is that I finally got to level 3 after 20 minutes lol

I've kept on updating my game project called "Project Eclipse" with help from multiple people including BurntPizza! Especially BurntPizza. I am currently trying to make a parallax star field that doesn't leave empty gaps when stars are randomly placed back on the screen after being deleted for going out of bounds. I'm going to have to figure out how to do some pac-man style object wrapping so that I generate the stars once and I'm done. All the assets, including sound and music, are made by me! :D
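The pac-man style wrapping could look roughly like the sketch below. Everything in it (the StarField class, its fields, the depth-based speed) is a hypothetical stand-in and not Project Eclipse code: stars are generated once, and each coordinate is wrapped back into the screen range instead of the star being deleted and respawned.

```java
import java.util.Random;

// Hypothetical parallax star field; names and layout are assumptions.
final class StarField {
    final float[] x;
    final float[] y;
    final float[] depth; // assumed speed factor: larger values scroll faster
    final float width;
    final float height;

    StarField(int count, float width, float height, long seed) {
        this.width = width;
        this.height = height;
        x = new float[count];
        y = new float[count];
        depth = new float[count];
        Random random = new Random(seed);
        for (int i = 0; i < count; i++) {
            x[i] = random.nextFloat() * width;
            y[i] = random.nextFloat() * height;
            depth[i] = 0.25f + 0.75f * random.nextFloat();
        }
    }

    // Wraps value into [0, size) so a star leaving one edge re-enters at the opposite edge.
    static float wrap(float value, float size) {
        float r = value % size;
        if (r < 0f) {
            r += size;
        }
        // Guard against float rounding pushing r up to exactly size.
        return r >= size ? 0f : r;
    }

    // Scrolls every star by the camera movement, scaled by its depth.
    void scroll(float dx, float dy) {
        for (int i = 0; i < x.length; i++) {
            x[i] = wrap(x[i] - dx * depth[i], width);
            y[i] = wrap(y[i] - dy * depth[i], height);
        }
    }
}
```

Because no star is ever removed, the density stays constant and there is no per-frame allocation.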

tZseJnzQx8Q

Going for my first round of interviews for my internship on Tuesday, so I’ve been studying and trying really hard to focus on what I actually need to learn, instead of working on my Android app which I’m really excited about!

Wrote an IRC bot for fun. It also doubles as a mini IRC client: I can chat by typing manual commands into the command prompt window.

Features right now are some basic commands, a calculator (based on Nashorn), SASL authentication, and SSL connections. Have any more ideas that’d be cool to implement? Please tell me.

Played around a little with making a Minecraft ‘hacked client’. Even though the code is horrendous, it’s still fun to play around with. Here’s what I’ve got after messing with it for a couple of days (just finished the basic rear view mirror thing).

http://puu.sh/htwzA/7077b3c478.jpg

Sooner or later someone’s going to sue you or something… Keep it up :smiley:

I started messing around with a music tracker called Sunvox. I’ve dealt with trackers before, but they’ve always been rather unwieldy for me. I’m kinda liking Sunvox so far, though. I’ve also fixed a number of issues regarding the Behavior System I wrote for my own little framework (it helps my workflow a bit by having some things already done), but it’s still far from perfect.

I actually finished a game :slight_smile:
May release it here soon. I’m content. It’s also 1 am.

Made more music, this one being a sort of town/water dungeon theme for RPGs. It’s free to use (just gotta give credit) so there’s that.
I also started thinking about a component entity system for a game I’m making, because it’s one of the most efficient ways I could think of that would also be easy for modding. Modded UI might be slightly tricky, but me and my partner might be able to come up with a solution.

I rewrote some of my mapped VBO managing code and improved the two slowest renderers in Insomnia. Before, every view (camera/shadow map) had its own set of VBOs for what was visible for that view. Each view frustum culled terrain and 3D models, constructed a list of instances that were visible, mapped a buffer, placed the data in the buffer, unmapped the buffer, then rendered itself. All views like these were calculated in parallel to the extent that this was possible (cull views and construct lists in parallel, map buffers in OpenGL thread, data uploaded in parallel, buffers unmapped in OpenGL thread, render everything in OpenGL thread). This had a number of problems that were quite difficult to detect.

  1. Mapping a lot of buffers is slow as hell. In my stress test, I had almost 1000 point lights visible with 6 shadow maps each, resulting in over 10 000 VBOs being mapped each frame. Persistent mapped buffers “fixed” this as the map operation could be avoided, but that doesn’t help OGL3 GPUs, causing a shitload of driver overhead. Mapping a lot of buffers also kills the driver’s internal multithreading, as each map operation causes synchronization inside the driver.

  2. Due to how my code was structured, each pass needed to fetch render data (visibility lists, VBO) of each view from a tiny little HashMap. It turned out that the simple map and unmap passes run on the OpenGL thread were locking up the OpenGL thread for much longer than they should be due to the overhead of having to fetch the render data.

  3. My engine can seamlessly switch between using unsynchronized VBOs and persistently mapped VBOs. It turns out that glMapBufferRange()'s ability to reuse the previous ByteBuffer instance is very limited.

[quote]The old_buffer argument can be null, in which case a new ByteBuffer will be created, pointing to the returned memory. If old_buffer is non-null, it will be returned if it points to the same mapped memory and has the same capacity as the buffer object, otherwise a new ByteBuffer is created.
[/quote]
In other words, it can only reuse the previous ByteBuffer if you map the exact same number of bytes each frame and the driver gives you the exact same memory address (which it should unless you reallocate the VBO). When using unsynchronized VBOs, the number of visible objects for each view changed pretty much every frame, causing new ByteBuffers to be allocated each frame. In my stress test, this amounted to over 10 MB of garbage per second in ByteBuffers, Cleaners and Deallocators.

The solution was pretty simple, but required some rewriting. I modified my renderers to use a shared VBO system, so instead of each view mapping its own buffers, it “reserves” a part of a shared VBO. Since I’m only mapping a handful of VBOs each frame now, the driver overhead is minimal. Fewer mapped VBOs also means fewer ByteBuffers created each frame. In addition, the mapping and unmapping code for unsynchronized VBOs does not need to query the render data of each view anymore since it only needs to know the total amount of data needed, removing a lot of HashMap overhead. Although I expected this to yield a noticeable performance increase when using unsynchronized VBOs, I did not expect unsynchronized VBOs to become as fast as persistent VBOs. In addition, both unsynchronized and persistent VBOs are significantly faster than before.
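To illustrate the “reserve a part of the VBO” idea, here is a minimal sketch in plain Java. It is not Insomnia's code: SharedBufferReservations and its methods are hypothetical names, an ordinary ByteBuffer stands in for the mapped VBO, and no OpenGL calls are made. Each view reserves a byte range (this can happen in parallel), the total tells the OpenGL thread how much to map in one go, and each view then writes into its own slice.

```java
import java.nio.ByteBuffer;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical bookkeeping for one shared buffer per frame.
final class SharedBufferReservations {
    private final AtomicInteger nextOffset = new AtomicInteger();

    // Reserves the given number of bytes and returns the start offset.
    // getAndAdd is atomic, so several culling threads may call this at once.
    int reserve(int bytes) {
        return nextOffset.getAndAdd(bytes);
    }

    // Total bytes reserved so far: the size to map once on the OpenGL thread.
    int totalBytes() {
        return nextOffset.get();
    }

    // Starts a new frame.
    void reset() {
        nextOffset.set(0);
    }

    // Returns an independent view of one reserved range of the mapped buffer.
    static ByteBuffer region(ByteBuffer mapped, int offset, int bytes) {
        ByteBuffer view = mapped.duplicate();
        view.limit(offset + bytes);
        view.position(offset);
        return view.slice();
    }
}
```

With this layout the map and unmap passes only need totalBytes(), so no per-view lookup is needed on the OpenGL thread.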

Note: Threading in the below table refers to the Threaded Optimization setting in the Nvidia Control Panel, which controls driver multithreading. Both Intel and AMD also have multithreaded drivers, but do not allow the user to override the driver’s automatic selection.

[tr][td]Technique[/td][td]Old FPS[/td][td]New FPS[/td][td]% improvement[/td][/tr]
[tr][td]Unsynchronized, threading off[/td][td]34 FPS[/td][td]52 FPS[/td][td]53%[/td][/tr]
[tr][td]Unsynchronized, threading on[/td][td]19 FPS[/td][td]62 FPS[/td][td]226%[/td][/tr]
[tr][td]Persistent, threading off[/td][td]46 FPS[/td][td]53 FPS[/td][td]15%[/td][/tr]
[tr][td]Persistent, threading on[/td][td]57 FPS[/td][td]64 FPS[/td][td]12%[/td][/tr]

This is mostly an optimization for computers with older GPUs that don’t support persistent VBOs, where the improvement is 82% (34 FPS --> 62 FPS), but even computers with GPUs that support persistent buffers got a 12% increase from the improved parallelism due to the reduced HashMap overhead. Using a single thread in Insomnia and disabling threaded optimization so it only uses 1 thread, I get 21 FPS. With 8 threads on my Hyperthreaded quad core and threaded optimization, I get 64 FPS = 3.05x scaling.

Finished the interview process for my internship, I’m now working on a little project that will determine if they will hire me or not! Also had my senior prom, it was actually kind of fun and being dressed up really gives you a confidence boost. It also helps when you have a pretty girlfriend :wink:

Lost my summer job, and the apartment I plan on living in in a few weeks had a break-in.