What I did today

Today I’ve made some pull requests on github following hacktoberfest and refactor a little bit and old project, removing bugs and cleaning the code :smiley:

I started working on a icons/sprites-bundle for the heck of it.

Spent a bit more time on navier stokes today. Converted my program to Java/LibGDX. Now running all computations using openCL, it’s about twice as fast now. I’ll be able to get it better with a little more time. Maybe convert it to GLSL to see what happens, I’m confident it’ll be faster that way. No real purpose here really I just kinda want to see how far I can take this. Will add in 3D support later as well.
It’s rendering at 900x900 with the fluid grids being about 250x250. And I’m recording on a laptop. So not awful performance but not spectacular. I expected to see a larger boost from OpenCL, I may have to study it a bit more.

First I show density map, then velocity.

V_EsOl1LtqA

As you can see, my boundary conditions are a little messed up. I’m still working out the kinks from transferring to opencl.

I am frustrated. Too much work and because of that too little time for my hobby projects … I dont think I will/can complete my advent game calendar this year …
But if you like you can try two of the 17 finished games here (TreasureMirror, ChristmasTrain)

@Slyth2727 If you put this in a library and make it available as compute shader, maybe someone (I?) tests it with render to texture in his game engine :stuck_out_tongue:

I got fiber installed a couple of days ago.

Before:

After:

Oh yeah definitely, I’m aiming to code it so it’s completely library independent besides maybe LWJGL or whatever I’m going to use for OpenGL. Maybe have an OpenCL and OpenGL/shader version, though due to the nature of the algorithm shaders would be much faster I think. I’m just using LibGDX to take the data computed by my library and render it for ease of use right now. I’ll make sure to let you know. 3D support is already under way. I’m not even trying to run it at decent speeds until I implement it in GLSL so I can just use 3d textures.

Finished my first ever blog post: Asset Loading in LibGDX
Our website is not yet finished, so don’t judge :wink:

I’ve managed to get myself a software developer apprenticeship! Its not Java though, its C#; but learning Java in my spare time has helped me with the basics of C# as they are fairly similar.

Once again I find myself using @Roquen’s Grab Bag of curiosities in earnest for a project.
A fantastic resource and many thanks again.

I’m using LCG64 to get a reliably random and speedy sequence of numbers needed to generate some terrain for Battledroid. @theagentd and I will reveal all in a few weeks. (Yes, he is once again enslaved).

Cas :slight_smile:

Non-Java, but programming related: I received my first paycheck yesterday from a startup I have been working with (with request not to deposit it until tomorrow). Not much money, but they are basically paying me to learn as I go and giving me valuable work experience. Project involves VR (using JavaScript / AFrame library).

We had a meeting yesterday where I had to show work done so far. Fortunately, I think I chose well and got a few key things working very smoothly, rather than trying to do too much. For example, an animation (fade to black) for transitioning between “rooms” that people neither noticed or remarked about, as it was glitch free (not easy to achieve).

Museum staff felt good enough about the progress that my boss was approached about working on a grant for an additional project, which pleased him immensely.

I want to comment on JavaScript and on adding operator overloading to Java.

With physical work, (e.g., involving sacks of cement, say), the size of the bags have evolved to something that an good strong male worker can hoist around (I’m thinking of the 100# bags I dealt with, years ago). Now, if the work force were of people of smaller physical stature (e.g., included many women), it would be more efficient if the bags were sized to a slightly lower maximum. Does this make sense?

The point I want to make is that there is a mental correlation: working memory (similar but not to be confused with short-term memory). Young brains can juggle more items at one time than older brains. Maybe I can only track 5 or 6 thoughts at the same time, rather than 8 or 9 like a twenty-something programmer. I think it must be tempting to push the limits of what one can juggle. But having too much stuff to keep track of at the same time ends up being counter-productive. Operator overloading, it seems to me, adds to the load on human “working memory.” A better form for the leveraging of complexity is via “chunking,” which I take as encapsulating or grouping functionality in a single entity (e.g., a class, a function, a subroutine).

I don’t often see discussions about the merit of languages talk much about the human “working-memory” demands of that language.

JavaScript makes pretty high demands, it seems to me. I’m still not able to easily tell something that is usually pretty obvious in Java: a given variable’s scope and whether its use in a statement is subject to closure or not. Maybe you’ve seen the articles on “favorite interview questions” as examples. I can link one if needed. I truly hope Java doesn’t go down the path of adding functionality that increases the human working-memory load.

Life’s been tough on me the last few weeks, especially the last few days, so I decided to do some extra fun coding this weekend.

3-4 years ago I made some threads about an extreme scale space game with realistic Newtonian physics. The game would require simulating a huge number of objects affected by gravity, with extreme speed collision detection. I am talking 100k+ ships, each orbiting a planet at 10km/second, with accurate collision detection. The technical challenges are enormous. After some spitballing here on JGO, I ended up implementing a test program using fixed-precision values (64-bit longs) to represent positions and velocities to get a constant amount of precision regardless of distance from origin. Simple circle-based collision detection was handled by sorting the ships along the X-axis, then checking collisions only for ships that overlap along the X-axis. The whole thing was completely multi-threaded, and I even tried out Riven’s mapped struct library to help with cache locality. Even sorting was multithreaded using my home-made parallel insertion sort algorithm, tailor-made for almost-sorted data sets (the order along the X-axis did not change very quickly). It scaled well with more cores, but was still very heavy for my poor i7.

I realized that the only way to get decent performance for this problem on a server would be to run the physics simulation on the GPU. With a magnitude higher performance and bandwidth, the GPU should be able to easily beat this result as long as the right algorithms are used. The physics simulation is easy enough as it’s an embarrassingly parallel problem and fits perfectly for the GPU. The collision detection (sorting + neighbor check) is a whole different game. GPU sorting is NOT a fun topic, at least if you ask me. The go-to algorithm for this is a parallel GPU radix sort, but with 64-bit keys that’s very expensive. Just like my parallel insertion sort algorithm took advantage of the almost-sorted nature of the sorting, I needed something like that that could run on the GPU as well. That’s when I stumbled upon a simple GPU selection sort algorithm.

The idea is simple. For each element, loop over the entire array of elements to sort. Calculate how many elements that should be in front of this element. You now know the new index of your element, so move it directly to that index. Obviously, this is O(n^2), so it doesn’t scale too well. However, the raw power of the GPU can compensate for that to some extent. 451024 = 46 080 elements can be sorted in ~60FPS, regardless of how sorted the array is. By using shared memory as a cache, performance almost triples to 160 FPS, allowing me to sort 801024 = 81 920 elements at 60 FPS. Still not fast enough. Anything above 200k elements runs a big risk of causing the driver to time out and restart…

Enter block-based selection sort for almost sorted data-sets! The idea is to split the list up into blocks of 256 elements, then calculate the bounds of the values of each block. This allows us to skip entire blocks of 256 values if the block doesn’t intersect with the current block we’re processing. Most likely, only the blocks in the immediate vicinity of each block needs to be taken into consideration when sorting, while the rest of the blocks can be skimmed over. Obviously, this makes the data dependent, and the worst case is still the same as vanilla GPU selection sort if all blocks intersect with each other (which is essentially guaranteed for a list of completely random values). However, for almost sorted data sets this is magnitudes faster!

To simulate an almost sorted data-set, an array is filled with elements like this:

for(int i = 0; i < NUM_KEYS; i++){
	data.putLong(i*8, i + r.nextInt(1000));
}

This gives us an almost sorted array with quite a lot of elements with the exact same value, to test the robustness of the sort. The block-based selection sort algorithm is able to sort a 2048*1024 = 2 097 152 element list… at 75 FPS, way over the target of 100 000. It’s time to implement a real physics simulation based on this!

Let’s define the test scenario. 1024*1024 = 1 048 576 ships are in perfect circular orbits around the earth. The orbit heights range from low earth orbit (International Space Station height) to geosynchronous orbit. Approximately half of the ships are orbiting clockwise, the other half counterclockwise. The size of the earth, the mass, the gravity calculations, etc are physically accurate and based on real-life measurements.

Going back to my original threaded CPU implementation, it really can’t handle one million ships very well. Just the physics simulation of the ships takes 20.43ms, and sorting another 18.75ms. Collision detection then takes another 10.16ms.

The compute shader implementation is a LOT faster. Physics calculations take only 0.27ms, calculating block bounds another 0.1ms and finally sorting takes 2.07ms. I have not yet implemented the final collision detection pass, but I have no reason to expect it to be inefficient on the GPU, so I’m optimistic about the final performance of the GPU implementation.

Each ship is drawn as a point. The color depends on the current index in the list of the ship, so the perfect gradient means that the list is perfectly sorted along the X-axis. 303 FPS, with rendering taking up 0.61ms, 370 FPS without rendering.

I await the day when my CPU also has 1000 cores. Surely not too long now.

Cas :slight_smile:

@theagentd

How “realistic” does it have to be, though? Wouldn’t something like the Barnes-Hut algorithm be “good enough” for most cases?

http://arborjs.org/docs/barnes-hut


http://www.cs.princeton.edu/courses/archive/fall03/cs126/assignments/barnes-hut.html

@jonjava

It’s not an N-body simulation. Celestial bodies (stars, planets, moons) affect each other, but ships are only pulled by celestial bodies. Ships don’t pull each other either.

@theagentd

Ah, so the scenario is essentially 1 source of gravity where the precise positions of the ships/particles affected by it comes to play? But then why is the calculations so taxing? What makes this so different than anything else? The precision? The magnitude/amount of bodies? Why wouldn’t e.g. a QuadTree based solution work?

Simulating the physics takes 0.27 ms for ~1 million ship, and this is GPU bandwidth limited, so I an have up to 8 sources of gravity before I get any drop in performance. If it’s just the simulation, it can easily be done for over 10 million ships. The problem is the collision detection really. Hierarchical data structures are usually not very efficient on the GPU, and constructing them on the CPU would require reading the data back, constructing the quad tree, then uploading it again to the GPU, which is gonna be too slow. In addition, actually querying the quad tree on the GPU will be very slow as well; GPUs can’t do recursion and computations happen in lockstep in workgroups, so any kind of branching or uneven looping will be very inefficient. It’s generally a better idea to use a more fixed data structure, like a grid instead, but that’s a bad match in this case. The large scale of the world, the extremely fast speed of the ships and the fact that ships will very likely be very clumped up into fleets means that even a uniform grid will be way too slow.

The idea of sorting the ships along one axis and checking for overlap of their swept positions (basically treating each ship as a line from its previous position to its current position) was inspired by Box2D’s broadphase actually. I concluded that sorting was a simpler problem to solve than creating and maintaining a spatial data structure (especially on the GPU), but after testing it out more I’m not sure it’s a good solution in this case. For a fleet orbiting in close formation, there’s a huge spike in sorting cost when the orbit reaches the leftmost and rightmost edges of the orbit when the order of the entire fleet reverses. There are also problems when two large fleets, one moving left and the other right) cross each other, again due to the two fleets first intermixing and then swapping positions in the list once they’ve crossed… Finally, there’s a huge problem with just fleets travelling around together. A fleet of 10 000 ships moving very quickly together will have overlapping swept positions, so all 10 000 ships will be collision tested against each other.

I got a lot of thoughts on this problem, so if you want to have more of a discussion about this, I’d love to exchange ideas and thoughts on this through some kind of chat instead.

A very insightful clarification covering several levels of analysis, thank you~

My time is limited as of late, not that I’d necessarily be able to contribute anything meaningful to the discussion anyway. But it’s an interesting topic I don’t think anyone here wouldn’t mind seeing more posts of in the future.

Messed around with voronoi partitioning today.

2AFcl7d8p98

Just been watching the JDD 2017 Recap video. Pretty pleased with this! :smiley: (a couple of weeks back I did my first conference keynote talk / performance in Krakow - “Write Now, Run Anytime” - about realtime hot code reloading in Java, with lots of audio and visual examples)

w7f9N-laVW0