What I did today

Here it is :slight_smile: https://drive.google.com/file/d/1YKeQBPKglP0nARTYqaRzsTT8z5nSYh05/view?usp=drivesdk

Today I managed to refactor my camera into something more general (Camera interface + AbstractCamera parent), and now with a small code change I have an RPGCamera that has the right view angle and movement. Initially I wanted to create a camera arm that holds the camera itself. I thought it would be easier to have the camera rotated and the arm moved, but I use ECS, and when I started to think about a parent-child relationship between transforms I realised that ECS is not meant for this :slight_smile: So I ended up with a Camera refactor.

Today: Greedy Meshing

Up until now I only used actual cube and cuboid primitives to ray trace the scene, but a triangle mesh allows us to:

  1. do a raster pre-pass to compute “primary rays” giving the hit point position and normal to start any secondary rays from that point
  2. use Nvidia Vulkan Ray Tracing, which gives a massive performance boost compared to a non-RT compute shader on Turing hardware!

Greedy Meshing Java Code here: https://github.com/LWJGL/lwjgl3-demos/blob/master/src/org/lwjgl/demo/util/GreedyMeshing.java
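
For a rough idea of what the linked code does, here is a much-simplified sketch of the core merge step on a single 2D mask of faces (names are mine; the linked implementation also handles face values/materials, all six face orientations and iterating the chunk’s slices):

```java
import java.util.ArrayList;
import java.util.List;

class GreedyMerge {
    /** Merge a 2D mask of identical faces into maximal rectangles {x, y, width, height}. */
    static List<int[]> merge(boolean[][] mask) {
        int h = mask.length, w = mask[0].length;
        boolean[][] used = new boolean[h][w];
        List<int[]> quads = new ArrayList<>();
        for (int y = 0; y < h; y++) {
            for (int x = 0; x < w; x++) {
                if (!mask[y][x] || used[y][x]) continue;
                // grow the quad to the right as far as possible
                int qw = 1;
                while (x + qw < w && mask[y][x + qw] && !used[y][x + qw]) qw++;
                // then grow it downwards while every cell of the next row still matches
                int qh = 1;
                grow: while (y + qh < h) {
                    for (int i = 0; i < qw; i++)
                        if (!mask[y + qh][x + i] || used[y + qh][x + i]) break grow;
                    qh++;
                }
                // mark the merged cells and emit one quad instead of qw * qh unit quads
                for (int dy = 0; dy < qh; dy++)
                    for (int dx = 0; dx < qw; dx++)
                        used[y + dy][x + dx] = true;
                quads.add(new int[]{x, y, qw, qh});
            }
        }
        return quads;
    }
}
```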

Today I quickly threw a simplex-noise-generated and greedy-meshed chunk onto NV_ray_tracing with a simple path tracing shader and got this:

LhN3w46kKmc

(16 samples per pixel, 4 bounces, 1440p, 60Hz)
One might think this is pre-rendered in e.g. Blender/Cycles, but it’s actually realtime with a smooth camera rotation animation. :slight_smile:

Today I implemented hybrid ray tracing (rasterization + ray tracing) in Vulkan, so that the first ray (starting from the eye) is not ray traced but rasterized/rendered as you normally would, because rasterization is still much faster.
Normal output (rasterized):

Depth output (rasterized - linearized depth buffer):

1 sample-per-pixel 1 bounce (ray traced):

Code: https://github.com/LWJGL/lwjgl3-demos/blob/master/src/org/lwjgl/demo/vulkan/NvRayTracingHybridExample.java
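
The core trick of the hybrid approach, reconstructing the “primary hit” from the rasterized buffers, boils down to unprojecting the pixel’s NDC coordinates with the inverse view-projection matrix. A minimal plain-Java sketch, assuming the depth value is already in NDC (the actual shader code lives in the example linked above):

```java
/** Unproject a pixel's NDC position and depth back to world space; secondary rays
 *  then start from this point (together with the rasterized normal). */
static float[] reconstructHit(float ndcX, float ndcY, float ndcDepth,
                              float[] invViewProj /* column-major 4x4 */) {
    float[] clip = { ndcX, ndcY, ndcDepth, 1.0f };
    float[] world = new float[4];
    for (int row = 0; row < 4; row++)
        for (int col = 0; col < 4; col++)
            world[row] += invViewProj[col * 4 + row] * clip[col];
    // perspective divide yields the world-space hit position of the "primary ray"
    return new float[] { world[0] / world[3], world[1] / world[3], world[2] / world[3] };
}
```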

I started learning the Spring framework. P.S.: That syntax theme is my own port of the Ayu Dark colour scheme from Vim.

Blue noise sampling is so superior to white noise for the second bounce ray when the 2D sample position in screen space maps linearly onto the blue noise function/image domain.
Essentially, the blue noise function/image is sampled based on the X/Y screen-space pixel coordinates, and the sample position is shifted by a random/hash function that takes the bounce index and frame index (or frame/elapsed time) as input. This is what is commonly called “Cranley-Patterson rotation” in the literature.
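
As a rough illustration (names and the hash are made up for this sketch; the actual code is in the GLSL linked below): the pixel coordinates tile into the blue noise image, and the per-bounce/per-frame shift is simply added and wrapped back into [0, 1):

```java
/** Sample a tileable blue-noise image at a pixel and apply a per-bounce/per-frame
 *  Cranley-Patterson rotation (a random shift, wrapped back into [0, 1)). */
static float[] blueNoiseSample(float[][][] blueNoise, int px, int py, int bounce, int frame) {
    int size = blueNoise.length;                      // square, tileable blue-noise texture
    float[] bn = blueNoise[py % size][px % size];     // 2D sample stored per texel
    float sx = hash(bounce * 7919 + frame * 104729);
    float sy = hash(bounce * 6271 + frame * 130363 + 1);
    return new float[] { (bn[0] + sx) % 1.0f, (bn[1] + sy) % 1.0f };
}

/** Simple integer hash mapped to [0, 1); any decent hash works here. */
static float hash(int n) {
    n = (n << 13) ^ n;
    n = n * (n * n * 15731 + 789221) + 1376312589;
    return (n & 0x7fffffff) / (float) 0x7fffffff;
}
```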

White noise:

And blue noise:

The blue noise pattern is very amenable to low-pass spatial filtering (such as with a Gaussian filter) because there is very little low frequency noise.
Java: https://github.com/LWJGL/lwjgl3-demos/blob/master/src/org/lwjgl/demo/vulkan/NvRayTracingHybridExample.java
GLSL: https://github.com/LWJGL/lwjgl3-demos/blob/master/res/org/lwjgl/demo/vulkan/raygen-hybrid.glsl

By the way: This video shows the current state of the art in sample generation (pretty tough stuff!)

EDIT: Images with 4 samples per pixel (left white noise, right blue noise):

EDIT2:
Also implemented https://eheitzresearch.wordpress.com/762-2/ today in OpenGL/GLSL, giving a big improvement to sample quality over white noise. Comparison (1spp multiple-importance sampling with single rectangular light source and 3 bounces):
(left is white noise, right is blue noise):

(you need to open these images in a separate tab or download them. the browser’s downsampling destroys the effect)

Bottom line: NEVER use white noise (simple rand()) when generating samples! :slight_smile:

Played around with a dark theme in my IDE:

Also re-implemented in-IDE project testing, so you don’t HAVE to test in a separate window anymore. This makes the program more prototype-friendly: if I want to open it up and test how to write something in Lua, I no longer have to save my work to a temporary file.

I also added undo/redo support.

[EDIT]
I forgot to mention that @Guerra24 has joined me and integrated his rendering engine into my game engine. So now it looks a lot prettier :slight_smile:

[EDIT2]
Here’s a gif of internal testing

Been plugging away at setting up a Linode for my own website, with the goal of migrating fully to it before I have to pay the renewal fee for my current ISP (Oct. 28!).

I’ve managed to install Jetty as my webserver (instead of the recommended Apache) as I want to be able to play around with Servlets and JSP. It now works as a service, and is hosting a replica of my website. That all went fairly easily.

Now diving into learning about hosting a mail server. Linode recommends Postfix and I will likely end up going that route. There’s also a Java-based mail server project from Apache.org called JAMES that I’m trying to give a look. There is a lot to learn in this realm! And low-grade migraines don’t make studying any easier.

I was telling my brother about the travails of packaging Java projects (where jlink is involved). He asked whether writing a tool to generate the needed shell command steps would be a useful project/product. Thoughts? I’ve been so occupied with learning server skills that I haven’t been able to formulate the task requirements or even think about feasibility, or whether this duplicates an existing tool that I don’t know about.
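
To make the idea concrete, such a tool would essentially assemble and run a jlink invocation from a handful of inputs. A hypothetical sketch (all variable names are made up; only the jlink flags themselves are real):

```java
import java.util.List;

class JlinkRunner {
    static void runJlink(String modulePath, List<String> rootModules,
                         String mainModule, String mainClass, String outputDir)
            throws Exception {
        List<String> cmd = List.of(
            "jlink",
            "--module-path", modulePath,                     // app modules + $JAVA_HOME/jmods
            "--add-modules", String.join(",", rootModules),  // root modules to resolve
            "--launcher", "app=" + mainModule + "/" + mainClass,
            "--output", outputDir);                          // resulting runtime image
        int exit = new ProcessBuilder(cmd).inheritIO().start().waitFor();
        if (exit != 0) throw new RuntimeException("jlink failed with exit code " + exit);
    }
}
```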


Seems to be inspired by RobloxStudio, looks really good! Will this engine be publicly available to mess around in?

That’s the idea. I really like how Roblox structured their engine, but severely dislike how locked down all content is within it. So this engine will be designed similarly but offer much more control to the user. And yes, it’ll be available. It’s already on my GitHub; I just don’t want to start advertising it because it’s not in a state that I’m totally comfortable with.

Hey! I’m back, this time working with @orange451. After spending hours investigating and debugging a problem that in the end was simply the wrong context being current when certain code ran, I continued my task of integrating the rendering code into the ECS and finally finished the dynamic sky.

The past days I’ve been researching and implementing algorithms for efficient chunk/voxel generation and rendering, including:

  1. Iterating chunks from front to back based on distance to camera
  2. Computing “connectivity” between a chunk’s faces (can a face X see a face Y?) by flood filling the empty voxels starting from one face and checking whether the flood reaches any of the other faces (a sketch is shown after this list). This will be used for CPU-side occlusion culling. Article about the algorithm: https://tomcc.github.io/2014/08/31/visibility-1.html
  3. “Dense” Octree based on “mipmapped” arrays for storing chunks
  4. Greedy meshing
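
Here is a small sketch of the connectivity computation from point 2 (illustrative only, not the engine code): flood-fill each connected region of empty voxels and record which chunk faces it touches; every pair of touched faces can then “see” each other. Face ids: 0/1 = -X/+X, 2/3 = -Y/+Y, 4/5 = -Z/+Z.

```java
import java.util.ArrayDeque;

class ChunkFaceConnectivity {
    /** solid[x][y][z] == true means the voxel is filled; returns a 6x6 "face A sees face B" matrix. */
    static boolean[][] compute(boolean[][][] solid) {
        int n = solid.length;                     // cubic chunk of n^3 voxels
        boolean[][] connected = new boolean[6][6];
        boolean[][][] visited = new boolean[n][n][n];
        for (int x = 0; x < n; x++)
            for (int y = 0; y < n; y++)
                for (int z = 0; z < n; z++) {
                    if (solid[x][y][z] || visited[x][y][z]) continue;
                    // flood-fill one connected empty region and note which faces it touches
                    boolean[] touches = new boolean[6];
                    ArrayDeque<int[]> stack = new ArrayDeque<>();
                    stack.push(new int[] {x, y, z});
                    visited[x][y][z] = true;
                    while (!stack.isEmpty()) {
                        int[] c = stack.pop();
                        int cx = c[0], cy = c[1], cz = c[2];
                        if (cx == 0) touches[0] = true;
                        if (cx == n - 1) touches[1] = true;
                        if (cy == 0) touches[2] = true;
                        if (cy == n - 1) touches[3] = true;
                        if (cz == 0) touches[4] = true;
                        if (cz == n - 1) touches[5] = true;
                        int[][] nb = {{cx-1,cy,cz},{cx+1,cy,cz},{cx,cy-1,cz},{cx,cy+1,cz},{cx,cy,cz-1},{cx,cy,cz+1}};
                        for (int[] m : nb) {
                            if (m[0] < 0 || m[0] >= n || m[1] < 0 || m[1] >= n || m[2] < 0 || m[2] >= n) continue;
                            if (solid[m[0]][m[1]][m[2]] || visited[m[0]][m[1]][m[2]]) continue;
                            visited[m[0]][m[1]][m[2]] = true;
                            stack.push(m);
                        }
                    }
                    // every pair of faces touched by this empty region is mutually visible
                    for (int a = 0; a < 6; a++)
                        for (int b = 0; b < 6; b++)
                            if (touches[a] && touches[b]) connected[a][b] = true;
                }
        return connected;
    }
}
```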

I’ve read a few articles about chunk/voxel management and rendering, and concluded with the following design:

  1. Use a “rolling”/sliding 3D array representing the maximum visible world
  2. Each array item stores a tree of chunks as a dense “array-mipmapped” octree (so, no pointers, just array index computations, because we expect all of the octree leaf nodes to be filled eventually, either with a non-empty or an empty chunk); see the sketch after this list
  3. This octree stores 1 chunk per leaf node
  4. The octree is used for combined frustum and occlusion culling as well as determining the next chunk to generate
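
A minimal sketch of what I mean in point 2 by a dense, “array-mipmapped” octree (illustrative names, not the actual code): one flat array per level, so a node’s children and parent are found by index arithmetic alone:

```java
class DenseOctree<T> {
    final int levels;     // level 0 = root (1 node), leaves live at level (levels - 1)
    final Object[][] nodes;

    DenseOctree(int levels) {
        this.levels = levels;
        this.nodes = new Object[levels][];
        for (int l = 0; l < levels; l++) {
            int res = 1 << l;                   // 2^l nodes per axis at this level
            nodes[l] = new Object[res * res * res];
        }
    }

    /** Flat index of the node covering cell (x, y, z) at the given level. */
    static int index(int level, int x, int y, int z) {
        int res = 1 << level;
        return (z * res + y) * res + x;
    }

    /** Store a chunk (or null for a known-empty chunk) at a leaf cell. */
    void setLeaf(int x, int y, int z, T chunk) {
        nodes[levels - 1][index(levels - 1, x, y, z)] = chunk;
    }

    /** The parent of cell (x, y, z) at 'level' is simply the cell with halved coordinates. */
    @SuppressWarnings("unchecked")
    T parent(int level, int x, int y, int z) {
        return (T) nodes[level - 1][index(level - 1, x >> 1, y >> 1, z >> 1)];
    }
}
```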

Especially number 4. requires some thought: What we want is a very efficient algorithm to drive chunk generation. This needs to cooperate with CPU-side frustum and occlusion culling in order to avoid generating chunks which we know will not be visible. It also needs to generate chunks in front-to-back order starting from the camera’s position to generate the best potential occluders first.

About point 1.: The purpose of this rolling array is for the managed chunks to “move” along with the player, so we can always generate chunks around the player. Another alternative that has been proposed is a simple hashmap keyed by world position. I’ll go with the rolling array for now.
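
A tiny sketch of the rolling-array indexing (names are illustrative): the chunk at world chunk coordinate (cx, cy, cz) always lives in slot (cx mod S, cy mod S, cz mod S), so when the player moves, slots that now map to new world coordinates simply get overwritten:

```java
/** Map a world chunk coordinate to its slot in the rolling SxSxS array. */
static int slot(int cx, int cy, int cz, int size) {
    int x = Math.floorMod(cx, size);   // floorMod handles negative world coordinates
    int y = Math.floorMod(cy, size);
    int z = Math.floorMod(cz, size);
    return (z * size + y) * size + x;
}
```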

About the octree: We mainly need a data structure with spatial hierarchy to accelerate frustum and occlusion culling as well as efficient k-nearest-neighbor queries to determine the chunks to generate and render next. We could also just use an array/grid here, because chunks are basically everywhere and k-nearest neighbor in this case simply means iterating in a spiral around the player, but this is only true for initial chunk generation. Once a chunk has been generated (and is possibly empty because it only consists of air), we can and should use hierarchical culling instead.
Also, the CPU-based occlusion culling with the “is face X visible from face Y” algorithm is very conservative and only good for an initial estimate of the potentially visible set of chunks, so we would really like to combine it with GPU-side hierarchical Z-buffer occlusion culling/queries.

The hierarchy of the rolling array and the contained octrees is also necessary because I’ve not found an efficient way to “slide” an octree along with the player, except removing and re-inserting chunks.

There’s still a lot to do, such as a potentially-visible-set algorithm for the CPU-side occlusion culling, which goes like this: whenever the view direction enters a chunk through one face and exits through another face of that same chunk, we want to “narrow” the set of chunks visited afterwards by intersecting the frustum formed by those chunk faces with the current camera frustum. A glimpse of this idea is also demonstrated here: https://tomcc.github.io/frustum_clamping.html

EDIT: What this little JavaScript demo doesn’t show, however (because the single frustum is immediately updated after placing a block), is that we don’t have just a single frustum which we narrow and filter chunks by; we actually need multiple such frusta. Imagine the initial view/camera frustum and one completely opaque chunk rendered in front of the camera blocking the view in the center, while on the left and right side of the view we can see further into the world. In this case we actually have two frusta by which we filter further visited chunks. Since this can amount to possibly thousands of sub-frusta, an optimal solution to this problem would be an interval tree: we compute the left and right end of each frustum along the screen-space X and Y directions by determining the distance of the view frustum’s planes to a chunk. This can either narrow down a single interval or split an interval into two, if the chunk is opaque and culls everything behind it.
EDIT2: I think I will go with a software rasterization approach for occlusion culling instead.
Here is a simple depth-only rasterizer without vertex attribute interpolation, which I am going to use for CPU-side occlusion culling: https://github.com/LWJGL/lwjgl3-demos/blob/master/src/org/lwjgl/demo/util/Rasterizer.java
It is tailored for my vertex format (unsigned 8-bit vertex positions and 16-bit indices).
Here are two images (the left/first is rasterized with OpenGL showing linear depth, the right/second image is showing the linear depth rasterized with the software rasterizer):

(the difference of both images is exactly black/zero at the actual rasterized triangles)

Spent yesterday evening and this evening researching how to more efficiently render a discrete voxel grid from front to back (for occlusion culling) and back to front (for transparency) without explicitly sorting the voxels by view distance.

There are some papers about a simple slices/rows/columns algorithm which looks at the view vector, sorts its components by their absolute lengths, and defines the nesting and order of three for-loops based on the different cases (the component lengths, and whether each component points in the negative or positive direction):

  • “Back-to-Front Display of Voxel-Based Objects” - Gideon Frieder et al., 1985
  • “A Fast Algorithm to Display Octrees” - Sharat Chandran et al., 2000

This however only works under orthographic projection, as is also mentioned by:

  • “Improved perspective visibility ordering for object-order volume rendering” - Charl P. Botha, Frits H. Post, 2005

which presents an improvement of an ordering algorithm under perspective projection presented by:

  • “Object-Order Rendering of Discrete Objects” - J. Edward Swan II, 1997
    notably in Chapter “2.3 The Perspective Back-to-Front Visibility Ordering”

So in essence, the slices/rows/columns algorithm can still be used, but the iteration direction needs to flip when the direction to the voxel swaps sides relative to the vector that is perpendicular to the voxel plane and starts at the eye location. This becomes pretty obvious when imagining the camera looking at a wall of voxels while turned very slightly to the right. If we wanted to render the voxels from back to front with the orthographic-projection slices/rows/columns algorithm, we would simply iterate the voxels from right to left, since we look slightly to the right. This is correct as long as the voxels are to the right of the view vector. But once we reach the left-hand side, then under perspective projection we would render the voxels nearest to the viewer first, which would be incorrect.
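
For the orthographic case, the slices/rows/columns ordering is just three nested loops whose nesting follows the sorted |view components| and whose direction runs against the view vector. A minimal sketch (illustrative, without the per-axis perspective flip described above):

```java
import java.util.Arrays;

class BackToFront {
    interface VoxelVisitor { void visit(int x, int y, int z); }

    /** Visit all voxels of a size^3 grid in back-to-front order for an orthographic view. */
    static void traverse(float[] view, int size, VoxelVisitor visitor) {
        // outermost loop iterates the axis with the largest |view component|
        Integer[] axes = {0, 1, 2};
        Arrays.sort(axes, (a, b) -> Float.compare(Math.abs(view[b]), Math.abs(view[a])));
        int[] idx = new int[3];
        for (int i = 0; i < size; i++) {
            // if the view points towards +axis, far voxels have large coordinates: count down
            idx[axes[0]] = view[axes[0]] >= 0 ? size - 1 - i : i;
            for (int j = 0; j < size; j++) {
                idx[axes[1]] = view[axes[1]] >= 0 ? size - 1 - j : j;
                for (int k = 0; k < size; k++) {
                    idx[axes[2]] = view[axes[2]] >= 0 ? size - 1 - k : k;
                    visitor.visit(idx[0], idx[1], idx[2]);
                }
            }
        }
    }
}
```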

Today was figuring out how to cheaply combat T-junction issues that arise with greedy meshing when faces share a single edge but do not share the vertices of that edge, leading to visible and distracting errors when those faces are rasterized, because the interpolated vertex positions do not always cover every pixel on that edge.
Since producing a proper mesh without any T-junctions is complicated, and such an algorithm would likely be much slower than greedy meshing, I went for the simple hack of expanding/scaling the faces a tiny bit so that those rounding errors no longer occur/are no longer visible, increasing the potential for pixel overdraw just a tiny bit at those edges. But having a 100% correct rasterization without any holes in it for occlusion queries is more important.
Here is one of the debug images of using the vertex shader to offset the vertex positions based on the view distance (more precisely, the w component of the clip-space vertex as the inverse view/z distance, to get a constant offset in screen space and avoid errors creeping in at more distant vertices):

In the image I used an exaggerated negative offset to test the view-dependent offset scale calculation.
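
As a CPU-side illustration of the same idea (the real thing happens in the vertex shader; names here are made up): push each corner of an axis-aligned quad outwards from the quad’s center by an epsilon scaled with the vertex’s clip-space w, which gives a roughly constant offset in screen space:

```java
/** Expand one corner of an axis-aligned quad outwards from the quad center.
 *  Scaling the offset by the clip-space w keeps it roughly constant in screen space. */
static float[] expandCorner(float[] corner, float[] quadCenter, float clipW, float eps) {
    float[] out = new float[3];
    for (int i = 0; i < 3; i++) {
        // the axis perpendicular to the face has corner[i] == quadCenter[i], so it stays put
        out[i] = corner[i] + Math.signum(corner[i] - quadCenter[i]) * eps * clipW;
    }
    return out;
}
```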

Continuing to learn about servers and hosting.

I’ve managed to get Jetty functional using my domain name on the new Linode system, on port 80.

It’s been… interesting. The Ubuntu repository installs Jetty as a systemd service, but the current Jetty.org documentation has no mention at all of systemd and assumes one has downloaded Jetty via wget and set things up in what seems to me to be a much cleaner fashion.

About yesterday: Jetty defaults to running on port 8080, and on Ubuntu root permissions are required to bind to port 80. The Jetty.org docs advise using ipchains, iptables or Jetty’s SetUID feature. Another tutorial I found prefers installing either Apache or nginx and using one or the other to relay incoming traffic on port 80 to Jetty on 8080. (But the point of my picking Jetty was to allow dynamic web serving without requiring a relay via Apache, as with Apache + Tomcat.)

I’ve not been able to find documentation, in the classic sense, for the Ubuntu-repository Jetty build. However, there are bread crumbs. A comment in a template start.ini file describes the use of AUTHBIND and the assumed location of configuration files.

From this one can also infer something useful about how the Ubuntu Jetty separates the service from the application, in anticipation of application updates.

Today’s task: having installed Certbot and successfully generated keys (for HTTPS), I get the following for the next step: “You’ll need to install your new certificate in the configuration file for your webserver.” That’s it. :slight_smile:

Jetty.org’s SSL section is NOT an easy read. :stuck_out_tongue:

Just implemented the SIGGRAPH 2016 paper Real-Time Polygonal-Light Shading with Linearly Transformed Cosines into my OpenGL test scene code:
(image only shows the specular GGX contribution)

The lighting calculation is completely analytic and very cheaply done in the shader without any stochastic elements like Monte Carlo integration - so no noise.
But note the lack of shadows from the table. Eric Heitz has a recent solution for this as well: Combining Analytic Direct Illumination and Stochastic Shadows
The solution of that paper is to calculate the direct lighting analytically (without shadows) and then use stochastic ray tracing to compute the occlusion factor, which is then blurred/denoised. The advantage of doing it this way is that it completely avoids any noise/variance whenever the light is fully visible from the sample point.
So, when you have polygonal light sources (such as a rectangle), or basically any light shape for which an analytic solution exists (sphere, ellipsoid, disk, line, …), you would no longer sample just the light-source area or the solid angle, but use the closed-form analytic solution and perform stochastic sampling only to compute the amount of shadowing (hence the name “shadow ray”).
I just love people like Eric Heitz who contribute research on readily applicable real-time rendering techniques.
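
The split described in the paper can be summarized in a few lines (all types and names here are illustrative, not from my actual code): shade the pixel with the noise-free analytic LTC result and modulate it by a denoised, ray-traced visibility factor, so pixels that see the whole light stay completely noise-free:

```java
import java.util.Random;

class StochasticShadows {
    interface AreaLight { float[] samplePoint(Random rng); }             // random point on the light
    interface Scene     { boolean occluded(float[] from, float[] to); }  // shadow-ray query

    /** Noisy visibility estimate at a shading point; this gets spatially blurred/denoised
     *  before it multiplies the analytic (unshadowed) LTC lighting. */
    static float estimateVisibility(float[] p, AreaLight light, Scene scene, int samples, Random rng) {
        int visible = 0;
        for (int i = 0; i < samples; i++)
            if (!scene.occluded(p, light.samplePoint(rng)))
                visible++;
        return visible / (float) samples;
    }

    /** Final pixel color: analytic direct lighting times the denoised visibility factor. */
    static float[] shade(float[] analyticUnshadowed, float denoisedVisibility) {
        return new float[] {
            analyticUnshadowed[0] * denoisedVisibility,
            analyticUnshadowed[1] * denoisedVisibility,
            analyticUnshadowed[2] * denoisedVisibility
        };
    }
}
```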

EDIT: Here is a video showing GGX with varying roughness:

-g1USekNpmU

Here is a very nice explanation of the “Linearly Transformed Cosines” technique: https://blog.magnum.graphics/guest-posts/area-lights-with-ltcs/

In the last few free evenings I’ve been working on an RPG camera and the input system in my framework, to be prepared for controlling characters from a StarCraft-like view. Both are ready, except for some Linux problems with mouse clicks: http://forum.lwjgl.org/index.php?topic=6958.0 (please check it if you have time :)).

I hope to have a small video of the progress soon.

@Kai, ever considered a blog?

All your valuable information could be stored and accessed much more conveniently.

Here it kind of gets lost in the numerous pages…

What luck, someone has made a brand new forum in which he might start a thread of his own.

Cas :slight_smile:

Very cool demo video. Can the technique be integrated with normal shadow casting? I note that the table legs do not cast shadows in the video.
Cheers,
Keith