What I did today

Spent way too much time trying to make a super simple animation :stuck_out_tongue:
I'm not artsy :clue:

I taught it to fly :o

This was not really something I did, apart from accepting yet another pull request for JOML that fixes a bug introduced by trusting Eclipse's method invocation inline refactoring. Ever since I inlined some methods in JOML's classes, bug reports and pull requests fixing obvious errors have been pouring in.
The reason in all these cases was Eclipse's method call inlining (Shift+Alt+I), which I had applied to many methods in order to reduce the call depth.
Take for example this simplified class:



public class Vector2f {
  public float x, y;
  public Vector2f perpendicular() {
    return this.set(y, -x);
  }
  private Vector2f set(float x, float y) {
    this.x = x;
    this.y = y;
    return this;
  }
}

Now inline the this.set(y, -x) invocation. This will result in this new and totally buggy implementation:


  public Vector2f perpendicular() {
    this.x = y; // <- set this.x
    this.y = -x; // <- read the updated this.x, which is this.y now!
    return this;
  }

So, Eclipse's method call inlining did not take into account that Java's calling convention is by-value: the call arguments are evaluated first, and only then are they used to set the fields. In order to inline this call correctly, we actually need a temporary variable here!
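A correct manual inlining has to evaluate both argument expressions into locals before any field is written, mirroring the by-value call. A minimal sketch:

```java
public class Vector2f {
  public float x, y;
  // Correctly inlined: both argument expressions (y and -x) are evaluated
  // into locals first, exactly as the call to set(y, -x) would have done,
  // so the write to this.x cannot corrupt the read of x.
  public Vector2f perpendicular() {
    float nx = y;
    float ny = -x;
    this.x = nx;
    this.y = ny;
    return this;
  }
}
```

With this version, perpendicular() on (3, 4) correctly yields (4, -3) instead of (4, -4).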

I am not sure there is much to gain from inlining methods in Java. Modern processors and JIT compilers perform so well that one more level of indirection will hardly affect program speed.

The goal here was to keep the call depth as shallow as possible, so as not to exceed the JVM's default inlining depth threshold in client code. Exceeding it would likely prevent escape analysis from identifying that the called vector instance does not escape, which in turn would prevent scalar replacement from eliminating Vector2f allocations in client code altogether.
It is true that there are two thresholds to consider here:

  • the C2 inline depth threshold; and
  • the C2 frequent/hot method bytecode length

Both are at play in ultimately preventing escape analysis and scalar replacement in client code. However, with a very small method like this one, it is more likely to hit the call depth threshold first, which made manually inlining it more worthwhile, to make escape analysis in client code more likely to happen.
And that actually makes a HUGE difference in performance.
There is much more to inlining in modern JVMs, and to the further optimizations it enables, than initially meets the eye. But you are right, method call dispatch performance was not the concern here.
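For anyone who wants to check these thresholds on their own JVM, the relevant HotSpot flags can be dumped and the actual inline decisions traced. This is a diagnostic sketch (flag defaults differ between JDK versions, and `app.jar` is a placeholder for your own program):

```shell
# Dump the C2 inlining thresholds of the local JVM
java -XX:+PrintFlagsFinal -version | grep -E 'MaxInlineLevel|MaxInlineSize|FreqInlineSize'

# Trace inlining decisions (including "inlining too deep" bailouts) while running your code
java -XX:+UnlockDiagnosticVMOptions -XX:+PrintInlining -jar app.jar
```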

And this time working on a collision shape editor using jMonkeyEngine :slight_smile:

X2rCEStGD1U

I made a kitty for our flying buddy to play with :stuck_out_tongue:

My model editor:

Imgui Docking branch available for testing

I would have posted in the imgui thread, but apparently if you don't write often in a thread, you can't reply anymore…

Crap policy for the tools area; I hope we can move the forum ASAP.

Good one, @Gjallar, figuring out how to reopen the thread for @elect!
Thanks.

What have I done today (so far, as long as I am here)?

An extra credit lab assignment pertaining to iptables on an Ubuntu VM. (I'm taking a class on cybersecurity for web applications at SF City College. Finals are next week, and I probably need mucho extra credit to ensure my "A" if the final is at all challenging.)

A tutorial on Docker. Started [url=https://github.com/docker/labs/blob/master/beginner/chapters/alpine.md]here[/url], and am halfway through part 2. This is quality writing as far as I am concerned (from a beginner's perspective). Nice walk-throughs and explanations, AND the author takes the time to clean up afterwards. So many tutorials tell you how to DO things but not how to STOP, UNDO, or clean up afterwards.

Monkey brains: always focused on the Yang and forgetting about the Yin.

Speaking of brains, if I have any left by the end of the day, I'll get back to my current JavaFX program/problem. Saving the best for last.

The dude's got a stick, but you've got a laser :wink:

Here's an earlier animation test with black outlines; let me know if it looks better than the white outlines.

@buddyBro I would prefer black, as it gives the model more perceived depth and doesn't distract from the base colors as much. This also highly depends on the background color, IMO.

Fiddling with my attempt at grass. At the moment, I'm not sure which direction to pursue. :-\ :clue: ???

A billboard approach looks weird when the camera is looking down,

and a volume approach looks weird when the camera is looking horizontally.

There's room for tweaking both approaches, so this is less of a 'which looks better in their current state' and more of a 'which would be worth pursuing further' kind of dilemma.
At the moment, I'm leaning towards the billboard approach, though it can only render so many blades due to performance, unlike the volume-sliced implementation.

Meanwhile, I also changed the outlines to black and added a background gradient.

Edit: Replying here to avoid spamming this topic:

Second approach with transparency gradient toward the top?

The last gif (the 6th) does have an alpha gradient. Each layer has alpha 0.2, I believe; the bottom layers look more opaque because they're compounded with the layers above. The transparency is apparent when looking at the grass in front of the blue character. Is that what you're suggesting, or am I misunderstanding?
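As a back-of-the-envelope check on that compounding: stacking n slices of constant alpha a gives an effective opacity of 1 - (1 - a)^n. A sketch (ignoring blade color and the actual blend mode):

```python
def stacked_opacity(alpha: float, layers: int) -> float:
    """Effective opacity seen through `layers` slices of constant alpha.

    Each slice lets (1 - alpha) of the remaining light through, so the
    fraction blocked after n slices is 1 - (1 - alpha)^n.
    """
    return 1.0 - (1.0 - alpha) ** layers

# With per-layer alpha 0.2, opacity builds up quickly toward the bottom:
for n in (1, 2, 4, 8):
    print(n, round(stacked_opacity(0.2, n), 3))
```

So even with a per-layer alpha of only 0.2, eight stacked slices already read as over 80% opaque, which matches the bottom of the volume looking solid.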


Testing character pathfinding & physics

Character rotation on waypoints is not smoothed out yet and no animation is applied to the character.

JaxeD2VFfUE

I'm failing horribly at getting things to feel and look right. :'(
It reminds me of the time when I was tasked with creating a visual debug tool. No matter how many hours I spent trying to make it intuitive and elegant, it was just off. Then I showed it to our UX designer; after a quick discussion, he drew up a mock in 10 minutes. It was just minor tweaks and rearrangements, and it took a few hours to implement the changes, but the improvement was night and day.
Perhaps I should pause coding and look into some sort of tutorial on how to make things aesthetically pleasing.

@buddyBro The only really nasty sin I see there is the status bars being too small, very close to each other, and lacking in contrast. There is no way you could keep track of them while in combat. Even worse, they have no identifying symbol and no distinctive shape that tells the player what the individual bars mean.

Edit: Have a look at Left 4 Dead 2's GUI for a really positive example that would also work for your game; I think you can basically steal their layout and be good.

Implemented ray march AO for fun; not optimized in any way. Running at full res with 16 samples and only TAA for denoising; can be improved.

No AO, very basic SSAO (that is barely noticeable), ray march AO.

Gave the current Panama vectorIntrinsics branch another shot, and it looks like it's finally getting somewhere promising:
Simple test code:


import static jdk.incubator.vector.FloatVector.SPECIES_128;
import static jdk.incubator.vector.FloatVector.fromArray;
public class Matrix4fv {
    private final float[] es = new float[16];
    public Matrix4fv add(Matrix4fv other) {
        for (int i = 0; i < 4; i++) {
            fromArray(SPECIES_128, es, i<<2)
                    .add(fromArray(SPECIES_128, other.es, i<<2))
                    .intoArray(es, i<<2);
        }
        return this;
    }
}

and the current scalar version for comparison:


public class Matrix4f {
    float m00, m01, m02, m03;
    float m10, m11, m12, m13;
    float m20, m21, m22, m23;
    float m30, m31, m32, m33;
    public Matrix4f add(Matrix4f other) {
        m00 = m00 + other.m00;
        m01 = m01 + other.m01;
        m02 = m02 + other.m02;
        m03 = m03 + other.m03;
        m10 = m10 + other.m10;
        m11 = m11 + other.m11;
        m12 = m12 + other.m12;
        m13 = m13 + other.m13;
        m20 = m20 + other.m20;
        m21 = m21 + other.m21;
        m22 = m22 + other.m22;
        m23 = m23 + other.m23;
        m30 = m30 + other.m30;
        m31 = m31 + other.m31;
        m32 = m32 + other.m32;
        m33 = m33 + other.m33;
        return this;
    }
}

JMH benchmark:


import org.openjdk.jmh.annotations.*;
import org.openjdk.jmh.runner.Runner;
import org.openjdk.jmh.runner.options.OptionsBuilder;
import java.util.concurrent.TimeUnit;
@State(Scope.Benchmark)
@OutputTimeUnit(TimeUnit.NANOSECONDS)
@Warmup(iterations = 5, time = 500, timeUnit = TimeUnit.MILLISECONDS)
@Measurement(iterations = 8, time = 1000, timeUnit = TimeUnit.MILLISECONDS)
@BenchmarkMode(Mode.AverageTime)
public class Bench {
    private final Matrix4f m4 = new Matrix4f();
    private final Matrix4fv m4v = new Matrix4fv();
    @Benchmark
    public void vector() {
        m4v.add(m4v);
    }
    @Benchmark
    public void scalar() {
        m4.add(m4);
    }
    public static void main(String[] args) throws Exception {
        new Runner(new OptionsBuilder()
                .include(Bench.class.getName())
                .forks(1)
                .jvmArgsAppend("--add-modules=jdk.incubator.vector",
                               "-Djdk.incubator.vector.VECTOR_ACCESS_OOB_CHECK=0")
                .build()).run();
    }
}

Results:


Benchmark     Mode  Cnt  Score   Error  Units
Bench.scalar  avgt    8  5.421 Ā± 0.029  ns/op
Bench.vector  avgt    8  2.539 Ā± 0.025  ns/op

Oooh, not bad - a doubling in performance is never to be sniffed at. How long before it reaches the JDK proper?

Cas :slight_smile:

The two projects (Panama and Valhalla) and their sub-projects/goals (Panama: Vector API + native interop; Valhalla: value/inline types + generics specialization) are deeply intertwined and will likely want to integrate well with one another. They are constantly changing things (especially in Panama) at the very core of the stack, and Vector API methods are removed and added constantly, so the API is anything but stable, let alone optimized… I personally don't expect general availability within the next three years or so, also given how long they have already been working on it.
Also, in order to shape the future JOML 2.0, I need to know whether the Panama Vector API will hit the shelves before inline types do, or whether both happen at the same time. In the latter case, the whole "operation modifies this" semantics becomes completely obsolete, and JOML objects can finally behave as the identity-less mathematical value objects they always should have been. But if the Vector API becomes available well before inline types, then it makes sense to add another matrix/vector class implementation using float[16] arrays as backing store and port all operations to the Vector API.

EDIT: As soon as I do anything a bit more complicated than a simple add(), performance breaks down horribly :slight_smile:
This:


/**
 * https://stackoverflow.com/questions/18499971/efficient-4x4-matrix-multiplication-c-vs-assembly#answer-18508113
 */
public Matrix4fv mul(Matrix4fv o) {
    FloatVector row1 = FloatVector.fromArray(SPECIES_128, o.es, 0);
    FloatVector row2 = FloatVector.fromArray(SPECIES_128, o.es, 4);
    FloatVector row3 = FloatVector.fromArray(SPECIES_128, o.es, 8);
    FloatVector row4 = FloatVector.fromArray(SPECIES_128, o.es, 12);
    for (int i = 0; i < 4; i++) {
        FloatVector r = FloatVector.fromArray(SPECIES_128, es, 4*i);
        FloatVector brod1 = FloatVector.broadcast(SPECIES_128, r.lane(0));
        FloatVector brod2 = FloatVector.broadcast(SPECIES_128, r.lane(1));
        FloatVector brod3 = FloatVector.broadcast(SPECIES_128, r.lane(2));
        FloatVector brod4 = FloatVector.broadcast(SPECIES_128, r.lane(3));
        brod1.fma(row1, brod2.fma(row2, brod3.fma(row3, brod4.mul(row4)))).intoArray(es, 4*i);
    }
    return this;
}

takes ~200 ns/op, whereas the straightforward scalar version in JOML takes ~14 ns/op…
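For context, the scalar baseline looks roughly like the sketch below, written against the same float[16] row-major layout. (This class name and loop form are mine; the real JOML version is fully unrolled over sixteen individual fields, which is what the ~14 ns/op refers to.)

```java
// Hypothetical scalar counterpart to Matrix4fv, on a float[16] backing store.
public class Matrix4fs {
    public final float[] es = new float[16];

    public Matrix4fs mul(Matrix4fs o) {
        float[] r = new float[16];
        for (int i = 0; i < 4; i++) {
            for (int j = 0; j < 4; j++) {
                float s = 0f;
                // Row i of this times column j of o (row-major indexing).
                for (int k = 0; k < 4; k++)
                    s += es[i * 4 + k] * o.es[k * 4 + j];
                r[i * 4 + j] = s;
            }
        }
        System.arraycopy(r, 0, es, 0, 16);
        return this;
    }
}
```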

EDIT2: Got it down to ~22 ns/op with:


private static final VectorShuffle<Float> s0 = VectorShuffle.fromValues(SPECIES_128, 0, 0, 0, 0);
private static final VectorShuffle<Float> s1 = VectorShuffle.fromValues(SPECIES_128, 1, 1, 1, 1);
private static final VectorShuffle<Float> s2 = VectorShuffle.fromValues(SPECIES_128, 2, 2, 2, 2);
private static final VectorShuffle<Float> s3 = VectorShuffle.fromValues(SPECIES_128, 3, 3, 3, 3);
public Matrix4fv mul(Matrix4fv o) {
    FloatVector row1 = FloatVector.fromArray(SPECIES_128, o.es, 0);
    FloatVector row2 = FloatVector.fromArray(SPECIES_128, o.es, 4);
    FloatVector row3 = FloatVector.fromArray(SPECIES_128, o.es, 8);
    FloatVector row4 = FloatVector.fromArray(SPECIES_128, o.es, 12);
    for (int i = 0; i < 4; i++) {
        FloatVector r = FloatVector.fromArray(SPECIES_128, es, 4*i);
        r.rearrange(s0).mul(row1)
         .add(r.rearrange(s1).mul(row2))
         .add(r.rearrange(s2).mul(row3))
         .add(r.rearrange(s3).mul(row4))
         .intoArray(es, 4*i);
    }
    return this;
}

This is so fragile and finicky. So rearrange(VectorShuffle) compiles to a lot faster code than broadcast(lane()). Also, inlining s0-s3 (rather than keeping them as static constants) brings us back up to ~100 ns/op again.