What I did today

Spent way too much time trying to make a super simple animation :stuck_out_tongue:
I'm not artsy :clue:

I taught it to fly :o

This was not really something I did, apart from accepting yet another pull request for JOML that fixes a bug introduced by trusting Eclipse's method invocation inline refactoring. Ever since I inlined some methods in JOML's classes, bug reports and pull requests fixing obvious errors have been pouring in.
The reason in all these cases was Eclipse's method call inlining (Shift+Alt+I), which I had applied to many methods in order to reduce the call depth.
Take for example this simplified class:



public class Vector2f {
  public float x, y;
  public Vector2f perpendicular() {
    return this.set(y, -x);
  }
  private Vector2f set(float x, float y) {
    this.x = x;
    this.y = y;
    return this;
  }
}

Now inline the this.set(y, -x) invocation. This will result in this new and totally buggy implementation:


  public Vector2f perpendicular() {
    this.x = y; // <- set this.x
    this.y = -x; // <- read the updated this.x, which is this.y now!
    return this;
  }

So, Eclipse's method call inlining did not take into account that Java's calling convention is by-value: the call arguments are evaluated first, and only then are they used to set the fields. In order to inline this call correctly, we actually need a temporary variable here!
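A correct manual inlining has to evaluate both argument expressions into locals before any field is written, mirroring the by-value call. A minimal sketch:

```java
public class Vector2f {
  public float x, y;
  // Correctly inlined: both argument expressions (y and -x) are evaluated
  // into locals first, exactly as the call to set(y, -x) would have done,
  // so the write to this.x cannot corrupt the read of x.
  public Vector2f perpendicular() {
    float nx = y;
    float ny = -x;
    this.x = nx;
    this.y = ny;
    return this;
  }
}
```

With this version, perpendicular() on (3, 4) correctly yields (4, -3) instead of (4, -4).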

I am not sure there is much to gain from inlining methods in Java. Modern processors and JIT compilers perform so well that one more level of indirection will hardly affect program speed.

The goal here was to keep the call depth as shallow as possible, so as not to exceed the JVM's default inlining depth threshold in client code. Exceeding it would likely prevent escape analysis from identifying that the called vector instance does not escape, which in turn would prevent scalar replacement from eliminating Vector2f allocations in client code altogether.
It is true that there are two thresholds to consider here:

  • the C2 inline depth threshold; and
  • the C2 frequent/hot method bytecode length

Both are at play in ultimately preventing escape analysis and scalar replacement in client code. However, with a very small method like this one, it is more likely to hit the call depth threshold first, which made manually inlining it more worthwhile, to make escape analysis in client code more likely to happen.
And that actually makes a HUGE difference in performance.
There is much more to inlining in modern JVMs, and to the further optimizations it enables, than initially meets the eye. But you are right, method call dispatch performance was not the concern here.
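For anyone who wants to check these thresholds on their own JVM, the relevant HotSpot flags can be dumped and the actual inline decisions traced. This is a diagnostic sketch (flag defaults differ between JDK versions, and `app.jar` is a placeholder for your own program):

```shell
# Dump the C2 inlining thresholds of the local JVM
java -XX:+PrintFlagsFinal -version | grep -E 'MaxInlineLevel|MaxInlineSize|FreqInlineSize'

# Trace inlining decisions (including "inlining too deep" bailouts) while running your code
java -XX:+UnlockDiagnosticVMOptions -XX:+PrintInlining -jar app.jar
```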

And this time working on a collision shape editor using jMonkeyEngine :slight_smile:

X2rCEStGD1U

I made a kitty for our flying buddy to play with :stuck_out_tongue:

My model editor:

Imgui Docking branch available for testing

I would have posted in the imgui thread, but apparently if you don't write often in a thread, you can't reply anymore…

Crap policy for the tools area; I hope we can move the forum ASAP.

Good one, @Gjallar, figuring out how to reopen the thread for @elect!
Thanks.

What have I done today (so far, as long as I am here)?

An extra credit lab assignment pertaining to iptables on an Ubuntu VM. (I'm taking a class on cybersecurity for web applications at SF City College. Finals are next week, and I probably need mucho extra credit to ensure my "A" if the final is at all challenging.)

A tutorial on Docker. Started [url=https://github.com/docker/labs/blob/master/beginner/chapters/alpine.md]here[/url], and am halfway through part 2. This is quality writing as far as I am concerned (from a beginner's perspective). Nice walk-throughs and explanations, AND the author takes the time to clean up afterwards. So many tutorials tell you how to DO things but not how to STOP, UNDO, or clean up afterwards.

Monkey brains: always focused on the Yang and forgetting about the Yin.

Speaking of brains, if I have any left by the end of the day, I'll get back to my current JavaFX program/problem. Saving the best for last.

The dude's got a stick, but you've got a laser :wink:

Here's an earlier animation test with black outlines; let me know if it looks better than the white outlines.

@buddyBro I would prefer black, as it gives the model more perceived depth and doesn't distract from the base colors as much. This also highly depends on the background color, IMO.

Fiddling with my attempt at grass. At the moment, I'm not sure which direction to pursue. :-\ :clue: ???

A billboard approach looks weird when the camera is looking down,

and a volume approach looks weird when the camera is looking horizontally.

There's room for tweaking both approaches, so this is less of a 'which looks better in their current state' and more of a 'which would be worth pursuing further' kind of dilemma.
At the moment, I'm leaning towards the billboard approach, though it can only render so many blades due to performance, unlike the volume-sliced implementation.

Meanwhile, I also changed the outlines to black and added a background gradient.

Edit: Replying here to avoid spamming this topic:

Second approach with transparency gradient toward the top?

The last gif (the 6th) does have an alpha gradient. Each layer has alpha 0.2, I believe; the bottom layers look more opaque because they're compounded with the layers above. The transparency is apparent when looking at the grass in front of the blue character. Is that what you're suggesting, or am I misunderstanding?
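As a back-of-the-envelope check on that compounding: stacking n slices of constant alpha a gives an effective opacity of 1 - (1 - a)^n. A sketch (ignoring blade color and the actual blend mode):

```python
def stacked_opacity(alpha: float, layers: int) -> float:
    """Effective opacity seen through `layers` slices of constant alpha.

    Each slice lets (1 - alpha) of the remaining light through, so the
    fraction blocked after n slices is 1 - (1 - alpha)^n.
    """
    return 1.0 - (1.0 - alpha) ** layers

# With per-layer alpha 0.2, opacity builds up quickly toward the bottom:
for n in (1, 2, 4, 8):
    print(n, round(stacked_opacity(0.2, n), 3))
```

So even with a per-layer alpha of only 0.2, eight stacked slices already read as over 80% opaque, which matches the bottom of the volume looking solid.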


Testing character pathfinding & physics

Character rotation on waypoints is not smoothed out yet and no animation is applied to the character.

JaxeD2VFfUE

I'm failing horribly at getting things to feel and look right. :'(
It reminds me of the time when I was tasked with creating a visual debug tool. No matter how many hours I spent trying to make it intuitive and elegant, it was just off. Then I showed it to our UX designer; after a quick discussion, he drew up a mock in 10 minutes. It was just minor tweaks and rearrangements, and it took a few hours to implement the changes, but the improvement was night and day.
Perhaps I should pause coding and look into some sort of tutorial on how to make things aesthetically pleasing.

@buddyBro The only really nasty sin I see there is the status bars being too small, very close to each other, and lacking in contrast. There is no way you could keep track of them while in combat. Even worse, they have no identifying symbol and no distinctive shape that tells the player what the individual bars mean.

Edit: Have a look at Left 4 Dead 2's GUI for a really positive example that would also work for your game; I think you can basically steal their layout and be good.

Implemented ray march AO for fun; not optimized in any way. Running at full res with 16 samples and only TAA for denoising; can be improved.

No AO, very basic SSAO (that is barely noticeable), ray march AO.

Gave the current Panama vectorIntrinsics branch another shot, and it looks like it's finally getting somewhere promising:
Simple test code:


import static jdk.incubator.vector.FloatVector.SPECIES_128;
import static jdk.incubator.vector.FloatVector.fromArray;
public class Matrix4fv {
    private final float[] es = new float[16];
    public Matrix4fv add(Matrix4fv other) {
        for (int i = 0; i < 4; i++) {
            fromArray(SPECIES_128, es, i<<2)
                    .add(fromArray(SPECIES_128, other.es, i<<2))
                    .intoArray(es, i<<2);
        }
        return this;
    }
}

and the current scalar version for comparison:


public class Matrix4f {
    float m00, m01, m02, m03;
    float m10, m11, m12, m13;
    float m20, m21, m22, m23;
    float m30, m31, m32, m33;
    public Matrix4f add(Matrix4f other) {
        m00 = m00 + other.m00;
        m01 = m01 + other.m01;
        m02 = m02 + other.m02;
        m03 = m03 + other.m03;
        m10 = m10 + other.m10;
        m11 = m11 + other.m11;
        m12 = m12 + other.m12;
        m13 = m13 + other.m13;
        m20 = m20 + other.m20;
        m21 = m21 + other.m21;
        m22 = m22 + other.m22;
        m23 = m23 + other.m23;
        m30 = m30 + other.m30;
        m31 = m31 + other.m31;
        m32 = m32 + other.m32;
        m33 = m33 + other.m33;
        return this;
    }
}

JMH benchmark:


import org.openjdk.jmh.annotations.*;
import org.openjdk.jmh.runner.Runner;
import org.openjdk.jmh.runner.options.OptionsBuilder;
import java.util.concurrent.TimeUnit;
@State(Scope.Benchmark)
@OutputTimeUnit(TimeUnit.NANOSECONDS)
@Warmup(iterations = 5, time = 500, timeUnit = TimeUnit.MILLISECONDS)
@Measurement(iterations = 8, time = 1000, timeUnit = TimeUnit.MILLISECONDS)
@BenchmarkMode(Mode.AverageTime)
public class Bench {
    private final Matrix4f m4 = new Matrix4f();
    private final Matrix4fv m4v = new Matrix4fv();
    @Benchmark
    public void vector() {
        m4v.add(m4v);
    }
    @Benchmark
    public void scalar() {
        m4.add(m4);
    }
    public static void main(String[] args) throws Exception {
        new Runner(new OptionsBuilder()
                .include(Bench.class.getName())
                .forks(1)
                .jvmArgsAppend("--add-modules=jdk.incubator.vector",
                               "-Djdk.incubator.vector.VECTOR_ACCESS_OOB_CHECK=0")
                .build()).run();
    }
}

Results:


Benchmark     Mode  Cnt  Score   Error  Units
Bench.scalar  avgt    8  5.421 Ā± 0.029  ns/op
Bench.vector  avgt    8  2.539 Ā± 0.025  ns/op

Oooh, not bad - a doubling in performance is never to be sniffed at. How long before it reaches the JDK proper?

Cas :slight_smile:

The two projects (Panama and Valhalla) and their sub-projects/goals (Panama: Vector API + native interop; Valhalla: value/inline types + generics specialization) are deeply intertwined and will likely want to integrate well with one another. They are constantly changing things (especially in Panama) at the very core of the stack, and Vector API methods are removed and added constantly, so the API is anything but stable, let alone optimized… I personally don't expect general availability within the next three years or so, also given how long they have already been working on it.
Also, in order to shape the future JOML 2.0, I need to know whether the Panama Vector API will hit the shelves before inline types do, or whether both happen at the same time. In the latter case, the whole "operation modifies this" semantics becomes completely obsolete, and JOML objects can finally behave as the identity-less mathematical value objects they always should have been. But if the Vector API becomes available well before inline types, then it makes sense to add another matrix/vector class implementation using float[16] arrays as backing store and port all operations to the Vector API.

EDIT: As soon as I do anything a bit more complicated than a simple add(), performance breaks down horribly :slight_smile:
This:


/**
 * https://stackoverflow.com/questions/18499971/efficient-4x4-matrix-multiplication-c-vs-assembly#answer-18508113
 */
public Matrix4fv mul(Matrix4fv o) {
    FloatVector row1 = FloatVector.fromArray(SPECIES_128, o.es, 0);
    FloatVector row2 = FloatVector.fromArray(SPECIES_128, o.es, 4);
    FloatVector row3 = FloatVector.fromArray(SPECIES_128, o.es, 8);
    FloatVector row4 = FloatVector.fromArray(SPECIES_128, o.es, 12);
    for (int i = 0; i < 4; i++) {
        FloatVector r = FloatVector.fromArray(SPECIES_128, es, 4*i);
        FloatVector brod1 = FloatVector.broadcast(SPECIES_128, r.lane(0));
        FloatVector brod2 = FloatVector.broadcast(SPECIES_128, r.lane(1));
        FloatVector brod3 = FloatVector.broadcast(SPECIES_128, r.lane(2));
        FloatVector brod4 = FloatVector.broadcast(SPECIES_128, r.lane(3));
        brod1.fma(row1, brod2.fma(row2, brod3.fma(row3, brod4.mul(row4)))).intoArray(es, 4*i);
    }
    return this;
}

takes ~200 ns/op, whereas the straightforward scalar version in JOML takes ~14 ns/op…
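For context, the scalar baseline looks roughly like the sketch below, written against the same float[16] row-major layout. (This class name and loop form are mine; the real JOML version is fully unrolled over sixteen individual fields, which is what the ~14 ns/op refers to.)

```java
// Hypothetical scalar counterpart to Matrix4fv, on a float[16] backing store.
public class Matrix4fs {
    public final float[] es = new float[16];

    public Matrix4fs mul(Matrix4fs o) {
        float[] r = new float[16];
        for (int i = 0; i < 4; i++) {
            for (int j = 0; j < 4; j++) {
                float s = 0f;
                // Row i of this times column j of o (row-major indexing).
                for (int k = 0; k < 4; k++)
                    s += es[i * 4 + k] * o.es[k * 4 + j];
                r[i * 4 + j] = s;
            }
        }
        System.arraycopy(r, 0, es, 0, 16);
        return this;
    }
}
```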

EDIT2: Got it down to ~22 ns/op with:


private static final VectorShuffle<Float> s0 = VectorShuffle.fromValues(SPECIES_128, 0, 0, 0, 0);
private static final VectorShuffle<Float> s1 = VectorShuffle.fromValues(SPECIES_128, 1, 1, 1, 1);
private static final VectorShuffle<Float> s2 = VectorShuffle.fromValues(SPECIES_128, 2, 2, 2, 2);
private static final VectorShuffle<Float> s3 = VectorShuffle.fromValues(SPECIES_128, 3, 3, 3, 3);
public Matrix4fv mul(Matrix4fv o) {
    FloatVector row1 = FloatVector.fromArray(SPECIES_128, o.es, 0);
    FloatVector row2 = FloatVector.fromArray(SPECIES_128, o.es, 4);
    FloatVector row3 = FloatVector.fromArray(SPECIES_128, o.es, 8);
    FloatVector row4 = FloatVector.fromArray(SPECIES_128, o.es, 12);
    for (int i = 0; i < 4; i++) {
        FloatVector r = FloatVector.fromArray(SPECIES_128, es, 4*i);
        r.rearrange(s0).mul(row1)
         .add(r.rearrange(s1).mul(row2))
         .add(r.rearrange(s2).mul(row3))
         .add(r.rearrange(s3).mul(row4))
         .intoArray(es, 4*i);
    }
    return this;
}

This is so fragile and finicky. So rearrange(VectorShuffle) compiles to a lot faster code than broadcast(lane()). Also, inlining s0-s3 (rather than keeping them as static constants) brings us back up to ~100 ns/op again.