The portable shader IR (intermediate representation) looks to be well documented. This is a very good thing. Yeah, AMD has been pushing the software side of GPUs openly in a pretty cool way (see all the LLVM stuff).
The underdog (AMD) always favors open standards, while the market leader (Nvidia) always prefers to push its closed solutions. FreeSync versus G-Sync is another example.
Well, PhysX (3.3.3) is now open source. The repo doesn't appear to be up yet, so I haven't had the chance to see if there are any strings attached.
A Khronos member (a Valve employee, I believe) has said that whether or not Vulkan works on consoles depends on whether MS/Sony support it, but there is nothing stopping it from doing so.
Sadly, AMD has said that it will not release Mantle 1.0 to the public and recommends that game devs use Vulkan and D3D12 instead.
Can we expect to see Vulkan incorporated into LWJGL?
[quote="NegativeZero,post:25,topic:53625"]
Can we expect to see Vulkan incorporated into LWJGL?
[/quote]
Yes, as soon as there’s a spec.
Interesting fact from the API preview: the very first thing Vulkan asks for is a user-provided memory allocator…
That's a very good thing. Modern default allocators are burdened with a bunch of stuff you really don't want, such as memory randomization support.
Yes, of course, it’s just that writing a memory allocator is anything but trivial and we have no solution out of the box. A Java implementation (using NIO/unsafe) would also have to go through JNI a few times on each allocation. One or more native (and production-quality) implementations bundled with LWJGL would be better, but the question is which ones?
Use libc’s and Kernel32’s heap allocation functions through the C-standard malloc/free interface.
I think Vulkan is asking for an allocator because it does not assume the platform to have a standard C library.
But all platforms Java (and LWJGL) runs on have one.
We should use it then.
Of course, if one feels the need for a tuned allocator, LWJGL could provide an interface for that.
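Something along these lines, maybe (a hypothetical interface, just to show the shape; none of this exists in LWJGL today):

```java
// Hypothetical pluggable allocator hook; purely a sketch of what such an
// interface could look like, not anything LWJGL actually provides.
public interface MemoryAllocator {

    // Returns the address of a block of at least `size` bytes with the given
    // alignment, or 0 on failure.
    long malloc(long size, long alignment);

    // Resizes a block previously returned by malloc/realloc and returns the
    // (possibly new) address.
    long realloc(long address, long size, long alignment);

    // Releases a block previously returned by malloc/realloc.
    void free(long address);
}
```

The default implementation would just forward to the system malloc/free; a tuned allocator (dlmalloc, jemalloc, whatever) could then be dropped in behind the same interface.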
dlmalloc is an easy first pass.
EDIT: wait. What’s the contract? single threaded?
Found a summary of a talk on reddit. I’ll dump the points here:
- 15:50: Vulkan is represented by an "instance". No global variables or the like. It represents the loader, which is what finds the drivers installed on the system and the GPUs available, making them look like a single unified system.
- 16:30: When created, you give it information about what you want to do with it, including memory allocator callbacks. You can ask it what GPUs are present, enumerate them, and decide how to use them. Multi-GPU support is being worked on, but it's up to the application whether it wants to use it.
- 17:45: Can query info on GPUs. Memory, relative performance, what kind of features it has. API for sharing resources between GPUs.
- 18:40: "Device" is a logical representation of a GPU; it's how you talk to it. Each has a number of queues, and you control which ones you want, plus what extensions you opt in to and what layers should be enabled (layers are used for things like validation, debugging, logging, etc.).
- 19:45: Queues are things like compute queue, graphics queue. GPUs will have various numbers of each kind of queue. All queues scheduled independently and run async.
- 20:40: Batch up commands in command buffers. Each is thread local. You tell Vulkan what kind of queue each buffer will be sent to.
- 21:15: Any heavy work gets done when inserting commands into command buffers, and thus can be threaded.
- 21:55: Shaders are compiled up front. The only input format is SPIR-V. An offline GLSL -> SPIR-V compiler is available.
- 22:30: Pipelines: shaders and fixed function information attached. You can opt in to some mutable state. Can be serialised. If a serialised pipeline is re-loaded, the driver does no additional work.
- 23:20: Mutable state: blend state, viewport state, depth stencil state, etc. These are bundled into smaller mutable objects that then get re-bound to the pipeline.
- 23:40: Resources: textures, samplers, etc. CPU and GPU components. The CPU side is the API handle; resources are immutable from a layout perspective, with mutable contents. It's the application's responsibility to allocate and manage GPU memory. The API tells you how many GPU allocation units it needs for a resource, and from what pools. This lets you do your own pooling and aliasing.
- 25:15: Descriptors: GPU side, resources are represented by descriptor sets. Each set has a layout (X textures, Y samplers, etc.). You can switch out descriptor sets and pipelines without cross-validation.
- 26:25: Render passes. A render pass knows what to do when it begins and ends. Basically there to support tiled GPUs; useful for cache management.
- 27:00: Drawing: done inside render passes. Commands go into the command buffer: begin, bind pipelines and descriptor sets, call a draw, end the pass. These commands are where all the state is (see the rough sketch after this list).
- 27:50: Synchronization: event objects, command buffers can signal.
- 28:10: Resource State: sync within command buffer. Barriers. So things like waiting for render-to-texture to finish before reading. Validation layer can help you get this right.
- 29:20: Work Enqueue: submit command buffers. Your job to ensure resources are available. Semaphores to track ownership of resources.
- 30:10: Presentation: actually an extension. Two kinds: composited presentation, where you ask the compositor to present for you, and one where you tell the compositor to present. Likely to result in two APIs. Should cover the gamut from mobile -> PC. Also includes fullscreen and vsync. Present through the work queue.
- 31:30: Teardown. You’re responsible for cleaning up resources. Means you can trigger use-after-free or leak memory.
- 32:25: AZDO: no longer just "approaching" zero driver overhead; it's already very close to zero. However: build a command buffer once, use it many times; there are no "bindless textures", on the basis that descriptor sets can be arbitrarily large and you have manual management of residency anyway; sparse resources are also supported; you can do instanced drawing.
- 33:50: Conclusion: Vulkan not really “low level”, just a better abstraction. Very low overhead. Extensible. SPIR-V completely independent, hoping to see new high-level languages for it.
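Putting a few of those points together (roughly 20:40 through 29:20), here is a Java-flavoured sketch of the command-recording flow. The API isn't public yet, so every type and method name below is invented; it only mirrors the flow the talk describes.

```java
// Every name here is invented; the real API is not public yet. This only mirrors
// the command-recording flow described in the talk.
public class CommandRecordingSketch {

    // Hypothetical opaque handle types, stubbed out so the sketch compiles.
    static final class RenderPass {}
    static final class Pipeline {}
    static final class DescriptorSet {}

    // Hypothetical command buffer: thread-local, records commands for one queue type.
    interface CommandBuffer {
        void beginRenderPass(RenderPass pass);
        void bindPipeline(Pipeline pipeline);
        void bindDescriptorSet(DescriptorSet set);
        void draw(int vertexCount, int instanceCount);
        void endRenderPass();
    }

    // Hypothetical queue: the only place a finished command buffer is handed to the GPU.
    interface GraphicsQueue {
        void submit(CommandBuffer buffer);
    }

    // Recording is where the heavy lifting happens (21:15), so it can run on any thread;
    // submission (29:20) is a cheap hand-off of the pre-built buffer.
    static void recordAndSubmit(CommandBuffer cmd, GraphicsQueue queue, RenderPass pass,
                                Pipeline pipeline, DescriptorSet descriptors, int vertexCount) {
        cmd.beginRenderPass(pass);          // 26:25: render passes bound the work (tiled-GPU friendly)
        cmd.bindPipeline(pipeline);         // 22:30: pipeline = shaders + fixed-function state
        cmd.bindDescriptorSet(descriptors); // 25:15: resources arrive as descriptor sets
        cmd.draw(vertexCount, 1);           // 27:00: the draw itself is just data in the buffer
        cmd.endRenderPass();
        queue.submit(cmd);                  // 29:20: enqueue the work; resource sync is your problem
    }
}
```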
A big part of the idea here is to make command buffers composable completely in user space, to minimize the number of times the boundary into the driver/kernel has to be crossed. I wonder if it will be possible to reuse a similar idea to avoid jumping between Java and native code. In a perfect world, the command buffer would be fully prepared on the Java side in some kind of native memory buffer and then converted into a real Vulkan command buffer with just a single JNI call.
Does anybody know if the command buffer's internal implementation will be accessible, so it can be composed directly, without using vk calls for each command?
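Something like this, say. Purely speculative, since nothing about the real command buffer format is public; every name below is made up:

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

public class PackedCommandBuffer {

    // Hypothetical command opcodes; nothing like this exists in any spec yet.
    private static final int CMD_BIND_PIPELINE    = 1;
    private static final int CMD_BIND_DESCRIPTORS = 2;
    private static final int CMD_DRAW             = 3;

    private final ByteBuffer buf;

    public PackedCommandBuffer(int capacity) {
        // Direct buffer in native byte order, so the native side can read it without copying.
        this.buf = ByteBuffer.allocateDirect(capacity).order(ByteOrder.nativeOrder());
    }

    public void bindPipeline(long pipelineHandle) {
        buf.putInt(CMD_BIND_PIPELINE).putLong(pipelineHandle);
    }

    public void bindDescriptorSet(long descriptorSetHandle) {
        buf.putInt(CMD_BIND_DESCRIPTORS).putLong(descriptorSetHandle);
    }

    public void draw(int vertexCount, int instanceCount) {
        buf.putInt(CMD_DRAW).putInt(vertexCount).putInt(instanceCount);
    }

    public void submit() {
        buf.flip();
        // Single JNI crossing: hand the whole packed buffer to native code, which would
        // translate it into a real command buffer (assuming that is even possible).
        nSubmit(buf, buf.remaining());
        buf.clear();
    }

    // Hypothetical native method; the translation step is pure speculation on my part.
    private static native void nSubmit(ByteBuffer commands, int length);
}
```

The point being that all the putInt/putLong calls are plain Java, and the JNI boundary is crossed exactly once per buffer.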
Vague, unfounded, yet correct* statement: JNI calls are so cheap that it simply doesn’t matter whether command buffer internals are exposed.
* YMMV
I think that might be the case with OpenGL driver calls too - they are "so cheap" that they don't add up to much (and they're in user space too, unlike DX)… until you start to do 100,000 of them. The problem with JNI calls is that they can't be inlined and thus have to go through the whole push-stack-and-PC, call, return-and-restore bollocks that inlining basically eliminates.
Cas
Whatever is behind the JNI is very likely to dwarf the overhead. So even if you do 100,000 JNI calls, the loss in performance won’t be due to the JNI overhead, but whatever else is executed 100,000 times (both Java side and C side).
Memory copies/walks are expensive. That’s where the performance is gained/lost in this use case.
On an off-topic side note: push/call/restore isn't that slow. Inlining is mostly faster due to all the other optimisations it enables.
[quote="abies,post:33,topic:53625"]
Does anybody know if the command buffer's internal implementation will be accessible, so it can be composed directly, without using vk calls for each command?
[/quote]
Yes, "draw calls" in Vulkan will be data inside a command buffer, not actual function calls. There is no public information yet on how the application actually builds the command buffer. It could either be writing the data itself into a plain memory buffer, or calling functions that do it for you. From the previews I've seen, DX12 does the latter, but my guess is Vulkan will do the former. In either case, you'll be able to build and submit command buffers from multiple threads, which is perfect for what Java does best (concurrency), and you'll never have to worry about JNI overhead again.
It’s also quite possible that by the time there’s enough market share to deploy a Vulkan application, Java will have built-in FFI support, which means close-to-zero native call overhead.
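To make the concurrency point concrete (again with entirely made-up names, since there is no spec yet), the shape of it could be something like this:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class ParallelRecordingSketch {

    // Stand-in for whatever a recorded command buffer turns out to be; purely hypothetical.
    interface CommandBuffer {}

    // Stand-in for the per-thread recording work (bind pipeline, bind descriptors, draw, ...).
    static CommandBuffer recordSceneChunk(int chunkIndex) {
        return new CommandBuffer() {};
    }

    // Stand-in for the single, cheap submission call on the queue.
    static void submitAll(List<CommandBuffer> buffers) {
    }

    public static void main(String[] args) throws Exception {
        int threads = Runtime.getRuntime().availableProcessors();
        ExecutorService pool = Executors.newFixedThreadPool(threads);

        // Each task records its own command buffer: no shared mutable state, no locks.
        List<Future<CommandBuffer>> futures = new ArrayList<>();
        for (int i = 0; i < threads; i++) {
            final int chunk = i;
            futures.add(pool.submit(() -> recordSceneChunk(chunk)));
        }

        // Gather the recorded buffers and submit them once, from one thread.
        List<CommandBuffer> recorded = new ArrayList<>();
        for (Future<CommandBuffer> f : futures) {
            recorded.add(f.get());
        }
        submitAll(recorded);

        pool.shutdown();
    }
}
```

All the expensive work happens inside recordSceneChunk, on whichever thread you like; the submit at the end is the only synchronization point.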
That was incredibly informative! Thank you for the link Spasi.
The strange thing is that almost all of the stupid hoops we go through right now to get performance are basically… building command buffers of objects which, when executed on the main thread, run OpenGL commands :cranky:
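i.e. hand-rolled stuff along these lines (simplified, names invented):

```java
import java.util.concurrent.ConcurrentLinkedQueue;

// A hand-rolled "command buffer": worker threads queue up work,
// the render thread (which owns the GL context) drains and executes it.
public class GLCommandQueue {

    private final ConcurrentLinkedQueue<Runnable> commands = new ConcurrentLinkedQueue<>();

    // Called from any thread: nothing touches GL here, we just record.
    public void record(Runnable glCommand) {
        commands.add(glCommand);
    }

    // Called once per frame from the main/render thread: this is where GL actually runs.
    public void execute() {
        Runnable cmd;
        while ((cmd = commands.poll()) != null) {
            cmd.run();
        }
    }
}
```

Which is basically the composable command buffer idea, except we execute it ourselves on the render thread instead of handing it to the driver.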
Cas