Vulkan 1.0 Release

Nope, there’s nothing that allocates int arrays, except when creating capabilities objects (but not in Vulkan atm). Also, see “Why (Most) Sampling Java Profilers Are Terrible”.

I was using the memory sampler, which samples the heap. It shouldn’t suffer from those drawbacks. I’ll keep an eye out for what’s happening though. =P

A new build is up (#35). You can now trust the flags in VKCapabilities. Also, it will now load function pointers only for extensions that are enabled. This is a breaking change: creating VkInstance and VkDevice now requires that you pass VkInstanceCreateInfo and VkDeviceCreateInfo respectively to the constructors. You may free the structs immediately after that.

A new build is up (#36). More breaking changes in this build, for an ambitious reason:

The Vulkan validation layers do a pretty good job in general, but they easily crash when you feed them structs with null pointers or buffers with invalid sizes (so do the Vulkan drivers without validation btw). Since this problem applies to other bindings too, structs in LWJGL are now annotated with nullability information. With the new build, you will not be able to set a null buffer or NULL value to a struct member that should always be non-NULL. In addition, method calls that take structs with pointer members that must be non-NULL will validate that those members indeed contain non-NULL values before invoking the native function. Like other such checks, this validation can be disabled with [icode]org.lwjgl.system.Configuration.CHECKS[/icode] for release builds.
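To make the idea concrete, here is a minimal sketch of what such a check could look like. It is not LWJGL's generated code: `CheckedStructSketch`, its `CHECKS` flag and `validate()` are hypothetical names made up for illustration, and a plain Java field stands in for the native struct member.

```java
import java.nio.LongBuffer;

// Hypothetical stand-in for a generated struct class; not the real LWJGL API.
final class CheckedStructSketch {
    // Stand-in for a global "checks enabled" switch that release builds could turn off.
    static boolean CHECKS = true;

    // A member that must never be NULL when the struct reaches native code.
    private LongBuffer pCommandBuffers;

    // The setter refuses null while checks are enabled.
    CheckedStructSketch pCommandBuffers(LongBuffer value) {
        if (CHECKS && value == null) {
            throw new NullPointerException("pCommandBuffers must not be null");
        }
        this.pCommandBuffers = value;
        return this;
    }

    // Would be called by the binding right before the native function is invoked,
    // to catch structs that were allocated but never filled in.
    void validate() {
        if (CHECKS && pCommandBuffers == null) {
            throw new IllegalStateException("pCommandBuffers is NULL");
        }
    }
}
```

The two checks cover different mistakes: the setter catches an explicit null, while `validate()` catches a member that was simply never set.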

The auto-size support for struct members has also been improved. Setting a buffer member will now always set the corresponding “count” member to [icode]buffer.remaining()[/icode] (or 0 if the buffer is null). Since this opens up opportunities for confusion, bugs and unnecessary writes, the instance setters for “count” members have been removed. This is similar to auto-size parameters in functions, which are removed when the “count” value is taken from the “auto-sized” buffer parameter. In cases where customization is required, the unsafe static setters for “count” members may be used.

edit: This turned out to be problematic when a “count” member auto-sizes multiple members. The changes have been reverted in build #37 for such members (affects only 6 structs across all bindings) and they will need to be explicitly set for auto-sizing to work.
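For the simple case (one “count” member tied to exactly one buffer member), the auto-size behaviour can be sketched roughly like this. `AutoSizedStructSketch`, `pElements` and `elementCount` are hypothetical names, not members of any real LWJGL struct, and plain fields stand in for native memory.

```java
import java.nio.LongBuffer;

// Hypothetical struct wrapper; names are illustrative, not the real LWJGL API.
final class AutoSizedStructSketch {
    private int elementCount;      // the "count" member: getter only, no instance setter
    private LongBuffer pElements;  // the buffer member that drives the count

    // Setting the buffer also sets the count: remaining() elements, or 0 for null.
    AutoSizedStructSketch pElements(LongBuffer value) {
        this.pElements = value;
        this.elementCount = value == null ? 0 : value.remaining();
        return this;
    }

    int elementCount() {
        return elementCount;
    }

    LongBuffer pElements() {
        return pElements;
    }
}
```

Note that the count follows `remaining()`, not `capacity()`, so a buffer whose position has been advanced produces a smaller count.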

The above have been applied to all bindings. The Vulkan binding has been updated to version 1.0.4.

Thanks to @KaiHH for brainstorming this with me.

The Khronos group has released 1.0.4. Change log:

  • Bump API patch number from 3 to 4 for the first public update to the spec. Add patch number to the spec title (this will be done automatically from XML, later).
  • Fixes for numerous editorial issues. Regularize descriptions of variable-length array queries. Properly tag enumerants so they come out in the right font (many were mislabeled in usage tags in vk.xml, or not tagged). Spelling and markup corrections (public issue 4).
  • Fix typos and clearly separate description of different types of memory areas (public issue 5).
  • Use standards-compliant preprocessor guard symbols on headers (public issue 7).
  • Note that Github users can’t currently set labels on issues, and recommend a fallback approach (public issue 15).
  • Use latexmath prefix on len= attributes (public issue 29).
  • Make flink:vkCmdUpdateBuffer pname:dataSize limit consistent (public issue 65).
  • Add VK_KHR_mirror_clamp_to_edge extension to core API branch, as an optional feature not introducing new commands or enums (internal issue 104).
  • Cleanup invariance language inherited from the GL specification to not refer to nonexistent (GL-specific) state (internal issue 111).
  • Modify the flink:vkCmdDrawIndexed pname:vertexOffset definition to not be the “base offset within the index buffer” but rather the “value added to the vertex index before indexing into the vertex buffer” (internal issue 118).
  • Fix drawing chapter in the “Programmable Primitive Shading” section where it described categories of drawing commands. It referenced flink:vkCmdDrawIndexed twice. Replace the second reference with flink:vkCmdDrawIndexedIndirect (internal issue 119).
  • Typo fixed in <<sparsememory-examples-advanced,Advanced Sparse Resources>> sparse memory example (internal issue 122).
  • Add flink:VkDisplayPlaneAlphaFlagsKHR to section of VK_KHR_display extension (internal issue 125)
  • Add missing optional=“false,true” to flink:vkGetImageSparseMemoryRequirements pname:pSparseMemoryRequirementCount parameter (internal issue 132)
  • Rename ename:VK_STRUCTURE_TYPE_DEBUG_REPORT_CREATE_INFO_EXT to ename:VK_STRUCTURE_TYPE_DEBUG_REPORT_CALLBACK_CREATE_INFO_EXT (internal issue 133)
  • Fix a handful of broken cross-references in the <<samplers,Samplers>> chapter (internal issue 134).
  • Fix “Input Attachement” GLSL example to use correct syntax (internal issue 135).
  • Update XML schema and documentation to accommodate recently added attributes for validity. Add some introductory material describing design choices and pointing to the public repository to file issues.
  • Put include of validity in the core spec extensions chapter on its own line, so that asciidoc is happy.
  • Fix vertexOffset language to specify that it’s the value added to the vertex index before indexing into the vertex buffer, not the base offset within the index buffer.
  • Fix error in the description of flink:vkCmdNextSubpass.

In other news, coinciding with the 1.0.4 release, AMD put out a new Vulkan driver… updating to Vulkan 1.0.3…

Here are some short and interesting AMD web articles about how and why Vulkan might increase application performance:

The new official non-beta driver from Nvidia now supports Vulkan.

Official support for Vulkan from AMD too.

Someone is trying to make better sense of Vulkan execution dependencies, here (work-in-progress). Corresponding issue in Vulkan-Docs: #132.

This, 1000x. This was the thought I had back in my head when trying to make sense of it. There’s no overview or dependency information in there. All the parts are described, but how they interact is very unclear. Let’s hope this will clear some things up.

But still no sign of the Linux driver :clue:

Something I’ve been thinking about concerning LWJGL and Vulkan: I think we need some way of avoiding malloc()ing and free()ing so many tiny structs and Long/PointerBuffers all the time. It may not be producing garbage, but it’s still a lot of overhead going to native code for JEmalloc.

A solution to this would be to allocate a chunk of memory for each thread and use that as a “stack” for Vulkan-needed structs. Instead of going through JEmalloc each time, we’d just reuse the same ByteBuffer over and over again as the backing storage for structs and temporary buffers. For example, a simple vkQueueSubmit() requires allocating a VkSubmitInfo which in turn requires (up to) 4 additional buffers to be allocated depending on what arguments you have. I’m imagining some kind of push-pop stack for these temporary allocations.

How it is now:


LongBuffer waitSemaphores = MemoryUtil.memAllocLong(...);
IntBuffer masks = MemoryUtil.memAllocInt(...);
PointerBuffer buffers = MemoryUtil.memAllocPointer(...);
LongBuffer signalSemaphores = MemoryUtil.memAllocLong(...);

//fill buffers with data...
		
VkSubmitInfo info = VkSubmitInfo.malloc().set(VK_STRUCTURE_TYPE_SUBMIT_INFO, 0, waitSemaphores, masks, buffers, signalSemaphores);

vkQueueSubmit(queue, info, fence);

info.free();
MemoryUtil.memFree(waitSemaphores);
MemoryUtil.memFree(masks);
MemoryUtil.memFree(buffers);
MemoryUtil.memFree(signalSemaphores);

That’s 10 extra native calls (5 allocations and 5 frees) that exist only for memory management. It’d be much more convenient if we could do something like this:


VulkanStack stack = VulkanStack.getThreadLocalStack(); //Or integrate this into Vulkan objects like VkQueue, VkDevice, etc to avoid overhead of thread-local stuff?
stack.push(); //Saves current position in stack ByteBuffer

LongBuffer waitSemaphores = stack.allocLong(...);
IntBuffer masks = stack.allocInt(...);
PointerBuffer buffers = stack.allocPointer(...);
LongBuffer signalSemaphores = stack.allocLong(...);

//fill buffers with data...
		
VkSubmitInfo info = stack.allocSubmitInfo().set(VK_STRUCTURE_TYPE_SUBMIT_INFO, 0, waitSemaphores, masks, buffers, signalSemaphores);

vkQueueSubmit(queue, info, fence);

stack.pop(); //Restores current position, "freeing" everything

This would avoid the JEmalloc overhead each frame, and guarantee memory locality of the structs and buffers (although that probably wasn’t a problem in the first place).
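To show that the push-pop idea needs very little machinery, here is a minimal sketch built only on `java.nio`. It is not the BufferStack class linked later in the thread and not an LWJGL class; `StackSketch` and all its methods are hypothetical, and it only hands out `LongBuffer`/`IntBuffer` views (no `PointerBuffer` or structs).

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.IntBuffer;
import java.nio.LongBuffer;

// Minimal push/pop stack allocator over one direct ByteBuffer (hypothetical sketch).
final class StackSketch {
    private final ByteBuffer memory;
    private final int[] frames = new int[64]; // saved offsets, one per push()
    private int frameCount;
    private int offset; // next free byte

    StackSketch(int capacityBytes) {
        memory = ByteBuffer.allocateDirect(capacityBytes).order(ByteOrder.nativeOrder());
    }

    // Saves the current offset.
    void push() {
        frames[frameCount++] = offset;
    }

    // Restores the offset saved by the matching push(), "freeing" everything since.
    void pop() {
        offset = frames[--frameCount];
    }

    // Bumps the offset (aligned relative to the buffer start) and returns a view.
    private ByteBuffer alloc(int alignment, int bytes) {
        int start = (offset + alignment - 1) & -alignment; // alignment must be a power of two
        if (start + bytes > memory.capacity()) {
            throw new IllegalStateException("stack exhausted");
        }
        offset = start + bytes;
        ByteBuffer view = memory.duplicate();
        view.position(start).limit(start + bytes);
        return view.slice().order(ByteOrder.nativeOrder());
    }

    LongBuffer allocLong(int count) {
        return alloc(8, count * 8).asLongBuffer();
    }

    IntBuffer allocInt(int count) {
        return alloc(4, count * 4).asIntBuffer();
    }

    int used() {
        return offset;
    }
}
```

Two caveats with this approach: memory handed out after a pop() still contains whatever the previous frame wrote (nothing is zeroed), and any buffer obtained inside a frame must not be used once that frame has been popped. The view objects themselves are still Java allocations, so this only removes the native malloc/free calls.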

Absolutely agreed that proper memory management is now a big concern with Vulkan apps in Java with LWJGL 3.
But LWJGL 3 already provides the user with exact control on how and where memory is being used.
It should be a user’s concern to provide such a memory pool and not LWJGL’s, IMO.
LWJGL 3 should be as lean and utility-classes-free as possible.
So your VulkanStack class is of course a good thing (as would be many many more utility/wrapper/helper classes to effectively work with Vulkan), but those should belong to the client, I think.

Murr murr… I’ll get on it then.

Don’t get me wrong. :slight_smile: It’d be a great thing to have, I agree. And if it is neat and you want to contribute your library or even better, allow others to work on it with you too, you can surely ask @Spasi for a new repository under the LWJGL umbrella.
Something like “lwjgl3-vulkan-util”.
That’d definitely be helpful.

BufferStack class: http://www.java-gaming.org/?action=pastebin&id=1414
Test program: http://www.java-gaming.org/?action=pastebin&id=1415

[quote]memAllocPointer(): 5.9470854 ms
VkSubmitInfo.malloc(): 5.871784 ms
stack.allocPointer(): 0.2188195 ms
VkSubmitInfo.create(stack.nalloc()): 0.5192677 ms
[/quote]
PointerBuffer of length 4: 27x faster.
VkSubmitInfo: 11.3x faster.

The main reason to integrate this into LWJGL would be to support structs directly. Currently I have to write

VkSubmitInfo.create(stack.nalloc(VkSubmitInfo.SIZEOF));

which is a bit more verbose than

VkSubmitInfo.malloc();

Optimally I’d like to write something like

VkSubmitInfo.create(stack)

Hmm. Maybe what we really need is a MemoryProvider interface with a malloc(int length) function that BufferStack can implement, and all structs have a generated create(MemoryProvider) function. Then I could actually just write [icode]VkSubmitInfo.create(stack)[/icode].
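A rough sketch of that idea, to show how little would be needed on the struct side. Everything here is hypothetical: `MemoryProvider` is the interface proposed above (nothing confirms LWJGL has one), `SubmitInfoSketch` stands in for a generated struct class, its `SIZEOF` is a placeholder value, and `BumpProvider` is the simplest allocator I could think of rather than the real BufferStack.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

// Hypothetical interface as proposed above.
interface MemoryProvider {
    ByteBuffer malloc(int length);
}

// Stand-in for a generated struct class with a create(MemoryProvider) factory.
final class SubmitInfoSketch {
    static final int SIZEOF = 72; // placeholder size, for illustration only

    final ByteBuffer container; // backing memory of the struct

    private SubmitInfoSketch(ByteBuffer container) {
        this.container = container;
    }

    static SubmitInfoSketch create(MemoryProvider provider) {
        return new SubmitInfoSketch(provider.malloc(SIZEOF));
    }
}

// Simplest possible provider: a bump allocator over one preallocated buffer.
final class BumpProvider implements MemoryProvider {
    private final ByteBuffer memory =
        ByteBuffer.allocateDirect(4096).order(ByteOrder.nativeOrder());
    private int offset;

    @Override
    public ByteBuffer malloc(int length) {
        ByteBuffer view = memory.duplicate();
        view.position(offset).limit(offset + length); // throws if the buffer is exhausted
        offset += length;
        return view.slice().order(ByteOrder.nativeOrder());
    }

    // Makes the whole buffer available again.
    void reset() {
        offset = 0;
    }
}
```

Because the interface has a single method, any allocator can be plugged in without a wrapper class, e.g. `SubmitInfoSketch.create(ByteBuffer::allocateDirect)` for a plain heap-of-direct-buffers fallback.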

I was thinking about the configurable allocator, which LWJGL3 already provides.
Just that there currently is no “hook” to specify stack frame boundaries and do a “free everything that has been allocated within this boundary.”

I was thinking about something like this (Pseudo-code):


// Set stackframe pooling allocator
LWJGL.Allocator.set(StackAllocator);

// Then as "around advice" on each method (either programmed manually or via AOP):
// Begin stackframe
{
  StackAllocator.beginFrame();
  // method code:
  Vk*Info info = Vk*Info.create(); // <- like usual, but will use the stack allocator internally

  StackAllocator.endFrame();
}
// Here, everything is freed again

Of course, this also raises questions of proper control-flow handling, to close the stackframe on every return (normally or abnormally).
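One way to get the frame closed on every exit path in plain Java is try-with-resources (or an equivalent try/finally around beginFrame()/endFrame()). Below is a small sketch under that assumption; `FrameStackSketch` and `FrameDemo` are hypothetical, and integer offsets stand in for real memory to keep it short.

```java
// Hypothetical frame-scoped allocator; offsets stand in for real memory.
final class FrameStackSketch {
    private int offset;

    // A frame remembers the offset at entry and restores it on close().
    final class Frame implements AutoCloseable {
        private final int saved = offset;

        @Override
        public void close() {
            offset = saved;
        }
    }

    Frame beginFrame() {
        return new Frame();
    }

    // Reserves bytes and returns the start offset of the reservation.
    int alloc(int bytes) {
        int start = offset;
        offset += bytes;
        return start;
    }

    int used() {
        return offset;
    }
}

final class FrameDemo {
    // Returns the bytes in use inside the frame, or -1 if the body failed.
    static int run(FrameStackSketch stack, boolean fail) {
        try (FrameStackSketch.Frame frame = stack.beginFrame()) {
            stack.alloc(64);
            if (fail) {
                throw new IllegalStateException("simulated failure");
            }
            return stack.used();
        } catch (IllegalStateException e) {
            return -1; // the frame has already been closed at this point
        }
    }
}
```

The trade-off is that each frame creates a small Java object (which the JIT may or may not eliminate), whereas an explicit push()/pop() pair in try/finally allocates nothing but is easier to get wrong.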

Well, if you do it like that you’ll need to take care of synchronization as well, or use proper thread-local stuff there. I think the lowest overhead would be to simply keep the stack in VkCommandBuffer, VkDevice, etc. Since those objects aren’t threadsafe in the first place, it makes sense that each of them has their own stack as well since you won’t be using them from multiple threads either. So for vkBeginCommandBuffer(), it uses the stack of that command buffer, and for vkQueueSubmit() it’d use the queue’s stack. That avoids a potentially expensive thread-local lookup or synchronization.
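The two ownership options can be put side by side in a few lines. This is only a sketch of the trade-off: `ScratchSketch`, `ThreadLocalScratch` and `QueueSketch` are hypothetical names, and `QueueSketch` is not how LWJGL's VkQueue is actually implemented.

```java
import java.nio.ByteBuffer;

// Hypothetical scratch memory block that a stack allocator would manage.
final class ScratchSketch {
    final ByteBuffer memory = ByteBuffer.allocateDirect(64 * 1024);
}

// Option 1: one scratch block per thread, found via a thread-local lookup on every use.
final class ThreadLocalScratch {
    private static final ThreadLocal<ScratchSketch> TLS =
        ThreadLocal.withInitial(ScratchSketch::new);

    static ScratchSketch get() {
        return TLS.get();
    }
}

// Option 2: the scratch block lives in the (externally synchronized) object itself,
// so using it is a plain field read with no lookup and no locking.
final class QueueSketch {
    final ScratchSketch scratch = new ScratchSketch();
}
```

Option 2 costs memory per object instead of per thread, which could add up with many command buffers, so the block size would probably need to be tuned per object type.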

I just want to say thanks to theagentd and KaiHH for the feedback and ideas. I’m taking this seriously and testing is already underway. I’ll post more when I have a draft for the new API.

Quick test converting my super-advanced screen clearing to using BufferStack instead:

Malloc:
FPS: 18680

Stack:
FPS: 19180

The functions concerned are:

  • vkAcquireNextImageKHR()
  • vkBeginCommandBuffer()
  • vkQueueSubmit()
  • vkQueuePresentKHR()

New API?! O__O

Yes: new allocation methods in struct classes, configurable stack allocator implementation, thread-local API for convenience, explicit allocator parameters where it makes sense for top performance.