Vertex cache shenanigans

Hello.

I wrote a small test the other day that was supposed to calculate the size of the GPU’s vertex cache, but I got some very surprising results which indicate that the vertex cache isn’t working the way, well, everyone expects. I’ve thrown together a small test program that issues some indexed draw calls, uses ARB_pipeline_statistics_query to count the resulting vertex shader invocations, and writes its findings to a log file. I am EXTREMELY interested in knowing what kind of results people get on hardware other than my GTX 770, especially on AMD cards.

Here’s the entirety of the test source code (only requires LWJGL3): http://www.java-gaming.org/?action=pastebin&id=1475
Here’s a precompiled jar (may not run on Mac): https://drive.google.com/open?id=0B0dJlB1tP0QZbTc5ZExJeENOMWM

Please run the jar (or compile the test yourself) and post the contents of the generated log file in this thread! Although the program prints the GL_RENDERER string the driver returns, it may not show the exact GPU you have, so if possible include that information as well.

Thanks for your attention! The results of this test could heavily impact how meshes should be optimized for vertex caches!

Batch size test invocations: 32768 / 3145728
Calculated vertex cache batch size: 96

Cache size 1 invocation test: 32768 / 3145728
Cache size 2 invocation test: 65536 / 3145728
Cache size 3 invocation test: 98304 / 3145728
Cache size 4 invocation test: 131072 / 3145728
Cache size 5 invocation test: 163840 / 3145728
Cache size 6 invocation test: 196608 / 3145728
Cache size 7 invocation test: 229376 / 3145728
Cache size 8 invocation test: 262144 / 3145728
Cache size 9 invocation test: 294912 / 3145728
Cache size 10 invocation test: 327680 / 3145728
Cache size 11 invocation test: 360448 / 3145728
Cache size 12 invocation test: 393216 / 3145728
Cache size 13 invocation test: 425984 / 3145728
Cache size 14 invocation test: 458752 / 3145728
Cache size 15 invocation test: 491520 / 3145728
Cache size 16 invocation test: 524288 / 3145728
Cache size 17 invocation test: 557056 / 3145728
Cache size 18 invocation test: 589824 / 3145728
Cache size 19 invocation test: 622592 / 3145728
Cache size 20 invocation test: 655360 / 3145728
Cache size 21 invocation test: 688128 / 3145728
Cache size 22 invocation test: 720896 / 3145728
Cache size 23 invocation test: 753664 / 3145728
Cache size 24 invocation test: 786432 / 3145728
Cache size 25 invocation test: 819200 / 3145728
Cache size 26 invocation test: 851968 / 3145728
Cache size 27 invocation test: 884736 / 3145728
Cache size 28 invocation test: 917504 / 3145728
Cache size 29 invocation test: 950272 / 3145728
Cache size 30 invocation test: 983040 / 3145728
Cache size 31 invocation test: 1015808 / 3145728
Cache size 32 invocation test: 1048576 / 3145728
Cache size 33 invocation test: 3145728 / 3145728

Results:
Renderer: GeForce GTX 960/PCIe/SSE2
Calculated vertex cache batch size: 96
Cache size: 32

Cas :slight_smile:

Exact same results as cas.

Renderer: GeForce GTX 1080/PCIe/SSE2

Thanks everyone! I’d really love to have someone test this on AMD since it seems like Nvidia’s 700-series cards and up are all the same. =P

Error: Pipeline statistics are not supported. Aborting.
  Renderer: Mesa DRI Intel(R) Broadwell 

;(

I should mention this is a chromebook

Hmm, that’s weird. According to http://feedback.wildfiregames.com/report/opengl/feature/GL_ARB_pipeline_statistics_query, it should be supported in certain drivers. =/ See if you can update to one of the supported drivers there. I’d be extremely interested in the results on Intel cards as well.


Batch size test invocations: 32768 / 3145728
Calculated vertex cache batch size: 96

Cache size 1 invocation test: 32768 / 3145728
Cache size 2 invocation test: 65536 / 3145728
Cache size 3 invocation test: 98304 / 3145728
Cache size 4 invocation test: 131072 / 3145728
Cache size 5 invocation test: 163840 / 3145728
Cache size 6 invocation test: 196608 / 3145728
Cache size 7 invocation test: 229376 / 3145728
Cache size 8 invocation test: 262144 / 3145728
Cache size 9 invocation test: 294912 / 3145728
Cache size 10 invocation test: 327680 / 3145728
Cache size 11 invocation test: 360448 / 3145728
Cache size 12 invocation test: 393216 / 3145728
Cache size 13 invocation test: 425984 / 3145728
Cache size 14 invocation test: 458752 / 3145728
Cache size 15 invocation test: 491520 / 3145728
Cache size 16 invocation test: 524288 / 3145728
Cache size 17 invocation test: 557056 / 3145728
Cache size 18 invocation test: 589824 / 3145728
Cache size 19 invocation test: 622592 / 3145728
Cache size 20 invocation test: 655360 / 3145728
Cache size 21 invocation test: 688128 / 3145728
Cache size 22 invocation test: 720896 / 3145728
Cache size 23 invocation test: 753664 / 3145728
Cache size 24 invocation test: 786432 / 3145728
Cache size 25 invocation test: 819200 / 3145728
Cache size 26 invocation test: 851968 / 3145728
Cache size 27 invocation test: 884736 / 3145728
Cache size 28 invocation test: 917504 / 3145728
Cache size 29 invocation test: 950272 / 3145728
Cache size 30 invocation test: 983040 / 3145728
Cache size 31 invocation test: 1015808 / 3145728
Cache size 32 invocation test: 1048576 / 3145728
Cache size 33 invocation test: 3145728 / 3145728

Results:
  Renderer: GeForce GTX 750 Ti/PCIe/SSE2
  Calculated vertex cache batch size: 96
  Cache size: 32

Batch size test invocations: 32768 / 3145728
Calculated vertex cache batch size: 96

Cache size 1 invocation test: 32768 / 3145728
Cache size 2 invocation test: 65536 / 3145728
Cache size 3 invocation test: 98304 / 3145728
Cache size 4 invocation test: 131072 / 3145728
Cache size 5 invocation test: 163840 / 3145728
Cache size 6 invocation test: 196608 / 3145728
Cache size 7 invocation test: 229376 / 3145728
Cache size 8 invocation test: 262144 / 3145728
Cache size 9 invocation test: 294912 / 3145728
Cache size 10 invocation test: 327680 / 3145728
Cache size 11 invocation test: 360448 / 3145728
Cache size 12 invocation test: 393216 / 3145728
Cache size 13 invocation test: 425984 / 3145728
Cache size 14 invocation test: 458752 / 3145728
Cache size 15 invocation test: 491520 / 3145728
Cache size 16 invocation test: 524288 / 3145728
Cache size 17 invocation test: 557056 / 3145728
Cache size 18 invocation test: 589824 / 3145728
Cache size 19 invocation test: 622592 / 3145728
Cache size 20 invocation test: 655360 / 3145728
Cache size 21 invocation test: 688128 / 3145728
Cache size 22 invocation test: 720896 / 3145728
Cache size 23 invocation test: 753664 / 3145728
Cache size 24 invocation test: 786432 / 3145728
Cache size 25 invocation test: 819200 / 3145728
Cache size 26 invocation test: 851968 / 3145728
Cache size 27 invocation test: 884736 / 3145728
Cache size 28 invocation test: 917504 / 3145728
Cache size 29 invocation test: 950272 / 3145728
Cache size 30 invocation test: 983040 / 3145728
Cache size 31 invocation test: 1015808 / 3145728
Cache size 32 invocation test: 1048576 / 3145728
Cache size 33 invocation test: 3145728 / 3145728

Results:
  Renderer: GeForce GTX 970/PCIe/SSE2
  Calculated vertex cache batch size: 96
  Cache size: 32

Weird, I recall reading somewhere that it was 24 on Nvidia…

It’s definitely 32, but there’s more to it than that. That’s why I made this test. I’ll explain once I have a bit more data. I’m really curious if my findings are the same for Intel and AMD.
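For anyone who wants to play with the numbers: here is a toy model that happens to reproduce the Nvidia logs above. It is only a sketch under loud assumptions, not the actual test code and not a claim about how the hardware works. It assumes the index stream simply cycles through N distinct vertices, that the cache is a plain FIFO with 32 entries, and that the cache is emptied every 96 indices (the logged "batch size"). The class and method names are made up for illustration.

```java
import java.util.ArrayDeque;
import java.util.HashSet;

// Toy model only: FIFO vertex cache that is flushed at every batch boundary.
// The cyclic index pattern, FIFO policy and per-batch flush are assumptions.
class VertexCacheModel {

    /** Counts simulated vertex shader invocations for the index stream i % distinctVertices. */
    static long countInvocations(int totalIndices, int distinctVertices,
                                 int cacheSize, int batchSize) {
        ArrayDeque<Integer> fifo = new ArrayDeque<>();   // eviction order
        HashSet<Integer> resident = new HashSet<>();     // fast membership test
        long invocations = 0;

        for (int i = 0; i < totalIndices; i++) {
            if (i % batchSize == 0) {                    // new batch: cache starts empty
                fifo.clear();
                resident.clear();
            }
            int index = i % distinctVertices;
            if (resident.contains(index)) {
                continue;                                // cache hit, no shader run
            }
            invocations++;                               // cache miss, shader runs
            if (fifo.size() == cacheSize) {
                resident.remove(fifo.removeFirst());     // evict the oldest entry
            }
            fifo.addLast(index);
            resident.add(index);
        }
        return invocations;
    }

    public static void main(String[] args) {
        int total = 3145728;                             // index count from the logs
        for (int n : new int[] {1, 16, 32, 33}) {
            System.out.println("Cache size " + n + " invocation test: "
                    + countInvocations(total, n, 32, 96) + " / " + total);
        }
    }
}
```

With 32 or fewer distinct vertices, each 96-index batch misses once per vertex; with 33, a 32-entry FIFO evicts every vertex just before it comes around again, so every index misses. Other cache designs could produce the same counts, so treat this as one possible explanation rather than a confirmed one.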


java -jar VertexCacheTest.jar
Error: Pipeline statistics are not supported. Aborting.
  Renderer: Intel(R) HD Graphics 530

I started the gfx driver utility, to figure out the version - then my W10 system BSODed. You’re welcome. :emo:

I am so sorry. Friends don’t let friends buy Intel GPUs.

Nah, it’s a work laptop. I blame my boss. Anyhoo - enough derailing.

Oh, these might help:

Intel HD 530 driver:
Version: 10.18.15.4271
Release date: 2015-08-11

Installing the latest drivers now - let’s see if I can brick this thing.

Using Crimson 16.30.2311-160718a-305077c :

Batch size test invocations: 0 / 3145728
Calculated vertex cache batch size: 2147483647

Cache size 1 invocation test: 0 / 3145728
Cache size 2 invocation test: 0 / 3145728
Cache size 3 invocation test: 0 / 3145728
[...]
Cache size 32765 invocation test: 0 / 3145728
Cache size 32766 invocation test: 0 / 3145728
Cache size 32767 invocation test: 0 / 3145728
Error, failed to detect cache size.

Results:
  Renderer: AMD Radeon HD 7800 Series
  Calculated vertex cache batch size: 2147483647
  Cache size: -1

-ClaasJG

Results:
  Renderer: AMD Radeon HD 5800 Series
  Calculated vertex cache batch size: 2147483647
  Cache size: -1

Yep, same failure on my AMD HD 5870 1GB. (Crimson 16.2.1)

Ohhh!! Thanks a lot for testing. Looks like the AMD driver is smart enough to just not run the vertex shader. I’ll have to expand the test to include a proper shader. Give me a couple of minutes!
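As a side note on the odd "2147483647" in the AMD logs: the logged batch sizes are consistent with dividing the total index count by the invocation count and rounding to an int. That formula is a guess from the numbers, not something taken from the test source, and the class name below is hypothetical. In Java, rounding a float infinity yields Integer.MAX_VALUE, which would explain the value when zero invocations are reported.

```java
// Sketch of the arithmetic that would match the logged batch sizes.
// The formula itself is an assumption inferred from the log output.
class BatchSizeArithmetic {

    /** Rounded ratio of indices to shader invocations; 0 invocations gives Integer.MAX_VALUE. */
    static int batchSize(long totalIndices, long invocations) {
        // Float division by zero gives +Infinity, and Math.round(float)
        // clamps +Infinity to Integer.MAX_VALUE.
        return Math.round((float) totalIndices / invocations);
    }

    public static void main(String[] args) {
        long total = 3145728;
        System.out.println("Nvidia log (32768 invocations): " + batchSize(total, 32768));
        System.out.println("AMD log (0 invocations):        " + batchSize(total, 0));
    }
}
```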

Here’s a new version with a proper shader!

Source: http://www.java-gaming.org/?action=pastebin&id=1476
Jar: https://drive.google.com/open?id=0B0dJlB1tP0QZN3gtcTVKalFqdU0

EDIT: There is no need to rerun this benchmark on Nvidia hardware, as it will give the exact same results. =P

Batch size test invocations: 8130 / 3145728
Calculated vertex cache batch size: 387

Cache size 1 invocation test: 8130 / 3145728
Cache size 2 invocation test: 16260 / 3145728
Cache size 3 invocation test: 24390 / 3145728
Cache size 4 invocation test: 32520 / 3145728
Cache size 5 invocation test: 40650 / 3145728
Cache size 6 invocation test: 48780 / 3145728
Cache size 7 invocation test: 56910 / 3145728
Cache size 8 invocation test: 65040 / 3145728
Cache size 9 invocation test: 73170 / 3145728
Cache size 10 invocation test: 81300 / 3145728
Cache size 11 invocation test: 89430 / 3145728
Cache size 12 invocation test: 97560 / 3145728
Cache size 13 invocation test: 105690 / 3145728
Cache size 14 invocation test: 113820 / 3145728
Cache size 15 invocation test: 395752 / 3145728
Cache size 16 invocation test: 699074 / 3145728
Cache size 17 invocation test: 3088412 / 3145728
Cache size 18 invocation test: 3121164 / 3145728
Cache size 19 invocation test: 3100694 / 3145728
Cache size 20 invocation test: 3096600 / 3145728
Cache size 21 invocation test: 3108882 / 3145728
Cache size 22 invocation test: 3137540 / 3145728
Cache size 23 invocation test: 3088412 / 3145728
Cache size 24 invocation test: 3145728 / 3145728

Results:
  Renderer: AMD Radeon HD 7800 Series
  Calculated vertex cache batch size: 387
  Cache size: 23

-ClaasJG

Hmm, the results look inconsistent. Are they identical on each run?
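To make "inconsistent" a bit more concrete, here is a small check over the HD 7800 numbers posted above. It looks for two things: the last size at which the count still scales exactly linearly with the size-1 result, and the first size at which every index triggers an invocation. Both criteria and all names are mine, chosen for illustration, and say nothing about how the test program itself derives its "Cache size" value.

```java
// Sketch: analyse the invocation counts copied from the HD 7800 log above.
class AmdLogCheck {

    // Counts for sizes 1..24, taken verbatim from the posted log.
    static final long[] HD7800_COUNTS = {
        8130, 16260, 24390, 32520, 40650, 48780, 56910, 65040,
        73170, 81300, 89430, 97560, 105690, 113820, 395752, 699074,
        3088412, 3121164, 3100694, 3096600, 3108882, 3137540, 3088412, 3145728
    };

    /** Largest n such that counts for sizes 1..n all equal n times the size-1 count. */
    static int lastLinearSize(long[] counts) {
        long perSize = counts[0];
        int last = 0;
        for (int n = 1; n <= counts.length; n++) {
            if (counts[n - 1] != perSize * n) {
                break;
            }
            last = n;
        }
        return last;
    }

    /** First size whose count equals the total index count, or -1 if none does. */
    static int firstSaturatedSize(long[] counts, long total) {
        for (int n = 1; n <= counts.length; n++) {
            if (counts[n - 1] == total) {
                return n;
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        System.out.println("Last linear size:     " + lastLinearSize(HD7800_COUNTS));
        System.out.println("First saturated size: " + firstSaturatedSize(HD7800_COUNTS, 3145728));
    }
}
```

The counts stay exactly linear only through size 14 and do not hit the full index count until size 24, with sizes 17 to 23 hovering just below it, which is why the reported "Cache size: 23" deserves a second run.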

With the new JAR:


Batch size test invocations: 32768 / 3145728
Calculated vertex cache batch size: 96

Cache size 1 invocation test: 32768 / 3145728
Cache size 2 invocation test: 65536 / 3145728
Cache size 3 invocation test: 98304 / 3145728
Cache size 4 invocation test: 131072 / 3145728
Cache size 5 invocation test: 163840 / 3145728
Cache size 6 invocation test: 196608 / 3145728
Cache size 7 invocation test: 229376 / 3145728
Cache size 8 invocation test: 262144 / 3145728
Cache size 9 invocation test: 294912 / 3145728
Cache size 10 invocation test: 327680 / 3145728
Cache size 11 invocation test: 360448 / 3145728
Cache size 12 invocation test: 393216 / 3145728
Cache size 13 invocation test: 425984 / 3145728
Cache size 14 invocation test: 458752 / 3145728
Cache size 15 invocation test: 491520 / 3145728
Cache size 16 invocation test: 524288 / 3145728
Cache size 17 invocation test: 557056 / 3145728
Cache size 18 invocation test: 589824 / 3145728
Cache size 19 invocation test: 622592 / 3145728
Cache size 20 invocation test: 655360 / 3145728
Cache size 21 invocation test: 688128 / 3145728
Cache size 22 invocation test: 720896 / 3145728
Cache size 23 invocation test: 753664 / 3145728
Cache size 24 invocation test: 786432 / 3145728
Cache size 25 invocation test: 819200 / 3145728
Cache size 26 invocation test: 851968 / 3145728
Cache size 27 invocation test: 884736 / 3145728
Cache size 28 invocation test: 917504 / 3145728
Cache size 29 invocation test: 950272 / 3145728
Cache size 30 invocation test: 983040 / 3145728
Cache size 31 invocation test: 1015808 / 3145728
Cache size 32 invocation test: 1048576 / 3145728
Cache size 33 invocation test: 3145728 / 3145728

Results:
  Renderer: GeForce GTX 750 Ti/PCIe/SSE2
  Calculated vertex cache batch size: 96
  Cache size: 32