I have some JOGL code that is looking a bit slow… is there a way to detect if the code is being executed by the GPU?
Alternatively, I am doing lots of calls to glVertex3f() - could this be a reason for the slowdown?
Immediate mode is “slow” compared to other ways you could render with OpenGL. However, it is by no means too slow to draw a couple hundred sprites (as long as you have a decent graphics card). How many times are you calling glVertex3f?
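For reference, one quad in immediate mode looks roughly like this (placeholder coordinates); every glVertex3f is a separate call from Java into the native driver, which is what makes it expensive as the vertex count grows:

gl.glBegin(GL2.GL_QUADS);
gl.glColor4f(1.0f, 1.0f, 1.0f, 0.5f); // translucent white
gl.glVertex3f(0.0f, 0.0f, 0.0f);
gl.glVertex3f(1.0f, 0.0f, 0.0f);
gl.glVertex3f(1.0f, 1.0f, 0.0f);
gl.glVertex3f(0.0f, 1.0f, 0.0f);
gl.glEnd();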
Thanks.
In the display method I call glVertex about 350 times. I am drawing a grid of translucent quads in a loop and rotating the view. I’m getting 80 FPS with a Radeon 3450 card.
But I was getting jerky animation with just one quad. The performance smooths out if I force the animator to use the screen refresh rate, oddly enough. Perhaps something is choking on the graphics card that doesn’t happen if I force a slower frame rate. Basically I know nuffing.
I didn’t know about ‘immediate mode’. I’ve just been hacking the sample on Wikipedia as my starting point. Can you suggest a mode that would be smoother?
Hi
Use GLProfile.isHardwareRasterizer() (only in JOGL 2.0) to know whether your OpenGL implementation is really hardware accelerated.
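For example (a minimal sketch; note that in older JOGL 2.0 builds GLProfile lives in javax.media.opengl, in newer ones in com.jogamp.opengl):

GLProfile profile = GLProfile.getDefault();
System.out.println("Hardware rasterizer: " + profile.isHardwareRasterizer());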
ra4king is right: immediate mode is slow, but the difference from retained mode (vertex arrays, VBOs, VAOs) is “small” when you only draw a few things. The example I put into Wikipedia uses immediate mode as far as I know.
Are you sure v-sync is disabled? Maybe there is nothing wrong in your case.
You should look at the demos and the examples; some of them use VBOs.
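The typical VBO pattern looks something like this (a rough sketch, not code from the demos; it assumes a GL2 gl obtained from drawable.getGL().getGL2(), a rewound direct FloatBuffer vertexBuffer holding vertexCount * 3 position floats, and Buffers from com.jogamp.common.nio):

// in init(): upload the vertex data once into a buffer object
int[] vbo = new int[1];
gl.glGenBuffers(1, vbo, 0);
gl.glBindBuffer(GL.GL_ARRAY_BUFFER, vbo[0]);
gl.glBufferData(GL.GL_ARRAY_BUFFER, vertexCount * 3 * Buffers.SIZEOF_FLOAT, vertexBuffer, GL.GL_STATIC_DRAW);

// in display(): draw from the buffer object instead of resending every vertex
gl.glBindBuffer(GL.GL_ARRAY_BUFFER, vbo[0]);
gl.glEnableClientState(GL2.GL_VERTEX_ARRAY);
gl.glVertexPointer(3, GL.GL_FLOAT, 0, 0L); // 0L = byte offset into the bound VBO
gl.glDrawArrays(GL2.GL_QUADS, 0, vertexCount);
gl.glDisableClientState(GL2.GL_VERTEX_ARRAY);
gl.glBindBuffer(GL.GL_ARRAY_BUFFER, 0);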
Thanks for the sample on Wikipedia - it has been very helpful. I have now found out what retained mode rendering is and am converting my code to use buffers. I’m currently getting a jerky 79 FPS, so I will report back how much better the buffered implementation runs.
Shall I update Wikipedia with a buffered version when I understand what I am doing? Although the article clearly states that the rendering method is only for demo purposes, it would be better if it just demonstrated the right way.
I’m trying to see how scenes and effects influence the frame rate, so I have v-sync switched off intentionally, but when I switch it on I do get a smoother ride.
My example in Wikipedia is not wrong, it is just a very simple one. If people need better examples, they should look at ours.
Actually, when v-sync is on, the frame rate should be close to 60. If you disable it, the frame rate won’t be capped. If you get 79 frames per second with v-sync off, it is a bit worrying.
Getting 79 FPS WITH v-sync is even more worrying in my opinion. ^_^’
Sorry, I didn’t mean to imply the Wikipedia example was wrong, just that it would be better and more informative if it showed a scalable way of doing things. The sample was very helpful and certainly got me started. But a quick googling of JOGL samples almost always turns up dozens of ‘basic’ samples using immediate mode, whereas it looks like the retained mode versions would only be a few more lines and ultimately so much more useful.
When I enable v-sync I do get 60 FPS, so it is behaving as one would expect. With one quad, rendering is jerky at 270 FPS (without v-sync). With 225 quads the rate drops to 79 FPS and is still jerky. (my statement about 350 vertices previously was obviously a little inaccurate!)
I had my old CRT monitor set to 80Hz for ages. V-sync doesn’t automatically mean 60Hz.
Absolutely, I can’t even look at CRTs at 60Hz; I get an instant headache.
I used at least 75Hz, 100 if possible.
Hopefully not many people still use CRTs these days though…
I see what you mean, but it’s true for OpenGL examples in general, and that’s why using retained mode in all basic examples would be confusing: some people would conclude that JOGL is more complicated than plain OpenGL in C.
I succeeded in obtaining more than 500 FPS with hundreds of thousands of quads with the TUER alpha version on a similar graphics card, so maybe something is wrong with your driver.
I have changed to using retained mode and the performance actually seems to have dropped - the retained mode version is 20 FPS slower. I will have a look at some more samples and see if I can get a new driver for my Radeon.
I am getting a GT430 card next week so that is another route to test this.
Anyway my display method does this:
gl.glEnableClientState(GL2.GL_VERTEX_ARRAY);
gl.glEnableClientState(GL2.GL_COLOR_ARRAY);
gl.glVertexPointer(3, GL.GL_FLOAT, 0, vertexBuffer); // 3 position floats per vertex, client-side buffer
gl.glColorPointer(4, GL.GL_FLOAT, 0, colorBuffer);   // 4 colour floats (RGBA) per vertex
gl.glDrawArrays(GL2.GL_QUADS, 0, 2500);
gl.glDisableClientState(GL2.GL_VERTEX_ARRAY);
gl.glDisableClientState(GL2.GL_COLOR_ARRAY);
I know that, but how many CRT monitors refresh at 79Hz? And it should be a safe assumption that most people use a monitor that doesn’t flicker your eyes to oblivion nowadays, especially if they have used computers enough to do programming on them.
glDrawArrays takes the number of vertices you have - are you sure you have 2500 of them (= 625 quads)?
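Put differently, the last argument to glDrawArrays is a vertex count, not a quad count, so for a grid it would be something like this (hypothetical names):

int quadCount = gridWidth * gridHeight; // hypothetical grid dimensions
int vertexCount = quadCount * 4;        // GL_QUADS consumes 4 vertices per quad
gl.glDrawArrays(GL2.GL_QUADS, 0, vertexCount);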
“glDrawArrays takes the number of vertices you have - are you sure you have 2500 of them (= 625 quads)?”
You are right - I was overestimating the quad count by 30%. Adding a counter to my init method to actually verify the number of vertices I was generating was the obvious solution. This has brought the performance back to the level of the immediate mode version. I will scale up my quad count to see if I can start seeing a boost from retained mode, and I will also try replacing my quads with pairs of triangles - perhaps my ageing graphics card would work better with old-fashioned triangles. And next week I will have a second graphics card to test on.
BTW, my monitor is definitely set to 60Hz.
I have worked on my code some more and made the grid of quads scalable. I find my Radeon 3450 drops to 1 FPS when I get to 6,860,000 vertices (a filled cube of 70 x 70 x 70 units, each made of five translucent quads). Java throws IllegalAccess errors if I try to draw a bigger cube. It is a bit surprising - I would have thought the 256MB graphics card could store larger arrays than that.
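As a rough back-of-the-envelope check (assuming the 3-float position plus 4-float colour layout from the earlier code):

70 * 70 * 70 unit cubes * 5 quads * 4 vertices = 6,860,000 vertices
(3 + 4) floats * 4 bytes = 28 bytes per vertex
6,860,000 * 28 bytes ≈ 192 MB of raw vertex and colour data

so the data alone is already not far off the card’s 256 MB.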
A few little questions - I allocate an array on the card for each scene which I then run. Obviously allocating arrays on the card will consume resources, so how do I deallocate the arrays I no longer need?
What is faster - a display list or a vertex array?
6.8 million vertices… :o
A VA is faster, especially if you turn it into a VBO, which uses almost exactly the same API. You deallocate a VBO with glDeleteBuffers (core since OpenGL 1.5), then free the client-side resources (which in Java you do by simply not referencing them anymore).
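For example (a sketch that assumes vbo is the int[] holding the buffer id, as in the earlier snippet):

gl.glDeleteBuffers(1, vbo, 0); // frees the storage on the GL side
vertexBuffer = null;           // let the JVM reclaim the client-side NIO buffer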
OK, I’ll look at VBOs, thanks for the pointer.
I have an Nvidia GTX 460M and I can render around 7 million vertices forming 3.5 million triangles (I’m rendering quads, 4 vertices and 2 triangles per quad) at 60 FPS. I’m using static VBOs+IBOs.
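Roughly, the index-buffer side of that looks like this (a sketch, assuming quadCount quads laid out 4 vertices apiece in the vertex VBO, and Buffers from com.jogamp.common.nio):

// build indices that split each quad (v0..v3) into two triangles
int[] indices = new int[quadCount * 6];
for (int q = 0; q < quadCount; q++) {
    int v = q * 4;
    int i = q * 6;
    indices[i] = v;     indices[i + 1] = v + 1; indices[i + 2] = v + 2;
    indices[i + 3] = v; indices[i + 4] = v + 2; indices[i + 5] = v + 3;
}
IntBuffer indexBuffer = Buffers.newDirectIntBuffer(indices);

// upload once as a static IBO
int[] ibo = new int[1];
gl.glGenBuffers(1, ibo, 0);
gl.glBindBuffer(GL.GL_ELEMENT_ARRAY_BUFFER, ibo[0]);
gl.glBufferData(GL.GL_ELEMENT_ARRAY_BUFFER, indices.length * Buffers.SIZEOF_INT, indexBuffer, GL.GL_STATIC_DRAW);

// in display(), with the vertex VBO and the IBO bound:
gl.glDrawElements(GL.GL_TRIANGLES, indices.length, GL2.GL_UNSIGNED_INT, 0L);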
My card is very old (and is a cut-down laptop version too), and my code is very naive, but I do not get that performance. I am also deliberately adding features to make the card work harder - vertices are not reused, everything is translucent and so on. I converted my code from VAs to VBOs tonight and compared the results between the two methods:
Vertices    VA FPS    VBO FPS
7500        74        75
60000       35        36
202500      23        22
480000      16        17
937500      13        13
1620000     10        11
3840000     7         8
7500000     5         5
12960000    3         x
The first column is the number of vertices, the second is the VA rendering and the third column is the VBO rendering. The VBO performance might be slightly faster, but it doesn’t feel like my programming efforts have been rewarded. My theory is that the processing on the card is so slow that the benefit of putting the data on the card is masked by the vast processing time.
But if you get 60 FPS on a GTX 460M, 5 FPS on a Mobility 3450 actually sounds quite reasonable.