Well, it’s fast enough for the piddly things we want to do, but I can’t see myself writing anything utterly mind blowing any time soon
Cas
I went to the Java 7 release meeting yesterday. (Did they show the audience in the live stream? I was the fellow with the plate of food on my lap in the first side row. :D)
The new file system API is touted to be an order of magnitude faster, if I read the charts correctly.
There are some cool things like fork/join (which splits a task across two or more threads, presumably each on its own CPU) that could also lead to significant pickups in performance; a sketch follows below.
So we could be seeing some improvements, not that what we have is at all bad.
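Something like this, roughly (a made-up sketch of my own, not from the presentation; the threshold and array size are arbitrary):

import java.util.concurrent.ForkJoinPool;
import java.util.concurrent.RecursiveTask;

// Sums an array by recursively splitting the range across worker threads.
public class SumTask extends RecursiveTask<Long> {
    private static final int THRESHOLD = 10_000; // below this, just loop sequentially
    private final long[] data;
    private final int from, to;

    public SumTask(long[] data, int from, int to) {
        this.data = data;
        this.from = from;
        this.to = to;
    }

    @Override
    protected Long compute() {
        if (to - from <= THRESHOLD) {
            long sum = 0;
            for (int i = from; i < to; i++)
                sum += data[i];
            return sum;
        }
        int mid = (from + to) >>> 1;
        SumTask left = new SumTask(data, from, mid);
        left.fork();                                          // run the left half asynchronously
        long rightSum = new SumTask(data, mid, to).compute(); // right half on this thread
        return rightSum + left.join();                        // wait for the left half, combine
    }

    public static void main(String[] args) {
        long[] data = new long[1_000_000];
        for (int i = 0; i < data.length; i++)
            data[i] = i;
        System.out.println(new ForkJoinPool().invoke(new SumTask(data, 0, data.length)));
    }
}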
The estimate that Java is 50% slower than C++ seems overly pessimistic to me.
The biggest difference between C++ and Java, and even between C and Java, is the programmer. The languages are close enough for the most part that it's the quality and experience of the coder that makes the most difference. At which point one just needs to ask which language lets the coder work on the important parts of coding rather than tedious details. IMO all three languages suffer from forcing you to work too much on tedium, but Java is better if only because it has a half-decent standard library.
Seriously, after 15+ years of the internet, C/C++ still doesn't have sockets as part of its standard library? What are they thinking?
Of course there are exceptions to all this, and sometimes Java is not the right tool. But many times neither is C/C++.
Yes, I put C and C++ together because there are a lot of coders who just won't code C++, and you are left with C-looking C++. At least in commercial circles.
I disagree with most of the above. First an interesting link:
The Ubiquitous SSE vector class: Debunking a common myth
The answer to the question “Is Java as fast as or faster than C++” is: it depends. On so many factors actually that there’s no point trying to do any general comparisons, outside the context of a real-world application.
Memory access is so much more expensive than most calculations that happen in between. This is a fact that applies to 99% of the code in 99% of the programs ever written; it's true for both CPUs and GPUs and will be true until a radically different micro-processor architecture is invented. There's only one way to get around this bottleneck, and that's by designing your data structures and memory access patterns in a way that gives compilers and, more importantly, CPUs themselves (x86 CPU pipelines do this independently of whatever language you've written your program in) an easy way to identify what you're trying to do. The first obvious benefit is that you properly utilize the memory caches; the second is the memory prefetching that can happen when the access pattern is clear enough to be predictable. As a side note, console CPUs don't have the necessary circuitry to detect access patterns; that's why programmers need to use compiler hints to make it happen (another reason we haven't seen Java on consoles so far).
This basically boils down to linear data structures and simple loops. That’s why JVM engineers keep asking people to write the simplest code possible. Simple == optimizable and this has nothing to do with Java. None of the above is any more special in Java than in C++. It just so happens that software people writing in C++ usually have the experience to design their code with all that in mind, whereas Java programmers tend to ignore it. Data-orientation is much more important than object-orientation and not only for performance. It’s much easier to parallelize a simple loop on a simple data structure (e.g. with fork/join), than a calculation happening on a complex object graph.
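To make that concrete, here is a minimal sketch of the "linear data, simple loop" idea (my own illustration; the class and field names are invented): positions and velocities live in flat arrays, so the loop touches memory at a fixed stride and the prefetcher can stay ahead of it.

class Particles {
    final float[] pos; // interleaved x,y pairs: x0, y0, x1, y1, ...
    final float[] vel; // same layout

    Particles(int count) {
        pos = new float[count * 2];
        vel = new float[count * 2];
    }

    void integrate(float dt) {
        // One sequential pass over two flat arrays: trivially predictable
        // for the hardware prefetcher and trivially optimizable for the JIT.
        for (int i = 0; i < pos.length; i++)
            pos[i] += vel[i] * dt;
    }
}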
Bounds checking is also dirt cheap on x86. For the same reason actually: the array memory access is orders of magnitude more expensive than the conditional jump before it, which the CPU has already predicted will not happen for the next X iterations (that is, it's almost free, simple noise compared to the rest of the code). Some Java programmers think “crap, it’s bounds checking again, I need to fix this” and they go ahead and redesign their code. Which is good! Bounds checks occur on every iteration when the JVM cannot understand your code… which is basically when you’ve written cache-unfriendly code. “Fixing” the bounds checks yields a performance increase not because you’ve removed the conditional jump, but because you’ve started utilizing the precious cache properly.
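For illustration (hedged: the exact behavior depends on the JVM version and the JIT), compare a loop the JIT can fully analyze with one it can't:

// The JIT can prove 0 <= i < a.length here and hoist the bounds check out
// of the loop entirely; the access is also perfectly sequential.
static float sumLinear(float[] a) {
    float sum = 0;
    for (int i = 0; i < a.length; i++)
        sum += a[i];
    return sum;
}

// Here every index comes out of another array, so a check stays on each
// access, and worse, the memory access pattern is unpredictable. The check
// is the noise; the cache misses are the real cost.
static float sumIndirect(float[] a, int[] indices) {
    float sum = 0;
    for (int i = 0; i < indices.length; i++)
        sum += a[indices[i]];
    return sum;
}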
Finally, because always “it depends”, when you’ve tried everything and you’re sure you have a bottleneck that can’t be fixed any other way, there’s a thing called OpenCL available. It’s here and it works (heck, it’s already in browsers too). There’s no reason to go down to JNI and assembly hacks to get top of the line performance, when it’s so easy to use OpenCL through Java. All you need is LWJGL and Riven’s magic mapped objects.
Btw, AAA titles mean amazing graphics (= GPU; the expensive engines do their best to minimize CPU usage: few draw calls etc.) + physics (= GPU, else you get the simple version of it) + absolutely brilliant pre-baking of game data (= offline work, performance doesn’t matter). Most games use scripting for their game code too. Again, nothing to do with language or performance, but it does have to do with the platform: Java isn’t an option because it cannot run on consoles, not because of the few disadvantages it has compared to C++.
Brilliantly worded, Spasi. That’s what the article should have said.
Some resources for people that want to know more about data-oriented programming:
Pitfalls of Object Oriented Programming (pdf)
Data-Oriented Design (Or Why You Might Be Shooting Yourself in The Foot With OOP)
Data-Oriented Design Now And In The Future
Data-Oriented Behavior Tree (heavy read, especially the 3rd part, but really useful insight on the process of going from OO to DO)
These should get you started; Google is your friend after that. It’s kind of a hot topic for game devs (and not only) lately and there’s plenty of info available. I’ve recently implemented a benchmark to test the theory behind the first article, mainly to see the effects it would have on the way code is written. The result was much, much better code clarity: in the OOP case, in order to have a complete understanding of what’s happening you need to navigate the object graph, jump from method to method etc., which is really taxing on your short-term memory even with good IDE support. In the DOP case, the code is right in front of you, as a series of simple steps. As for performance, close to 3x faster for the Java implementation, 40x (!) for the OpenCL implementation (including the GPU readback).
Note that neither I nor the people that wrote these articles suggest that OOP is inherently bad. It’s just that data should always come first. Design for data, not code. Have the code go to the data, not the other way around. It’s simple and applies equally well to both OOP and procedural programming.
This is really, really interesting - thank you for posting it.
A question - suppose you’re writing a particle system in a data-oriented way. So among other things, you’re storing lots of x,y coordinates that you then plan to iterate over.
There are quite a few ways to represent that data:
float[] coords;              // x, y -> coords[i], coords[i+1]
float[][] coords;
Vector2f[] coords;
LinkedList<Vector2f> coords;
ArrayList<Vector2f> coords;
It seems pretty clear that the 1-dimensional array is the best way to go, but are the other choices still beneficial in terms of utilizing the cache well and allowing prefetching to happen? (Incidentally, what is it that tries to figure out the memory access pattern to prefetch: the JVM when it's compiling into native code, or something much lower-level?)
It seems like having an array or even list of objects would still be ok as long as they are allocated at about the same time, and thus probably contiguous in memory (can you count on the JVM to do that?). I guess that would break down when particles expire, and new ones get allocated elsewhere, while in the array case it’ll stay contiguous. But, you could use pooling for the objects.
Am I thinking of this the right way? Is it mostly about keeping data in a contiguous chunk of memory, so that when a page is loaded into the cache, you get as much useful data from it as possible?
I wonder what they did in JDK 7 that made my sprite engine so much faster - word has it they removed rather a lot of bounds checking though. So the noise argument may be true to some extent with the very latest chips and JDKs but everywhere else … when you do something 100,000 times a frame and it’s twice as fast in C++ … there we go.
Cas
The DOP way of doing things is so right though (if I recall correctly, the PS3 absolutely forces you to think in terms of shoving data through it in this way). Java doesn’t really make it particularly easy, but I hope that Riven’s hackery will make it somewhat more viable.
Cas
[quote=“avm1979,post:27,topic:36933”]
For such a simple example, float[] would be best, followed by float[][]. Note that the 2D array should be float[2][count] and not float[count][2]. For anything more complex, I’d use Riven’s mapped objects for good performance and nice code. Without it you’d have naked data in arrays and even though performance would still be great, code would be a mess (and very hard to refactor). Especially if you want to interact with GL/CL and you’re using NIO buffers instead of arrays.
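A quick sketch of why float[2][count] wins (my own example, not from the thread): each row is one contiguous float[], so iterating all the x values is a straight linear walk, whereas with float[count][2] every coordinate pair is a separate little array scattered on the heap.

static void translate(float[][] coords, float dx, float dy) {
    float[] xs = coords[0]; // all x values, contiguous
    float[] ys = coords[1]; // all y values, contiguous
    for (int i = 0; i < xs.length; i++) {
        xs[i] += dx;
        ys[i] += dy;
    }
}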
There are x86 instructions for prefetching in different instruction sets (e.g. SSE has some), but I don’t know if the JVM uses them. In any case, x86 CPUs can detect patterns in the instruction stream and do prefetching automatically. I guess this has lots of limitations and prerequisites, which supports the argument for simple loops even more.
Using objects won’t work because of what I’m describing here, 3rd paragraph. With the way GC works, there’s no guarantee your objects will remain contiguous in memory, even when you’re not de-allocating or re-allocating anything. I can’t find the source now, but I read recently that JVM engineers have acknowledged this and they’re investigating ways to improve it. But honestly, I don’t think you’ll ever be able to get the exact behavior required, whereas with explicit allocations your data will be contiguous by definition.
An option for more OO code could be to use handles. Instead of references to objects you use references to handles. A handle contains the index where you can find the object data, meaning you have an extra indirection in exchange for potentially cleaner code. This is a good description, a bit C++ specific though.
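In Java the same idea could look something like this (a rough sketch, all names invented): game code keeps Handle objects, the actual data lives in packed arrays, and removals swap the last live slot into the hole so the data stays contiguous.

class Handle {
    int index; // where this object's data currently lives in the arrays
}

class Transforms {
    final float[] x, y, z; // packed position data, one slot per object
    final Handle[] owner;  // which handle currently points at each slot
    int liveCount;

    Transforms(int capacity) {
        x = new float[capacity];
        y = new float[capacity];
        z = new float[capacity];
        owner = new Handle[capacity];
    }

    // Swap-remove: move the last live slot into the freed one, then update
    // the displaced handle. Data stays contiguous; no other handle breaks.
    void remove(Handle h) {
        int last = --liveCount;
        x[h.index] = x[last];
        y[h.index] = y[last];
        z[h.index] = z[last];
        owner[h.index] = owner[last];
        owner[h.index].index = h.index;
    }
}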
[quote=“avm1979,post:27,topic:36933”]
Yes. Whenever you access a memory address, you want whatever sits next to it to be what you’ll be accessing on the next iterations/instructions.
Thank you for explaining. As before, extremely informative.
The bit about handles makes sense - sounds like mapped objects are essentially a nice way to do that with what looks like regular Java code.
Some of this data-oriented programming still feels like premature optimization to me. Packing things into floats? Perhaps, after profiling.
Also, you don’t need the JVM to lay out objects in memory perfectly, just pretty well. It may not do that, but from some of my tests it must be mostly doing it; I get too close to optimal performance to be cache thrashing that much.
Also, you can see cache thrashing in action by iterating through a large array randomly and comparing with linear access. IIRC I get about a 5x speed difference.
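For anyone who wants to reproduce that, a rough sketch (my own; there's no JIT warmup here, so treat the numbers as a ballpark illustration only):

import java.util.Random;

public class CacheDemo {
    public static void main(String[] args) {
        int n = 1 << 24; // 16M ints = 64 MB, far larger than any cache
        int[] data = new int[n];
        int[] order = new int[n];
        for (int i = 0; i < n; i++)
            order[i] = i;

        // Fisher-Yates shuffle to build a random visiting order.
        Random rnd = new Random(42);
        for (int i = n - 1; i > 0; i--) {
            int j = rnd.nextInt(i + 1);
            int t = order[i]; order[i] = order[j]; order[j] = t;
        }

        long t0 = System.nanoTime();
        long sum = 0;
        for (int i = 0; i < n; i++)
            sum += data[i];        // linear walk
        long t1 = System.nanoTime();
        for (int i = 0; i < n; i++)
            sum += data[order[i]]; // same work, random order
        long t2 = System.nanoTime();

        System.out.printf("linear %d ms, random %d ms (sum=%d)%n",
                          (t1 - t0) / 1_000_000, (t2 - t1) / 1_000_000, sum);
    }
}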
[quote=“delt0r,post:32,topic:36933”]
Packing is mostly useful when you want easy interaction with GL/CL. You don’t have a choice there, if you want to do something on the GPU you need to pack your data. If you want your code to look nice and be easy to refactor, you either work on POJOs and do a copy or pack/unpack step (paying the performance and memory penalty), or you use mapped objects.
For normal stuff, DOP is still useful without packing data. A loop on a contiguously allocated Vector3f[]? Of course, that will be very fast in Java. Assuming the objects have matured together, with 4 bytes of overhead per object, 2 MB of cache per CPU core and 128-byte cache lines, you only “waste” 1 cache line every 32 objects compared to doing the same with packed data. Doing the same on an array of objects that look like this?
class Unit {
    Vector3f localPos;
    Matrix4f globalTransform;
    Vector3f globalPos;
}

for ( int i = 0; i < units.length; i++ ) {
    // Two hops through the heap: the Unit reference, then the Vector3f it points to.
    Vector3f pos = units[i].localPos;
    // do something with pos
}
That’s when you have problems.
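One way out (my own sketch, keeping the Matrix4f type from above): pull the hot field into its own packed array, structure-of-arrays style, so the position loop never touches the rest of the object.

class Units {
    final float[] localPos;           // x,y,z interleaved: 3 floats per unit
    final Matrix4f[] globalTransform; // cold data can stay as objects
    final int count;

    Units(int count) {
        this.count = count;
        localPos = new float[count * 3];
        globalTransform = new Matrix4f[count];
    }

    void update() {
        // Pure linear reads, no pointer chasing.
        for (int i = 0; i < localPos.length; i += 3) {
            float x = localPos[i], y = localPos[i + 1], z = localPos[i + 2];
            // do something with x, y, z
        }
    }
}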
I used to try to program games in C++. I scorned Java because it was an interpreted language. I viewed it as a language for sissies.
There was a problem with the games I programmed in C++: by and large, I couldn’t get them to work. I programmed some garbage for DOS in high school, but DOS was already becoming obsolete by that point. When I switched to Windows programming, I was wholly baffled. I kept trying to make a game, when I really needed to figure out the basics of Windows programming first.
Then I was forced to learn Java in a programming class. I tried to program a game in Java because I was fed up with C++. It sort of worked.
So, in short, I use Java because I can make games that actually work in Java.
Knowing what I know now, I could probably buy some books about C++ game programming and figure out how to write games in C++ instead. But how would that help me? I don’t have performance problems or know of anything I could do with C++ that I can’t do with Java. I don’t see the point in using C++ to do what I’m doing, though it would probably be instructive to explore other programming languages more.
Edit: I mean that using other languages would probably help enhance my programming skills in general, not that I expect to find out that some other language is superior to Java.
A perfect example of how you can apply DOP to completely destroy one of the best data structures in the JDK, in both throughput and latency:
Disruptor: High performance alternative to bounded queues
Note how the resulting API and implementation are 100% object oriented. DOP has nothing to do with how you write code; it’s only about applying an optimal data structure and memory access design that produces the best CPU/cache utilization.
Unrelated to my point but very interesting nonetheless, in paragraph 4.1 they basically make a case for Riven’s mapped objects.
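For a flavor of the core trick, here is a much-simplified single-producer/single-consumer sketch of my own (the real Disruptor goes much further: cache-line padding, preallocated event objects, clever sequence barriers). Entries live in one preallocated array that is reused forever, so there is no per-message allocation and consecutive entries sit next to each other in memory.

class RingBuffer<T> {
    private final Object[] entries;
    private final int mask;         // size must be a power of two
    private volatile long head = 0; // next slot to write (producer only)
    private volatile long tail = 0; // next slot to read (consumer only)

    RingBuffer(int size) {
        entries = new Object[size];
        mask = size - 1;
    }

    boolean offer(T value) {
        if (head - tail == entries.length)
            return false;                     // full
        entries[(int) (head & mask)] = value;
        head++;                               // volatile write publishes the entry
        return true;
    }

    @SuppressWarnings("unchecked")
    T poll() {
        if (tail == head)
            return null;                      // empty
        T value = (T) entries[(int) (tail & mask)];
        tail++;
        return value;
    }
}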
Amazing how long ago I whinged to Sun about mapped objects to address precisely this sort of crap, and after all these years it's taken our resident genius just a few days to basically hack it all together perfectly on his own, in more or less precisely the way I envisaged it working. Unbelievable how slow Sun were at addressing this issue. Another awesome strike for LWJGL as well. Riven, you are truly one of a kind.
And yet… STILL NO BASTARD iOS PORT. GRR. Riiiiiiivvvvvvveeeeeennnnn!!!..
Cas