But the "Hey, I wrote a faster hashmap!" thread is a perennial in performance forums. Every time I've looked at such a thread, one of three things develops:
(1) {most common} The implementation turns out to be buggy. By the time they fix the bugs it's no longer faster.
(2) {less common} The implementation is great for a small number of entries but doesn't scale as well as HashMap, and thus falls behind for larger numbers of entries, usually pretty badly.
(3) {least common} It really is faster, but has some other serious inherent limitation.
The guys who wrote the collections code in the JDK are some of the best general computer scientists I know, actually, and hash map algorithms are a highly researched and well-understood area. Based on that, they chose an algorithm that was reasonably fast and scaled well, since the eventual uses of such a general data structure are unknown.
[quote]…they chose an algorithm that was reasonably fast and scaled well, since the eventual uses of such a general data structure are unknown.
[/quote]
And that is basically the key to your optimization strategy. The JRE collection classes are general purpose. If you can narrow the scope of usage then there are likely alternatives that may work better. A recent thread here about Radix sort is one example…
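To make that concrete, here is a rough, purely illustrative sketch (not the code from that thread) of an LSD radix sort for non-negative int keys, the kind of narrowed-scope alternative that can beat a general comparison sort when you know what your data looks like:

```java
// Illustrative LSD radix sort for non-negative ints, one byte per pass.
// A specialized linear-time sort like this can beat a general comparison
// sort when the keys are known to be small, fixed-width integers.
public final class RadixSortSketch {
    public static void sort(int[] a) {
        int[] buf = new int[a.length];
        for (int shift = 0; shift < 32; shift += 8) {
            int[] count = new int[257];
            for (int j = 0; j < a.length; j++) {
                count[((a[j] >>> shift) & 0xFF) + 1]++;   // histogram this byte
            }
            for (int i = 1; i < count.length; i++) {
                count[i] += count[i - 1];                 // bucket start offsets
            }
            for (int j = 0; j < a.length; j++) {
                buf[count[(a[j] >>> shift) & 0xFF]++] = a[j];
            }
            System.arraycopy(buf, 0, a, 0, a.length);     // copy back for next pass
        }
    }
}
```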
[quote]Yep… That's what we call "Algorithmic Tuning" in the book, and it can be by far your biggest win.
[/quote]
Where do you think I get this stuff? I bought your book before it was made available online for free…
Thanks but I’m sure you knew the basics of tuning before reading our book.
And blame Steve, it was HIS idea to put it all up as HTML after the first year. I'm a money-grubbing bastard and would still be trying to get every nickel.
[quote]
The guys who wrote the collections code in the JDK are some of the best general computer scientists I know, actually, and hash map algorithms are a highly researched and well-understood area. Based on that, they chose an algorithm that was reasonably fast and scaled well, since the eventual uses of such a general data structure are unknown.
[/quote]
Probably there are historical reasons why many programmers, and Java newcomers in particular, think some Java classes could be faster than they are. I remember the very first Java versions years ago: Java was slow in those days. I'm not blaming SUN for this: a desktop JVM was a relatively new thing, so naturally its first incarnations were more basic than they are today. (I think there was only one similar thing before: the British project named TAOS, which was pretty smart too.)
I'm happy to see that modern JVMs and Java's large library classes are amazingly fast, so you people at SUN are doing a very good job.
Some time ago I was involved in a C++ project where we used the unfortunate STL that comes with the unfortunate MS Visual C++ 6.x. I think MS's STL even lacked a hash map. Anyway, we replaced the MS STL with the platform-neutral STLport, and for (normal) Maps and Sets this boosted performance by a factor of 2+ in every case where we used it. I can well imagine a lot of people are suspicious of "built-in library collections" because of similar experiences, even if that's unfair in a case like Java today.
Actually, VMs themselves are quite old, going back as far as the UCSD P-Machine and IBM's VM operating system. But if you mean run-time compilers, then yes, they are fairly recent. They were used in high-performance Smalltalk systems before Java. (In fact, Sun's original HotSpot team was a small but brilliant Smalltalk maker that Sun bought and re-purposed.)
What people really need to realize is that performance can be bad for many reasons. In the beginning, interpretation and simple JITs were an issue. Once we started getting a handle on that, other things, like architectural problems in Swing, surfaced. Heck, Swing's first release was slow because it just happened to be on top of the brand-new 2D library, and THAT was slow because we had relied on microbenchmarks and fooled ourselves about its speed. (The primary thing that made 1.3 so much better was that we put together a much better benchmark suite to guide us.)
But things like Java2D and Swing are very complex, and there are many ways to go wrong. Something as pure and simple as a HashMap is pretty hard to screw up if you know the data structures and the issues involved.
I know a fair bit about TAOS, actually. I'm not as big a fan as apparently you are. I guess I should leave it at that.
Thank you. The IBM JVM developers are very fine too, as is the Mac team (though the Mac one is partly based on our code).
I take pride, though, that we can reasonably compete with IBM with a much smaller VM team than they have.
Been there. Part of the problem is that MSFT created a non-ANSI C++ that is missing some of what's needed to support a normal STL. As a result, they had to hack it up.
Really? How? When we tried, we found that MSFT's C++ didn't support some unusual "new" syntax that STL used. (As I recall, and it's been a while, it was a way to use new that let you supply the memory…)
I don’t doubt that the collections folks at Sun know their algorithms, but isn’t the problem with the collections the internal allocations of objects that are necessary because the collections are so general?
Shawn Kendall posted a thread saying that his team had finally found ArrayList to perform well enough to be used in real-time games, and I guess it is no accident that ArrayList doesn't do any per-object allocations/deallocations like HashMap or LinkedList do. Internal object creation also keeps the generics proposal from being truly useful to us, as we won't be able to use the auto-boxed primitive collections (even ArrayLists) in performance-sensitive applications.
The Jade HashMap seems to solve the problem for HashMap by using an object pool (conceptually quite similar to how ArrayList is able to avoid lots of garbage), but I’m thinking that the best solution would be to avoid the extra object allocation altogether. I’ve done this in some applications for LinkedLists by creating an interface that specifies the ListEntry functionality, allowing me to implement the interface in the base class of the objects that I want to store in the list. The downside of this is that I have to implement the whole list class myself too.
This is a very application-specific solution, and it is bad from a robustness POV to have to implement the ListEntry functionality in all the objects to be stored in the list (too easy to forget something somewhere). I guess this is an area where C++'s multiple implementation inheritance could result in a better solution, as it would allow the collection library author to create one class with the ListEntry functionality and have the user of the collection make all the objects he wants to store in a list subclass that class. However, that still wouldn't work for primitive types.
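For what it's worth, here is a minimal sketch of the pattern I mean; the interface and class names are just made up for illustration:

```java
// Hypothetical intrusive list: the link field lives inside the stored
// objects themselves, so inserting and removing allocates nothing.
interface ListEntry {
    ListEntry getNext();
    void setNext(ListEntry next);
}

// Application objects implement ListEntry (ideally in a base class).
class Particle implements ListEntry {
    float x, y;
    private ListEntry next;
    public ListEntry getNext()          { return next; }
    public void setNext(ListEntry next) { this.next = next; }
}

// The hand-rolled list itself; no Entry wrapper objects are ever created.
class IntrusiveList {
    private ListEntry head;

    void addFirst(ListEntry e) {
        e.setNext(head);
        head = e;
    }

    ListEntry removeFirst() {
        ListEntry e = head;
        if (e != null) {
            head = e.getNext();
            e.setNext(null);
        }
        return e;
    }
}
```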
Are there any other optimizations, either in the libraries or the VM/GC, on the horizon that could allow collections as general as Sun's to avoid, or at least diminish the negative consequences of, the internal allocations?
A linked-list interface embedded in your storable objects is definitely the best solution performance-wise in any language.
It just makes application-level programming a pain, since you have to implement the damn thing right when you don't want to.
In a "complete" system/engine, this is done for the containable objects and then the system is used as a whole, so that is one of the things that defines a "layer" - IMHO.
[quote]The guys who wrote the collections code in the JDK are some of the best general computer scientists I know, actually, and hash map algorithms are a highly researched and well-understood area. Based on that, they chose an algorithm that was reasonably fast and scaled well, since the eventual uses of such a general data structure are unknown.
[/quote]
We've been working on routing algorithms lately, and it's caused us to take a closer look at some of the java.util data structures. We hold about 100 million links and nodes in memory. It was interesting to see that Josh Bloch used the same algorithms book that I have from school (Introduction to Algorithms) to create the red-black tree implementation for TreeMap. So, yes, the authors of the Collections classes are using well-proven theory.
On the other hand, if you take a look at the source for HashMap, you can see how things could be more efficient. For example, they re-calculate a hash every single time you perform a get or a put, and the justification used is that the user may have picked a poor hashcode. That’s probably smart to guard against the unwashed masses, but it sucks when you’re trying to squeeze out every last microsecond of performance.
What’s worse is how Sun can’t be bothered to support data structures that work efficiently with primitive types. JSR14 is just a sad disappointment in that respect.
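To make the primitive-type complaint concrete, here is a rough sketch of the kind of thing I mean: an int-keyed map using open addressing over plain arrays, with no Entry objects, no boxing, and no defensive re-hash of the key. The class and its simplifications are entirely mine; nothing like it is in java.util:

```java
// Illustrative open-addressing map from int keys to Object values.
// No per-entry allocation, no boxing, no supplemental re-hash of the key.
// Simplifications: capacity is a power of two, there is no remove(),
// key 0 is reserved as the empty-slot marker, and load factor stays <= 0.5.
class IntObjectMap {
    private int[] keys;
    private Object[] values;
    private int size;

    IntObjectMap(int expectedSize) {
        int cap = 4;
        while (cap < expectedSize * 2) {
            cap <<= 1;
        }
        keys = new int[cap];
        values = new Object[cap];
    }

    void put(int key, Object value) {
        if (key == 0) {
            throw new IllegalArgumentException("key 0 is reserved");
        }
        if ((size + 1) * 2 > keys.length) {
            resize();
        }
        int mask = keys.length - 1;
        int i = key & mask;                  // raw key bits, no extra hashing
        while (keys[i] != 0 && keys[i] != key) {
            i = (i + 1) & mask;              // linear probing
        }
        if (keys[i] == 0) {
            size++;
        }
        keys[i] = key;
        values[i] = value;
    }

    Object get(int key) {
        int mask = keys.length - 1;
        int i = key & mask;
        while (keys[i] != 0) {
            if (keys[i] == key) {
                return values[i];
            }
            i = (i + 1) & mask;
        }
        return null;
    }

    private void resize() {
        int[] oldKeys = keys;
        Object[] oldValues = values;
        keys = new int[oldKeys.length * 2];
        values = new Object[oldKeys.length * 2];
        size = 0;
        for (int i = 0; i < oldKeys.length; i++) {
            if (oldKeys[i] != 0) {
                put(oldKeys[i], oldValues[i]);
            }
        }
    }
}
```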
[quote]Really? How? When we tried we found that MSFT’s C++ didn’t support some unusual “new” syntax that STL used. (As I recall, and its been awhile, it was a way to use new that let you supply the memory…)
[/quote]
It’s called “placement new”, and, yes, you supply the address to which the object is “placed”. You must have been using a very old version of VC++, because, AFAIK, VC++ 6.0 has always supported placement new, even if it sucked so, so much in so, so many different ways.
[quote]I don’t doubt that the collections folks at Sun know their algorithms, but isn’t the problem with the collections the internal allocations of objects that are necessary because the collections are so general?
[/quote]
More to the point, they allocate and deallocate because they were written with your average Java application in mind.
Game clients can have unusually stringent requirements in this area.
[quote]
Shawn Kendall posted a thread saying that his team had finally found ArrayList to perform well enough to be used in real-time games, and I guess it is no accident that ArrayList doesn't do any per-object allocations/deallocations like HashMap or LinkedList do.
[/quote]
Oh boy, that's a MONSTROUS leap in logic with no data.
If you really did measurements to see what the impact of these was, then it might be supportable, but as a bald-faced assertion it's nothing more than guesswork.
In fact, Java allocation and deallocation are EXTREMELY fast, much faster than in C/C++, as has been shown by numerous benchmarks. The real issue is not allocation/deallocation at all, it's garbage collection.
If you re-read Shawn's post you will see that GC is very specifically what he is talking about.
How your garbage collector responds depends totally on its implementation. ArrayList doesn't do a lot of small allocations, but it DOES do large allocations as it grows, AND it does array copies. Which of those is going to cost you more, if either does, is pretty much VM-specific.
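For what it's worth, if the growth copies are the concern, pre-sizing sidesteps most of them regardless of which VM you're on; a trivial sketch (the numbers are arbitrary):

```java
import java.util.ArrayList;

public class PresizeDemo {
    public static void main(String[] args) {
        // Pre-sizing: one backing-array allocation up front.
        ArrayList presized = new ArrayList(100000);
        // Default: starts small, re-allocates and copies as it grows.
        ArrayList grown = new ArrayList();
        for (int i = 0; i < 100000; i++) {
            Integer boxed = new Integer(i);   // pre-1.5 style, raw types
            presized.add(boxed);
            grown.add(boxed);
        }
        System.out.println(presized.size() + " / " + grown.size());
    }
}
```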
Again, more to the point, blindly relying on autoboxing will DEFINITELY hurt you, because there IS a translation step at each boxing/unboxing. There is no code difference between autoboxing and manually wrapping primitives; it's just a bit cleaner syntactically.
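To be clear about what I mean by "no code difference," the two forms below should end up doing essentially the same work; the autoboxed line is shown as a comment since it's proposed 1.5 syntax, so treat the whole thing as a sketch of the idea rather than compiler output:

```java
import java.util.HashMap;

public class BoxingDemo {
    public static void main(String[] args) {
        HashMap map = new HashMap();

        // With autoboxing (proposed 1.5 syntax): looks primitive, isn't.
        // map.put(42, "answer");

        // What it amounts to: an Integer wrapper is still allocated.
        map.put(new Integer(42), "answer");

        System.out.println(map.get(new Integer(42)));
    }
}
```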
The overhead of pooling is at least as much as the overhead of the HotSpot new space. If those objects are short-lived, then it's probably a net loss. If they are long-lived, then GC is much less of an issue.
Yup, building the list header into your data structures can be a useful tool. This is one of the few places, actually, where I miss multiple inheritance in Java.
Again, you're focused on the wrong place. Allocation is shit-arse fast in Java. The issue is GC, and as Shawn pointed out, the GC continues to improve. We can reasonably hope that at some point in the near future it will be a non-issue.
I know it's only your opinion, but it worries me that this perspective could be common within Sun: if there is no intention to make boxing anything other than syntactic sugar, then I'm going to join the "please please please remove this stupid idea from Java" camp (I don't think there is one yet, but if people start to realise this is where it's going, I'm sure one will form pretty rapidly…)
If I want performance-gobbling syntax changes that do NOTHING other than save my fingers some typing, I'll retrain as a VB programmer. IIRC that's one of its defining features: it takes very little code to do anything, but you typically sacrifice a lot of the performance you would have had if you'd written a few more lines of code in a proper language. You write VB programs when you'll be sorting lists of 10 items on a 1 GHz PC, so that the huge performance drops don't matter.
I’ve only stood by boxing (and I suspect this is true for many people?) because I have faith that what Sun introduces as a transparent code-rewrite now will be turned into a low-level trick in a future version which doesn’t sacrifice performance. This kind of faith mainly rests upon the power of a JVM - I write the code now, you update the JVM + core libs in 2 years time, and my code benefits without me having to be around to re-link or re-compile.
Ditto generics - part of its value is that the official inclusion by Sun opens the way for clever high-performance versions in future standard Java distributions… although generics obviously also have the much greater value of bringing compile-time checks to the biggest area where they were historically absent (and causing no end of problems :()
Well, inside the bytecode, boxing is actually turned into real new Integer(5) calls, so the JVM would have to be very smart to guess what you really want. Additionally, I expect boxing to be mostly used for collections - and I do not see a reasonable way to avoid object allocation there (Smalltalk-style small/big integers probably will not ever get into the JVM, as they cost speed on every operation).
Boxing/unboxing IS syntactic sugar. I don't think that anybody ever even considered optimizing it in any special way - at least no more than normal object allocation (escape analysis, on-stack allocation, etc.).
Well, yes, exactly. I was talking about people hoping that in the future it would be better.
Sorry if I was unclear, but my references to the JVM in particular being able to optimize things in future versions were meant to be generic, not specifically for this. I have no idea where optimizations could be made for boxing (altering the compiler, the byte-codes, the JIT, using some metadata, altering the JLS, etc)
Hmm, I always assumed that auto-boxing was a compile-time thing, and so generated the same bytecode as without it. But to do some sort of internal JVM trick to optimise this, the VM would need to explicitly know about it, and so the boxing must actually be in the bytecode and reworked at runtime.
Which one is it? I had assumed the first, but it's taken so long to appear that that points to an actual addition to the bytecode language.
Generally, what you want is an efficient int HashMap, or some other collection. As others have pointed out, auto-boxing won't mean primitives can suddenly be used efficiently with general-purpose collections.
What you want is the generics stuff, and a JRE/VM that will optimise for this. I.e. you use a HashMap that is compile-time constrained to only use ints for the keys and/or values. At runtime the JRE or VM can then substitute an int-optimised HashMap for the default one.
However, I don't think primitives can be specified for generics - i.e. it would have to be the Integer object, not int. Since an Integer object isn't modifiable, you could argue that, functionally, storing them as ints wouldn't make any difference. I think that would be the case unless you were doing something that specifically depended on object references…
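So, as I understand the proposal, you'd end up writing something like the sketch below (1.5-style syntax, so take it as illustrative); the type parameter constrains things at compile time, but the keys are still Integer objects at runtime unless the JRE/VM does something cleverer:

```java
import java.util.HashMap;

public class GenericIntKeys {
    public static void main(String[] args) {
        // Compile-time constraint only: keys must be Integer, not int.
        HashMap<Integer, String> byId = new HashMap<Integer, String>();

        // Each put still wraps the int in an Integer object at runtime.
        for (int id = 0; id < 5; id++) {
            byId.put(new Integer(id), "node-" + id);
        }

        System.out.println(byId.get(new Integer(3)));
    }
}
```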