Fantastic Map specializations and GC reduction

Hello,

I’ve just come across the package:

http://gongolo.usr.dsi.unimi.it/~vigna/fastUtil/

It provides specializations for Maps. F.E. instead of using a HashMap
with key=Integer, value=Integer you would use an Int2IntHashMap where key=int, value=int.

In a microbenchmark I have found that the Int2IntHashMap can be up
to 30 times faster than java.util.HashMap.
fwiw, 30 = 7272727/236966 where:

  1. 7272727 = 2nd iteration of Int2IntHashMap!
  2. 236966 = 15th iteration of HashMap (Hot spot had 12 more iterations to optimize)

On top of this, (and the reason for the number 30 above) is that using HashMaps can cause a lot of GC work.

The code and the results follow:

//
// $Id: TestFastUtil.java,v 1.4 2002/11/18 01:09:19 mswanson Exp $
//

package com.wss.utils.test;

import java.util.;
import it.unimi.dsi.fastUtil.
;

/**
*
*/
public class TestFastUtil {

public static void main(String[] args) throws Exception {
    int count = 400000;
    int timerCount = 20;
    long start, end;
    Integer tmp;

    // Normal HashMap
    HashMap hashMap = new HashMap(count);
    for (int timer=0; timer < timerCount; ++timer) {
        start = System.currentTimeMillis();
        for (int i=0; i < count; ++i) {
            hashMap.put(new Integer(i), new Integer(i));
        }
        end = System.currentTimeMillis();
        System.out.println("HashMap put(Integer, Integer) count:" + count +
            ", put/s:" + (count / ((float)(end-start) / (float)1000)));

        start = System.currentTimeMillis();
        for (int i=0; i < count; ++i) {
            Integer in = new Integer(i);
            tmp = (Integer)hashMap.get(in);
            if (!tmp.equals(in))
                throw new Exception("failed equals()");
        }
        end = System.currentTimeMillis();
        System.out.println("HashMap get(Integer) count:" + count +
            ", get/s:" + (count / ((float)(end-start) / (float)1000)));
    }

    // FastUtil HashMap Int2Int
    timerCount = 100;
    Int2IntHashMap int2IntHM = new Int2IntHashMap(count);
    int j;
    for (int timer=0; timer < timerCount; ++timer) {
        start = System.currentTimeMillis();
        for (int i=0; i < count; ++i) {
            int2IntHM.put(i, i);
        }
        end = System.currentTimeMillis();
        System.out.println("Int2Int put(Integer, Integer) count:" + count +
            ", put/s:" + (count / ((float)(end-start) / (float)1000)));

        start = System.currentTimeMillis();
        for (int i=0; i < count; ++i) {
            j = int2IntHM.get(i);
            if (i != j)
                throw new Exception("Int2Int failed equals()");
        }
        end = System.currentTimeMillis();
        System.out.println("Int2Int get(Integer) count:" + count +
            ", get/s:" + (count / ((float)(end-start) / (float)1000)));
    }

}

}

HashMap put(Integer, Integer) count:400000, put/s:137174.22
HashMap get(Integer) count:400000, get/s:603318.25
HashMap put(Integer, Integer) count:400000, put/s:508905.84
HashMap get(Integer) count:400000, get/s:966183.56
HashMap put(Integer, Integer) count:400000, put/s:203252.03
HashMap get(Integer) count:400000, get/s:1052631.6
HashMap put(Integer, Integer) count:400000, put/s:586510.25
HashMap get(Integer) count:400000, get/s:997506.25
HashMap put(Integer, Integer) count:400000, put/s:571428.56
HashMap get(Integer) count:400000, get/s:928074.25
HashMap put(Integer, Integer) count:400000, put/s:243013.36
HashMap get(Integer) count:400000, get/s:952381.0
HashMap put(Integer, Integer) count:400000, put/s:635930.06
HashMap get(Integer) count:400000, get/s:816326.5
HashMap put(Integer, Integer) count:400000, put/s:707964.6
HashMap get(Integer) count:400000, get/s:757575.75
HashMap put(Integer, Integer) count:400000, put/s:234879.62
HashMap get(Integer) count:400000, get/s:1111111.1
HashMap put(Integer, Integer) count:400000, put/s:492610.84
HashMap get(Integer) count:400000, get/s:975609.75
HashMap put(Integer, Integer) count:400000, put/s:583941.6
HashMap get(Integer) count:400000, get/s:915331.8
HashMap put(Integer, Integer) count:400000, put/s:626959.25
HashMap get(Integer) count:400000, get/s:275482.1
HashMap put(Integer, Integer) count:400000, put/s:686106.3
HashMap get(Integer) count:400000, get/s:781249.94
HashMap put(Integer, Integer) count:400000, put/s:571428.56
HashMap get(Integer) count:400000, get/s:1049868.8
HashMap put(Integer, Integer) count:400000, put/s:236966.83
HashMap get(Integer) count:400000, get/s:1044386.44
HashMap put(Integer, Integer) count:400000, put/s:589970.5
HashMap get(Integer) count:400000, get/s:943396.25
HashMap put(Integer, Integer) count:400000, put/s:597907.3
HashMap get(Integer) count:400000, get/s:843881.9
HashMap put(Integer, Integer) count:400000, put/s:689655.2
HashMap get(Integer) count:400000, get/s:273597.8
HashMap put(Integer, Integer) count:400000, put/s:698080.25
HashMap get(Integer) count:400000, get/s:767754.25
HashMap put(Integer, Integer) count:400000, put/s:583941.6
HashMap get(Integer) count:400000, get/s:1017811.7
Int2Int put(Integer, Integer) count:400000, put/s:3448276.0
Int2Int get(Integer) count:400000, get/s:3305785.2
Int2Int put(Integer, Integer) count:400000, put/s:7272727.5
Int2Int get(Integer) count:400000, get/s:7547170.0
Int2Int put(Integer, Integer) count:400000, put/s:7142857.0
Int2Int get(Integer) count:400000, get/s:7547170.0
Int2Int put(Integer, Integer) count:400000, put/s:7272727.5
Int2Int get(Integer) count:400000, get/s:7547170.0
Int2Int put(Integer, Integer) count:400000, put/s:7272727.5
Int2Int get(Integer) count:400000, get/s:7142857.0
Int2Int put(Integer, Integer) count:400000, put/s:7272727.5
Int2Int get(Integer) count:400000, get/s:7547170.0
Int2Int put(Integer, Integer) count:400000, put/s:7272727.5

Yep, no surprise here, seems a universal law of computer science is that the more generic something is (such as HashMaps ability to store any type) the slower it will perform. However, I think this problem will be addressed with the introduction of Generics such that you can indicate to the compiler the specific (aka. non-generic) type of Map you want, and should remove the class-casting and other overhead involved in the standard collection classes.

-Chris

wrt Generics I think a good comparison would be valuable. F.E.

  1. With Generics, is it possible to not use objects like fastUtil?
  2. what are the ways in which Generics are not as efficient as fastUtil?
  3. benchmarks would be nice but I do not believe there is an implementation of Generics available yet.

It would be most beneficial if fastUtil could find its way in the hands of all Generics developers. Just knowing something can work this fast could affect the Generics release/design.

I’m sure everyone would be interested in some comments by someone who knows exactly how Generics works.

Cheers.

Well, I don’t think the fastutil folks are doing anything special, except making collections tailored to the type that it’s holding.

As far as your questions:

  1. From what I understand, No. If I declare an ArrayList for collecting Integers, I can say new ArrayList but not new ArrayList.

  2. Initially, the Generics implementation will not be as fast as fastutil because the inital implementation of Generics will merely ‘hide’ the casting under the covers because of JVM compatabilities, but when they go with a native implementation, it should be as fast as the fastutil lib.

  3. There is an implementation of Generics that you can check out at http://www.javaworld.com/javaworld/jw-06-2001/j1-01-sintes3_p.html. I could be wrong, but I thought they had an implementation in the JDK 1.4 that you could turn on.

-Chris

Depends on your definition of special. 30x speed improvement is
pretty special to me.

If Generics are using ArrayList and not ArrayList then one can say:

  1. Generics will always impose a GC overhead - potentially much greater depending on use.
  2. Generics will always use more memory. Even with the latest HotSpot an Object is what, 8 or 12 bytes? Plus you have the 4 bytes for the int. This will matter in certain applications.
  3. It takes time to create an Object so Generics will always be slower.

Given these 3 conditions, bolstered by my cyberspace anonymity , I will boldly state that:

  1. Generics will always be at least 3x slower than fastUtil.
  2. Generics will always impose a GC overhead and all that entails
  3. Generics are not really what developers want or need - they want and need fastUtils.
:-)

[quote]wrt Generics I think a good comparison would be valuable. F.E.

  1. With Generics, is it possible to not use objects like fastUtil?
    [/quote]
    Not sure, but AFAIK upcoming generics will not support primitives. So they are no replacement for fastUtil.

[quote]Yep, no surprise here, seems a universal law of computer science is that the more generic something is (such as HashMaps ability to store any type) the slower it will perform. However, I think this problem will be addressed with the introduction of Generics such that you can indicate to the compiler the specific (aka. non-generic) type of Map you want, and should remove the class-casting and other overhead involved in the standard collection classes.

-Chris
[/quote]
A couple of comments.

  1. Good genericity doesn’t trade off speed. Take a look at how C++ supports extremely well performing generics with template specializations and policies. Modern C++ Design, www.moderncppdesign.com, is an excellent book written by Andrei Alexandrescu who is an expert on generics and performance.

  2. Type erasure is a simple change to the compiler that achieves nothing more than creating hidden type-casts. That class casting that you thought was gone is still there. In comparison, with C++ templates you have the choice of implementing the templates in terms of a base type and casting or simply using the template parameter types. If you need the speed, you can get it.

Basically, it boils down to this: Java generics are a weak auto-casting mechanism that can actually hurt performance as they elide the amount of casting going on. C++ templates are a powerful Turing complete language that give you the flexibility to implement very fast algorithms. If only Sun wouldn’t bend bass-ackwards in their efforts to keep the VM spec from changing, we might have something more like C++ templates.

God bless,
-Toby Reyelts

Wow - he really KNOWS!

Grom has got most of that correct. In and of itself, Java’s type-erasure form of Generics will never have any positive performance impact upon programs.

Different solutions include adding support to the virtual machine for generics, enhancing generics to work with primitives (without auto-boxing), smart object->primitive replacement, and Smalltalk style primitives. Who knows what Sun will see fit to do.

God bless,
-Toby Reyelts

Well, obviously your experience is differnt than mine, but the fact that fastutils is so much faster (although only supports specific types of collections) supports my claims.

I think that what generics gives you that fastutils doesn’t is that if i need a new type of collection, I’d have to wait for fastutil developers to give it to me (Such as a CustomerHashMap or something) or I need to roll my own. Generics allows me to define HashMap or HashMap or HashMap without having to re-write a container over and over.

C++ templates is nothing more than a code generator: It actually produces new source code based on the type you specify as the parameter. This is a sort of ‘copy-and-paste’ code reuse that I hardly find attractive, but it does have a benefit to the code bloat: it’s very fast.

I think that high performance code (that games require) will not find much use for ‘generic’ classes, the developers will need to take existing code and shape it to do a particular task extremely quickly. This makes it non-generic.

Anyways, there’s nothing that says that a VM can ‘collapse’ generic code into specific code so that it runs faster, but this would be at a cost of memory (each collapsed version of a super-class-subclass would have it’s own distinct instance). Most likely, this is something that HotSpot does.

-Chris

Well, obviously your experience is differnt than mine, but the fact that fastutils is so much faster (although only supports specific types of collections) supports my claims.

Huh? Exactly what claims of yours does it support?

I think that what generics gives you that fastutils doesn’t

Obviously, fastutils is a response to the crappy performance of Java containers and primitives, generic and otherwise.

C++ templates is nothing more than a code generator:

Lol. Try telling that to the C++ compiler implementers. You can bet they only wish it were that simple.

This is a sort of ‘copy-and-paste’ code reuse that I hardly find attractive, but it does have a benefit to the code bloat: it’s very fast.

I challenge you to define a negative attribute of “copy and paste” that is shared with C++ templates. Any “code bloat” associated with C++ templates is purely the fault of a naive programmer using them. Statements like yours belie any experience you might claim to have with them.

I think that high performance code (that games require) will not find much use for ‘generic’ classes

To put it about as politely as I can: Only because Java generics suck. If you think people want to have to create an entire new container for each possible combination of types under the sun, you’ve not been programming for very long.

Anyways, there’s nothing that says that a VM can ‘collapse’ generic code into specific code so that it runs faster, but this would be at a cost of memory (each collapsed version of a super-class-subclass would have it’s own distinct instance). Most likely, this is something that HotSpot does.

Can I have some of what you’ve been smoking? HotSpot doesn’t know anything about generics. There is no java bytecode anywhere that could possibly indicate any information about generics to Hotspot - All because Sun doesn’t want to introduce changes to the VM spec.

I don’t want this to turn into any sort of flamefest, I’m just critical - and Java generics push my buttons. I’d been waiting for something good for so long and then they released that crap. I’m not saying I want Java generics to be C++ templates, but for the love of coffee, couldn’t we have at least done something 1/10th as powerful?

God bless,
-Toby Reyelts

Another interesting thing about fastUtil:

It uses a shell script and a C PreProcessor to generate the Java sources. (fyi: the author has mentioned that he should be done a rewrite of the tree maps soon). Off the top of my head I find the following possibilities interesting:

After using C++ templates and the STL heavily for many years, I have noticed that I personally used a core set of features about 80% of the time. A good number of these features are already implemented in fastUtils. fastUtils is “fast” becoming that 1/10th as powerful solution because:

  1. fastUtils supports Int2ObjectHashMap <int, Object> and other combinations of native/Object data types that I believe would be optimal enough.

  2. fastUtils comes with a large number of specializations pre-built. No longer do I have to suffer long compile times waiting for templates and their partial specializations to compile. Bonus. And, when are enough specializations enough? If you look through the API and see what is covered perhaps you will think all the specializations are covered and all that needs maturing is the selection of algorithms.

  3. (Well done, kudos) the fastUtils developer had the forsight to structure the package in such a way where code generation is automated to support more than one specialization. It looks easy enough. If it proves not to be easy, at least it is possible.

To me, fastUtils is potentially the perfect equivalent to C++ templates and the STL, and what Generics should have been. The reason I say this is because fastUtils provides (or will soon provide) me with enough capability to satisfy most if not all of my needs. I can easily fill in the missing pieces or adjust my coding strategies to compensate.

ymmv, of course.

rreyelts,
Try as you might, you are defintely tossing a lot of flaim-bait out there, but I’ll try to politely address your points:

  1. The claim was that “good” genericity does not trade off speed. I said that in my experience, the more generalized soemthing becomes, the less performant it becomes (2 examples: Java Collections API, and the JVM. In the former case, you need to resort to casts and object overhead in the standard collections, and in the latter case, the JVM is designed to run on multiple platforms and there fore isn’t designed to be optimized to one particular platform). I hope this clears things up.

  2. Obviously… I don’t know what your statement here has to do with my saying one benefit of generics.

  3. What: a “preprocessor that creates source” doesn’t sound like a code generator?

  4. Negative impact of copy-paste: If you find a problem with your ‘boilerplate’ template, you need to go back and recompile (or re-generate) all the code that was built off that template. With generics it will work almost like inheritance in the sense that you only need to recompile your generic class for all instances of your generics to get updated.

  5. I was only making another point about the tradeoffs to get performance: Hotspot needs to keep a lot more around in memory to make it’s optimizations. This is overhead. C++ templates generate lots of source code that will need to be kept track of if the base tempate changes. This is overhead. I think most of my points in this thread have gone over-your-head. (sorry, that was rude of me)

-Chris

1) The claim was that “good” genericity does not trade off speed… I hope this clears things up.
Nope, you’re pretty much making no sense whatsoever. Java’s generic collections are poorly designed and thus suffer a performance penalty, and Java doesn’t give up platform optimizations - they’re just deferred to the JVM until runtime.

2) Obviously… I don’t know what your statement here has to do with my saying one benefit of generics.
My point was that fastutil isn’t really a response to generics and so you’re comparing apples and oranges. Fastutil is a response to crappy container performance and crappy Java generics won’t change that.

3) What: a “preprocessor that creates source” doesn’t sound like a code generator?

Lol. There isn’t a single C++ compiler in existance that implements templates using a preprocessor. I’m beginning to doubt that you’ve ever worked with C++.

4) Negative impact of copy-paste: If you find a problem with your ‘boilerplate’ template, you need to go back and recompile (or re-generate) all the code that was built off that template. With generics it will work almost like inheritance in the sense that you only need to recompile your generic class for all instances of your generics to get updated.

Lol. That’s the best you can come up with? That same “problem” is what allows you to perform optimizations like compile-time loop unrolling.

5) I was only making another point about the tradeoffs to get performance: Hotspot needs to keep a lot more around in memory to make it’s optimizations. This is overhead. C++ templates generate lots of source code that will need to be kept track of if the base tempate changes. This is overhead.

No Chris, you’re just talking nonsense. Let me repeat some of it: Anyways, there’s nothing that says that a VM can ‘collapse’ generic code into specific code so that it runs faster, but this would be at a cost of memory (each collapsed version of a super-class-subclass would have it’s own distinct instance). Most likely, this is something that HotSpot does.

I think most of my points in this thread have gone over-your-head. (sorry, that was rude of me)

Oh I see, the logic hurts, so you resort to fallacious ad hominem attacks. Fine, if you think you’re talking over my head, then why don’t you Google around some and take a look at the work I’ve done? I eat java bytecode for breakfast. You on the other hand, chris.knoll@immedient.com, seem to spend all of your time chewing on .NET, http://www.immedient.com/aboutus.asp?Nav=2&Tab=2

I’m not surprised it’s warped your brain. :wink:

God bless,
-Toby Reyelts

Hey guys… people are just trying to understand stuff here… If someone doesn’t know how things work but they think they do, it can be a simple honest misunderstanding. It isn’t because they are trying to be a smart*ss or something. Let’s try to correct these misunderstandings in a friendlier manner.

Just how are C++ templates realized then? I had always thought of the trivial implementation that amounts to automatic copy-paste as well… (it seems like the easiest thing to do… so I would guess that at least early implementations worked that way.)

What am I missing?

Hi guys, I’m the author of fastUtil.

There are a lot of issues in this thread, so I’ll just try to give my perspective.

  1. I wrote this stuff because I needed hash-based sets and maps with high performance and small memory footprint for a web crawler I’m developing with other people. We already use several type-specific classes in the COLT distribution, but it did not contain everything we needed.

  2. Genericity in Java as currently specified is useful syntactic sugar, but no more; certainly it does not apply to primitive types, certainly does not solve the wrap/unwrap problem, and possibly (but I wouldn’t bet on it) could even reduce performance.

  3. fastUtil is clearly “inspired” by C++ templates. There is however a major difference: because of the way Java handles classes, you can precompile everything (you will end up loading only the class you use anyway). This has a number of advantages.

  4. The speed gain with fastUtil is certainly also due to less garbage collection but for the most to better algorithms. Sincerely, I haven’t seen in the last years anyone implementing closed addressing (as Sun did). Its only advantage is that it handles slightly better deletions. Probably they also thought that since they had to wrap primitive types, there would have been no real disadvantage in wrapping again the wrapper in an Entry object. I don’t know–I just know that those classes are completely useless for my needs. The algorithms used in fastUtil are simple but effective. They use all the known theory about hash tables to squeeze every bit of performance (e.g., using twin primes). Moreover, the important functions are willingly coded as tight loops contained in a single method, so that they end up being optimised very soon.

  5. fastUtil uses theory, but tries to be pragmatic. Yes, you can use an additional pointer per entry to keep a list of entries, so that iteration is linear in the number of entries. However, in the real world a 75% filled table scanned with a tight loop will always give faster iteration, even if you have a slight overhead due to empty/deleted buckets. This can hinder performance if the table contains mostly deleted entries, but again, fastUtil is optimised for the general case.

I have almost finished developing a replacement for sorted sets and maps. It consumes only one integer and two references per stored element (against three references and one boolean for java.util). Of course, it doesn’t need wrappers.

If anyone is interested in beta testing, please let me know.

Ciao,

                                                              seba

Thanks for posting Vigna!
Your fastUtil package is having a profound effect on a particular module I’m using. fastUtil will probably wind up in most projects I work on from now on.

Cheers.

Let’s try to correct these misunderstandings in a friendlier manner.

Well, I tried to warn people that Java generics push my buttons. If you try to defend them to me, you’re liable to get mown down. :stuck_out_tongue:

Just how are C++ templates realized then? I had always thought of the trivial implementation that amounts to automatic copy-paste as well

First, let’s get this notion of using a preprocessor to implement templates out of the way. A preprocessor implements a language of its own. For example, both cpp and m4 implement their own languages. They operate outside of the bounds of the meaning of the text they transform. Therefore, preprocessors have no respect for your programming language’s features - such as type, scope, namespaces, overloading, etc… (anything which has semantic meaning above grammar / syntax). Obviously C++ templates can’t be implemented by a preprocessor because they respect and interact with all of these language features (sometimes in quite complicated ways) - whereas, macros, for example, don’t respect a single one.

Now, to address “copy and paste”. C++ templates are generative (which is part of what makes them generic - read a good book on generic programming), but they aren’t copy and paste.

First, you aren’t manually generating any code which you have to maintain. For example, java.lang.reflect.Proxy is also generative - it generates new classes at runtime. However, we don’t think of Proxy as a copy and paste mechanism, because it doesn’t suffer any of the classic pitfalls of copy and paste. It’s simply a generative, generic class.

Second, templates follow special rules that have nothing to do with copy and paste, for example, template specialization, both partial and full. Another example of special template rules are those that govern the interaction of templates and the C++ type system. They don’t match a copy and paste mechanism.

Third, the C++ template mechanism implements a Turing complete language, with features like compile time control-flow and iteration / recursion. For example, it’s possible to write a function that computes the factorial of a parameter N at compile time. This use of templates is known as template meta-programming and is commonly used to perform optimizations like compile-time loop unrolling. Try doing that with Ctrl+C and Ctrl+V.

I can only guess that some programmers may be tricked into believing that templates are “copy and paste”, because they operate on structural conformance. (Another point of flexibility I wish Java generics had - or at least replaced with something better).

Again, to reiterate my point, Java generics don’t have to equal C++ templates, but they should have learned something from them. Instead, they’re a dumb auto-casting mechanism, with no power and no flexibility. They are not generics - They’re a hack to make Java containers slightly easier to work with. Don’t get me wrong, I love Java: the lack of broken type semantics, cross-platform, built-in multithreading, monitor info, stack trace info, security, reflection, dynamic class loading, easy to understand bytecode (and hence easy generative programming) - but Java generics just plain suck.

God bless,
-Toby Reyelts

2) Genericity in Java as currently specified is useful syntactic sugar, but no more; certainly it does not apply to primitive types, certainly does not solve the wrap/unwrap problem, and possibly (but I wouldn’t bet on it) could even reduce performance.

If you double-check you can see that my comment on reducing performance was in respect to the hiding of casts. Apparently some programmers believe that Java generics will be removing the casts. Given that, they might opt for using Object-based containers when they otherwise would have not.

3) fastUtil is clearly “inspired” by C++ templates. There is however a major difference: because of the way Java handles classes, you can precompile everything (you will end up loading only the class you use anyway). This has a number of advantages.

Can you explain what you mean by this? You can also “pre-compile” C++ templates. For example, I could explicitly instantiate (“pre-compile”) vector, vector, vector, etc… In the same respect, even the most naive C++ linkers would be smart enough to remove the dead code, if say, the vector instantiation went unused in the final program.

God bless,
-Toby Reyelts

Ah well, I can only respond to the same idiocy two times before I give up, so, I suggest if you really want to understand what I’m saying, re-read my posts again. You obviously can’t accept information from anyplace except your own misconceptions, and you are completely blinded by where you think the information is comming from (Yep, I work for a MS Solution provider, but I strictly do Java work, and I like to think of myself as someone keeping the balance of technologies in my corporation. You should not be to quick to judge.). I’m sure that you can go on and on reading one passage from me and understanding something completely different, but while you might find it fun playing the fool, I find it pretty boring to talk to one.

I’m getting flashbacks about talking to someone about the applet security model and webstart, and he was as thick as you seem to be. Enough is enough.

-Chris