Hotspot VM not deterministic under heavy load and swapping

Scenario:
I’m giving the Wikipedia compression contest a try, where you have to compress 100MB of text to the absolute minimum. Not so much to try to win, but to see what problems there are in this field.

As you can imagine, the program builds a lot of data-structures, pumps data among them, and analyses patterns in the words. It uses a lot of RAM, mainly in byte[], int[] and objects (no Buffers).

I can run the application a few times, and it crashes with a few different kinds of errors. This is worrying, as there is only 1 thread working on the data, so IF there were errors, they should be the same every time the program is run. Furthermore, the errors don’t make any sense. Simply copying from one int[] to another with a for-loop sometimes results in an ArrayStoreException, sometimes a NullPointerException, and every once in a while a native crash (!), while 10% of the time the program runs just fine.
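The failing code is essentially nothing more exotic than this (a simplified sketch, not the literal code from my program; the array size is just an arbitrary big number):

    // Simplified sketch of the kind of copy that fails (not my literal code):
    int[] src = new int[50000000]; // big working array, pushes the heap towards its limit
    for (int i = 0; i < src.length; i++) {
        src[i] = i;
    }

    int[] dst = new int[src.length];
    for (int i = 0; i < src.length; i++) {
        dst[i] = src[i]; // plain int-to-int copy, no objects, no autoboxing
    }
    // With a single thread this should never be able to throw an ArrayStoreException
    // or a NullPointerException, yet under heavy swapping it sometimes does.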

These errors start to occur once the program starts swapping to the hard disk / the heap is completely full.

I tried both Java 5.0 and 6.0, same thing.

When dealing with a large amount of data you eventually have to start streaming to the hard drive manually, as VM (virtual memory) does not cut it. There may be existing libraries for this, but it’s just basic caching principles anyway, so it’s not that hard to write.
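To make it concrete, here is a rough sketch of a disk-backed int array (class and method names are made up, and there is no caching layer, just the principle):

    import java.io.IOException;
    import java.io.RandomAccessFile;

    // Rough sketch of a disk-backed int array; names are illustrative only.
    public class DiskIntArray {
        private final RandomAccessFile file;

        public DiskIntArray(String path, long length) throws IOException {
            file = new RandomAccessFile(path, "rw");
            file.setLength(length * 4L); // 4 bytes per int
        }

        public int get(long index) throws IOException {
            file.seek(index * 4L);
            return file.readInt();
        }

        public void set(long index, int value) throws IOException {
            file.seek(index * 4L);
            file.writeInt(value);
        }

        public void close() throws IOException {
            file.close();
        }
    }

A real version would keep a small in-memory window in front of this, which is the “basic caching principles” part.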

AFAIK virtual memory is supposed to ehm… work??!

Even if it’s factor 100 slower than doing things yourself (with a clever algorithm), it shouldn’t corrupt anything, and certainly not make the JVM crash.

If memory allocation fails then programs are returned null pointers, so maybe Java just isn’t built to check for this! Note that memory caching is not by any means difficult! You could code an error-free implementation in under an hour, maybe even a couple of minutes!!!

What you should be getting is an OutOfMemoryError, so I’m not sure why you’re getting such unrelated errors. Could it even be possible that you have faulty memory, or a faulty hard drive?

There’s loads of virtual memory available. It’s just that the Java Heap is full (and Java doesn’t do any mallocs after the heap-size reaches its max).

It could be a faulty HDD, but hey, no other apps are crashing…

Well, you are in a situation where you know your program will exceed the default heap size, so the only options are to set the heap size to be larger, or to create a solution using HDD caching. Whatever the answer to your question is, it is very unlikely to assist you in creating a solution that does not employ one of the choices I gave (unless there is another option).
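Setting a larger heap is just a launcher flag, something along these lines (the jar name and sizes here are placeholders):

    java -Xms256m -Xmx1600m -jar yourcompressor.jar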

You should be getting an OutOfMemoryError, so submit this as a bug to Sun and see what they come up with. Create a test case that duplicates this error with the most basic operations and minimal code, so that you can be sure it is not down to anything else.

Pff… I’m not exceeding the default heap-size (that’s impossible in Java). The heap-size is just too big to fit in physical RAM.

Further, I’m not out here for assistance; there is no answer to my question, because I didn’t ask one. I’m posting here to report this unwanted behaviour.

I’ll post a test-case later on, and no, it can’t be ‘down to anything else’. I’m using pure Java, so native crashes can’t be blamed on my code.

It’s relatively easy to code around it, but this shouldn’t be happening!

Oh right, well like I was saying, you should report this to Sun. What I’ve read says otherwise about the size of the Java heap (unless I’m reading something wrong).

http://kb.adobe.com/selfservice/viewContent.do?externalId=tn_17470&sliceId=2
http://developer.apple.com/documentation/Java/Reference/Java14VMOptions/VM_Options/chapter_2_section_4.html

The heap is the area in memory in which objects are created.

    // Get current size of heap in bytes
    long heapSize = Runtime.getRuntime().totalMemory();
    
    // Get maximum size of heap in bytes. The heap cannot grow beyond this size.
    // Any attempt will result in an OutOfMemoryError.
    long heapMaxSize = Runtime.getRuntime().maxMemory();
    
    // Get amount of free memory within the heap in bytes. This size will increase
    // after garbage collection and decrease as new objects are created.
    long heapFreeSize = Runtime.getRuntime().freeMemory();

EDIT: Google terms were “finding java heap size”

Riven - try running with -Xint so that compilation is turned off to see if it’s the result of a compiler bug.
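That would look something like this on the command line (the main class name is a placeholder):

    java -Xint YourMainClass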

Cas :slight_smile:

A little off topic, but how is your compressor attempt going?

Be warned: it is a very addictive activity! I have been programming a Wikipedia compressor off and on for months now. Every so often I find that I have an idea which I must implement in the compressor to find out whether it makes for better compression.

I really can’t call my attempt a compressor per se… it is more of a pre- and post-processor which aims to convert the input XML file into a more readily compressible form and to use an existing, proven compression technique for the actual compression.

My compressor is not very fast, as I have favoured developing it in a much easier-to-understand manner. Perhaps if I manage to make the compressor compress close to or better than the record, then I will attempt to optimise it for speed…

Ok! I think I might join that obsession :slight_smile:

Uh… Can I really modify a message that isn’t mine? (me == Riven)

EDIT: It turns out you can :wink: BY KELDON

-Xint crashes on me too, although it’s a lot more RAM friendly, so it only starts to crash natively after the data has grown by 50%.

For those interested:


unique words: 727.101 /  35.030.333 (2%)
sizeOfAllWords: 7.343.574 bytes
basicBitCompressionOfWords: 50.641.934 bits (6.330.241 bytes)
uniqueWordGroups: 1.337.760

So… these are the first humble steps. Even trying to get near 16MB is hard enough.

Good stuff! There are so many interesting things you can try with words, which makes this contest pretty interesting ;D Although you cannot submit a lossy compression method, you can create a lossy compression method and store compressed correction data.
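Roughly what I mean by correction data, as a sketch (assuming you have both the original text and its lossy reconstruction in memory as byte arrays):

    // Sketch of "lossy + correction data": given the original and a lossy
    // reconstruction of it, the correction stream is just the per-byte difference.
    public static byte[] correctionFor(byte[] original, byte[] lossyReconstruction) {
        byte[] correction = new byte[original.length];
        for (int i = 0; i < original.length; i++) {
            correction[i] = (byte) (original[i] ^ lossyReconstruction[i]);
        }
        return correction;
    }
    // Decoding side: original[i] == (byte) (lossyReconstruction[i] ^ correction[i])

If the lossy model is any good, the correction stream is almost all zeros and compresses very well, while the whole scheme stays lossless.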

Once I was getting similar random errors when storing lots of stuff in an ArrayList or a HashMap (can’t remember which). Some of the random errors were OutOfMemoryErrors, so I assumed that the underlying cause of the other random errors, such as the NullPointerExceptions and ArrayStoreExceptions, was also an OutOfMemoryError that simply never got printed (maybe because there wasn’t enough memory left to print it, or something).

tried something like this?

http://www.roseindia.net/javatutorials/OutOfMemoryError_Warning_System.shtml
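From what I remember, the idea boils down to setting a usage threshold on the heap memory pools and listening for the notification. A minimal sketch along those lines (not the tutorial’s exact code, and the threshold fraction is arbitrary):

    import java.lang.management.ManagementFactory;
    import java.lang.management.MemoryMXBean;
    import java.lang.management.MemoryPoolMXBean;
    import java.lang.management.MemoryType;
    import javax.management.Notification;
    import javax.management.NotificationEmitter;
    import javax.management.NotificationListener;

    // Sketch of a low-memory warning: print a message when any heap pool
    // crosses a usage threshold (e.g. 90% of its maximum size).
    public class LowMemoryWarning {
        public static void install(final double fraction) {
            MemoryMXBean memBean = ManagementFactory.getMemoryMXBean();
            NotificationEmitter emitter = (NotificationEmitter) memBean;
            emitter.addNotificationListener(new NotificationListener() {
                public void handleNotification(Notification n, Object handback) {
                    System.err.println("Low memory warning: " + n.getType());
                }
            }, null, null);

            for (MemoryPoolMXBean pool : ManagementFactory.getMemoryPoolMXBeans()) {
                if (pool.getType() == MemoryType.HEAP && pool.isUsageThresholdSupported()) {
                    long max = pool.getUsage().getMax();
                    if (max > 0) {
                        pool.setUsageThreshold((long) (max * fraction));
                    }
                }
            }
        }
    }

Calling LowMemoryWarning.install(0.9) early in main() would then warn once a heap pool passes 90% usage.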

Looks interesting… although I am not sure what to make of the results provided :slight_smile: I agree with you that trying to get near 16 MB is very hard! I would be happy to get close to 20MB :stuck_out_tongue: I am currently sitting at a compressed size of 25MB, however this is using only rudimentary frequency statistical models. I am hoping that with the more complex models I am developing I can get down to 20MB. That is my goal at the moment… still, it is better than simply using ZIP (34.7MB).
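As a rough idea of what a pure frequency model predicts, here is a quick sketch that estimates the order-0 entropy of a file from byte frequencies alone (just an illustration, nothing from my actual code):

    import java.io.FileInputStream;
    import java.io.IOException;

    // Sketch: order-0 entropy estimate of a file, i.e. roughly what a per-byte
    // frequency model could reach, ignoring all context between symbols.
    public class Order0Entropy {
        public static void main(String[] args) throws IOException {
            long[] counts = new long[256];
            long total = 0;
            FileInputStream in = new FileInputStream(args[0]);
            byte[] buf = new byte[65536];
            int read;
            while ((read = in.read(buf)) != -1) {
                for (int i = 0; i < read; i++) {
                    counts[buf[i] & 0xFF]++;
                }
                total += read;
            }
            in.close();

            double bits = 0;
            for (long c : counts) {
                if (c > 0) {
                    double p = (double) c / total;
                    bits += c * (-Math.log(p) / Math.log(2)); // -log2(p) bits per symbol
                }
            }
            System.out.printf("~%.1f MB at order-0%n", bits / 8 / (1024 * 1024));
        }
    }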

I would like to beat it and make enough money to feel financially satisfied :o I have some interesting ideas which I was planning to test on audio but which should work just the same with text. Give it a few weeks and maybe I’ll have a little demo!

So Riven, you never answered my PM; what are the steps for editing? I did it once (as you saw) but there was one last change, which I cannot figure out, that made it work. Maybe it is the sequence of the changes or something.