Greedy Garbage Collector

I had a thought in the shower today (don't ask… I have my best ideas there :stuck_out_tongue: )

In this day and age, when hard drive storage is ridiculously cheap and most people have tonnes of free space, is there any real need in most applications for any “real” collection of garbage?

The benefit would be no long GC pauses :slight_smile:

A naive implementation might look like this:

For each “new” instance created, allocate the required memory and add it to the top of the Objects-in-RAM Queue (ORQ).

On a request for a particular instance’s memory address, the following happens:

  1. If the instance is in the ORQ, return its memory address and move it to the top of the ORQ.
  2. Else, if the instance is in the Disk Cache Map (DCM), remove it from the DCM, load it back into RAM, and add it to the top of the ORQ.

A GC thread will periodically check the time stamp (maybe a CPU cycle count?) of the last instance in the ORQ. If the time stamp indicates the instance is old, it removes the instance from the ORQ and adds it to the DCM. It repeats this until the GC thread’s time budget is exceeded or the last instance is too young.

where the ORQ uses “normal” RAM and the DCM uses a hard drive or similar.
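
Something like this in Java, as a very rough sketch (all the names are invented, disk I/O is faked with a second in-memory map, and the queue bookkeeping is naive):

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashMap;
import java.util.Map;

// Sketch of the ORQ/DCM idea. A real version would serialize evicted
// objects to disk in evict() and read them back in get(); here the DCM
// is just a second map standing in for the disk.
class GreedyCollector<K, V> {
    private static class Entry<V> {
        V value;
        long lastTouched = System.nanoTime(); // read by the eviction pass
        Entry(V value) { this.value = value; }
    }

    private final Deque<K> orq = new ArrayDeque<>();      // Objects-in-RAM Queue, newest first
    private final Map<K, Entry<V>> ram = new HashMap<>(); // backing store for the ORQ
    private final Map<K, V> dcm = new HashMap<>();        // Disk Cache Map (fake disk)

    // A "new" instance: allocate and push onto the top of the ORQ.
    void allocate(K key, V value) {
        ram.put(key, new Entry<>(value));
        orq.addFirst(key);
    }

    // A request for an instance: promote it to the top of the ORQ,
    // pulling it back out of the DCM if it was evicted.
    V get(K key) {
        Entry<V> e = ram.get(key);
        if (e == null) {
            V v = dcm.remove(key);      // would be a disk read for real
            if (v == null) return null; // unknown key
            e = new Entry<>(v);
            ram.put(key, e);
        } else {
            orq.remove(key); // O(n); a real version wants an intrusive list
        }
        e.lastTouched = System.nanoTime();
        orq.addFirst(key);
        return e.value;
    }

    // One pass of the GC thread: evict old entries from the tail of the
    // ORQ into the DCM until the time budget runs out or the tail is young.
    void evict(long maxAgeNanos, long budgetNanos) {
        long deadline = System.nanoTime() + budgetNanos;
        while (!orq.isEmpty() && System.nanoTime() < deadline) {
            Entry<V> e = ram.get(orq.peekLast());
            if (System.nanoTime() - e.lastTouched < maxAgeNanos) break;
            K key = orq.removeLast();
            ram.remove(key);
            dcm.put(key, e.value); // would be a disk write for real
        }
    }
}
```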

Alternatively, perhaps just let the OS deal with the memory requests and manage the paging to disk, so just always allocate memory and never destroy?

Remember that every relocation of an object requires all objects with fields referring to it to be updated.

That means that if you move an object, you’d have to iterate over the bazillion objects on disk to fix up their references. That is slow.

Waiting on main (uncached) memory is slow… disk access is like waiting for the start of the next ice age. :wink:

good call :slight_smile:

interesting thought exercise though.

so perhaps it would be best to let the OS do the grunt work (which would be easier :stuck_out_tongue: ) and just keep requesting memory, never releasing it until the end of execution.
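
In Java terms that’s roughly a bump allocator over one big reserved buffer, with paging left to the OS. A sketch (the 1 GiB arena size is an arbitrary pick, and the JVM’s direct-memory limit may need raising):

```java
import java.nio.ByteBuffer;

// "Allocate forever, never free": carve slices out of one big reserved
// buffer and let the OS virtual memory system page cold regions to disk.
class BumpAllocator {
    private final ByteBuffer arena = ByteBuffer.allocateDirect(1 << 30); // 1 GiB

    // Hand out the next `size` bytes; there is deliberately no free().
    synchronized ByteBuffer allocate(int size) {
        if (arena.remaining() < size) {
            throw new OutOfMemoryError("arena exhausted");
        }
        ByteBuffer slice = arena.slice(); // view starting at current position
        slice.limit(size);
        arena.position(arena.position() + size);
        return slice;
    }
}
```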

Sounds like an ‘infinite’ heap on a 64-bit system. The functionality you describe is called swapping :wink:

The next big thing (the Garbage-First garbage collector) will give an effectively ‘infinite’ heap divided into roughly 1 MB regions, which would make this feasible.

I think that a GC helped by a language that has no pointer aliasing by default (i.e. with a session-types ownership protocol), plus a good compiler, would be pretty great. It would be able to insert memory frees in the correct places without programmer intervention (besides annotating the aliasing type when needed), leading to reduced GC pressure. In fact, some programs could require no GC at all (purely functional ones, I’d guess) and could be marked as such.

I think it could be awesome, at the cost of a little programmer convenience.
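
Java has nothing like ownership types, but try-with-resources gives a small taste of the compiler inserting the release for you; a loose analogy only, not the scheme described above:

```java
// The compiler guarantees close() runs when the block exits, with no
// explicit call from the programmer: deterministic release at scope end.
class Scratch implements AutoCloseable {
    final byte[] buffer = new byte[4096];
    @Override public void close() {
        // return the buffer to a pool / release native resources here
    }
}

class Demo {
    static void work() {
        try (Scratch s = new Scratch()) {
            // use s.buffer here
        } // close() is inserted here by the compiler, even on exceptions
    }
}
```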

There is one problem: your hard disk must be fast enough… It’s not uncommon for a program to generate a few hundred MB/sec of garbage, if not more, while a mechanical disk manages maybe 100 MB/sec sequentially (far less for random writes). How will you cope with that if everything eventually gets written to disk?

It is a ‘good’ idea in a very restricted environment. RealTimeJava has immortal heaps for that reason: no GC, but, well, they are of finite size and the VM keeps everything in RAM (unless the OS starts swapping).
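
For reference, the RTSJ idiom looks roughly like this (a sketch from memory; it needs an RTSJ-capable VM):

```java
import javax.realtime.ImmortalMemory;

// Anything allocated while running inside immortal memory is never
// collected; it lives until the VM exits.
public class ImmortalExample {
    public static void main(String[] args) {
        ImmortalMemory.instance().enter(new Runnable() {
            public void run() {
                byte[] forever = new byte[1024]; // never reclaimed
            }
        });
    }
}
```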

My boss told me a few weeks ago that he learnt object orientation working on a system kinda like the one described above. All objects were global, used a unique 128-bit reference, and were automatically stored to disk (making them permanent). It was apparently ultra simple, very elegant, and everything from the language down to the CPU was custom-built as one giant OO machine.

Decades ago it kinda made sense to design big systems from the ground up, but these days it doesn’t. There is also the obvious question: what would you gain from this?

There are also high-end database systems today that constantly write to their tables; trading systems are a good example. They employ plenty of tricks and high-end hardware to avoid writing unless necessary. If they can work in real time, there is no reason why this system can’t.

If you use two pointers (i.e. a handle), relocating an object just changes one pointer.
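
In other words, everything points at the handle, and only the handle points at the object. A quick sketch:

```java
// Handle-based indirection: all references go through the handle, so
// relocating an object rewrites exactly one pointer.
class Handle<T> {
    T target; // the one field that changes on relocation
    Handle(T target) { this.target = target; }
}

class HandleDemo {
    static void demo() {
        Handle<int[]> h = new Handle<>(new int[16]);

        // Any number of places can hold the same handle...
        Handle<int[]> alias1 = h;
        Handle<int[]> alias2 = h;

        // ...and "moving" the object means updating one field,
        // not chasing down every referrer.
        h.target = java.util.Arrays.copyOf(h.target, 16);

        assert alias1.target == alias2.target; // all aliases see the move
    }
}
```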

Object tables are slow and pollute the cache.