CPU Cache Confusion

fletchergames · October 11, 2009, 8:57pm

I’m a little bit confused about the purpose of the L2 cache on modern processors, given the existence of the L1 and L3 caches. I understand the general idea behind caching, but I don’t understand why the caches are set up the way they are.

I began thinking about this because of http://www.tomshardware.com/reviews/athlon-l3-cache,2416-2.html. When you get to the bottom of the review, you will notice that most of the newest CPUs have an L1 and L2 cache for each core and then an L3 cache for all the cores on the processor.

It makes perfect sense to me to have 1 cache for each core and then another cache for all the cores. I assume that the L3 cache has much higher latency.

What I don’t understand is why there are 2 caches for each core. Is the L2 cache really that much slower than the L1 cache? It was my understanding that the CPU does some kind of sequential search of the cache (or possibly just PARTS of the cache; I’m not sure exactly) to see whether the data is there. How is it faster to search a small L1 cache and then a larger L2 cache than to just combine them into 1 cache? It seems like the latency of 1 cache would be less than the latency of going through 2 caches with the same combined size.

I assume that the L1 cache must be much faster than the L2 cache. Otherwise, Intel and AMD wouldn’t set it up this way. The only thing about the L1 cache that seems like it would be faster is that it’s smaller. It just seems like the extra speed from the relatively rare hits in the L1 cache would be balanced out by the double caching when there’s an L1 miss, whether there’s an L2 hit or not.

This doesn’t affect any programming I’m doing. I just wish I could understand.

Riven · October 11, 2009, 9:16pm

Most of your thoughts are correct. The smaller the cache, the faster the access. It is not correct, however, that there are a lot of cache misses in L1.
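
To put rough numbers on the trade-off, here is a minimal sketch of the usual average-access-time calculation. The latencies and hit rates are hypothetical values picked for illustration, not measurements of any real CPU, and `amat` is just a helper name made up for this example.

```python
def amat(levels, memory_latency):
    """Average access time in cycles for a cache hierarchy.

    levels: list of (hit_latency_cycles, hit_rate) pairs, from L1 outward.
    Each hit_rate is the fraction of accesses reaching that level that hit.
    """
    total = 0.0
    reach = 1.0  # fraction of all accesses that get as far as this level
    for latency, hit_rate in levels:
        total += reach * latency     # every access reaching a level pays its lookup
        reach *= (1.0 - hit_rate)    # only the misses continue outward
    return total + reach * memory_latency


# Hypothetical numbers: a small 4-cycle L1 backed by a 12-cycle L2,
# versus one big combined cache that is as slow as the L2 (12 cycles)
# and has the same overall miss rate to memory (1%).
two_level = amat([(4, 0.95), (12, 0.80)], memory_latency=200)
one_level = amat([(12, 0.99)], memory_latency=200)

print(f"two-level: {two_level:.1f} cycles")  # 6.6
print(f"one-level: {one_level:.1f} cycles")  # 14.0
```

Under these assumed numbers the split hierarchy wins because most accesses only ever pay the small cache's latency. The extra lookup on an L1 miss is cheap compared with making every access pay the latency of a large cache.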

The L1 cache is organized in cache lines (typically 64 bytes on current x86 CPUs), just like every other cache level.

Imagine an int[]: 16 elements fit in one 64-byte line. So if you’re looping over your int array, only about 1 access in every 16 will be a cache miss, and hardware prefetching usually hides many of those too. As the L1 cache is faster than the L2 cache, this will have a significant impact on performance. The same principle applies when going from L2 to L3, and from L3 to system RAM (and from system RAM to the swapfile, where the unit is a 4K page).
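
A minimal sketch of why a sequential loop misses so rarely: the cache fetches a whole block at a time, so neighbouring elements come along for free. It assumes 64-byte lines (the common size on x86), 4-byte ints, an array that starts on a line boundary, and a hypothetical 32 KB direct-mapped cache. Real L1 caches are set-associative and have prefetchers, which this toy model ignores.

```python
LINE_SIZE = 64    # bytes fetched per miss (assumed; typical for x86)
NUM_LINES = 512   # 512 * 64 B = 32 KB, a hypothetical L1 data cache
INT_SIZE = 4      # bytes per int


def count_misses(num_ints, stride=1):
    """Count (misses, accesses) for one pass over an int array.

    Models a direct-mapped cache: each memory line can live in exactly
    one slot, chosen by its line number modulo the number of slots.
    """
    tags = [None] * NUM_LINES  # which memory line each slot currently holds
    misses = 0
    accesses = 0
    for i in range(0, num_ints, stride):
        line = (i * INT_SIZE) // LINE_SIZE  # memory line containing element i
        slot = line % NUM_LINES             # the only slot that line may occupy
        if tags[slot] != line:
            tags[slot] = line               # miss: fetch the whole line
            misses += 1
        accesses += 1
    return misses, accesses


# Sequential pass over 1M ints: one miss per 16 accesses.
print(count_misses(1 << 20))             # (65536, 1048576)
# Touching only every 16th int: every access lands on a new line.
print(count_misses(1 << 20, stride=16))  # (65536, 65536)
```

Both loops fetch the same number of lines, but the strided one gets only a single useful int out of each. That is the locality effect the array example relies on.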