Well, the key is the invisible walls. Essentially, nothing you see is something you actually interact with. The real physical walls are not drawn and are used only for movement logic. That way I can make enormous pictures that contain only a single block you can’t walk through. Imagine this:
Say we have the sprite for a big house. It might look like this, with ^ being the roof (foreground), @ being the middle (midground), and + being the part you can walk on top of, “in front of” the house (background).
^^^^^
@@@@@
@@@@@
+++++
That house, although drawn here with different characters, is actually two sprites, each anchored at its top-left corner: one in the foreground, drawn above the player, and one in the background, like this:
+0000
+0000
00000
00000
Here each + is the single block where a sprite is saved in the 3D array, even though the sprite is seen across the whole area. The top +0000 row is just the roof, saved in some layer higher than the player so it is drawn on top of him; the rest is drawn in a layer under the player.
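To make the layering concrete, here is a rough Python sketch of how the draw order could work. Everything in it is my own guess for illustration: the layer numbering (0 = background, 1 = player, 2 = foreground), the `make_map` and `draw_order` helpers, and the sprite names are all hypothetical, not the actual engine.

```python
# Assumed layer numbering: 0 = background, 1 = player/midground, 2 = foreground.
PLAYER_LAYER = 1


def make_map(layers, height, width):
    """Build an empty 3D array indexed as grid[layer][y][x]."""
    return [[[None] * width for _ in range(height)] for _ in range(layers)]


def draw_order(grid, player_pos):
    """List (sprite, x, y) draw calls, lowest layer first.

    Each sprite is stored in a single anchor cell (its top-left block);
    the renderer would blit the whole image from that cell. The player
    is drawn after everything on PLAYER_LAYER and before higher layers.
    """
    calls = []
    for layer, rows in enumerate(grid):
        for y, row in enumerate(rows):
            for x, sprite in enumerate(row):
                if sprite is not None:
                    calls.append((sprite, x, y))
        if layer == PLAYER_LAYER:
            calls.append(("player", player_pos[0], player_pos[1]))
    return calls


# The 5x4 house: the roof is anchored at (0, 0) in the foreground,
# the rest of the house at (0, 1) in the background.
grid = make_map(layers=3, height=4, width=5)
grid[2][0][0] = "house_roof"
grid[0][1][0] = "house_body"

for call in draw_order(grid, player_pos=(2, 3)):
    print(call)
```

The body comes out first, then the player, then the roof, so the roof always covers the player no matter where he stands.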
Now, with just that, I would have a large image taking up several blocks that you could walk all over. What becomes necessary is to place walls in the midground so that, logically speaking, part of the house obstructs walking. An O is an empty space, and a + is where a wall is placed in the middle layer. Note that these walls are invisible.
OOOOO
+++++
+++++
OOOOO
The character can walk anywhere there are O’s, so if you went along the top you would be unobstructed, but a roof would still be drawn on top of you, while the bottom O’s can be walked upon even though you are walking on top of a sprite. Try walking through the entire house, however, and you will be stopped by invisible blocks.
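The walking check itself can be sketched in a few lines of Python. Again, this is only an illustration under my own assumptions: `WALLS`, `can_walk` and `try_move` are made-up names, and the grid is just the O/+ diagram above typed in as strings.

```python
# The invisible midground layer for the house: 'O' = empty, '+' = wall.
WALLS = [
    "OOOOO",
    "+++++",
    "+++++",
    "OOOOO",
]


def can_walk(x, y):
    """True if tile (x, y) is inside the grid and holds no invisible wall."""
    if not (0 <= y < len(WALLS) and 0 <= x < len(WALLS[y])):
        return False
    return WALLS[y][x] == "O"


def try_move(pos, dx, dy):
    """Return the new position, or the old one if the target tile is blocked."""
    x, y = pos
    nx, ny = x + dx, y + dy
    return (nx, ny) if can_walk(nx, ny) else pos


print(try_move((2, 0), 1, 0))  # walking along the top row, under the roof
print(try_move((2, 0), 0, 1))  # walking down into the house
```

Note that nothing here knows about sprites at all. Movement only ever looks at the wall grid, which is the whole point.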
In this way, although the collision is in fact entirely grid based like FF3, it appears as if you are interacting with a 3D house, a house which is one sprite (in fact I allow any sprite to be stretched or shrunk to any size). Chrono Trigger probably did something along these lines, so even though you may believe that the FF3 and CT drawing methods are very different, they probably use the same walking engine; CT just has a few layers slapped on to create the pretty (but fake) 3D effect.