Usually there are no Swing components involved in games, except for the basic Frame or Window that contains everything. In almost every case there is one painting routine that draws everything. Apart from that, your ideas are probably correct. Games use layered painting: from background to foreground, and isometric games additionally from top to bottom so that things can overlap. 3D is different but works similarly: objects are drawn from background to foreground and from farthest to nearest, plus optimisation on top.
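The layered ordering (the painter's algorithm) can be sketched in Java. This is a minimal sketch under stated assumptions, not how any particular engine does it: `Layer`, `Sprite` and `paintOrder` are hypothetical names, printing stands in for `Graphics.drawImage`, and records need Java 16 or later. Objects and entities share one `WORLD` layer so they can be sorted by screen y together, which lets a sprite lower on the screen be painted later and end up in front.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Painter's-algorithm sketch: everything is drawn back to front.
// Layer, Sprite and paintOrder are hypothetical names for illustration.
public class LayeredPainter {

    // Layers in the order they are painted, background first.
    // WORLD holds both objects and entities so they are depth-sorted together.
    enum Layer { BACKGROUND, WORLD, FOREGROUND }

    // A drawable thing; screenY is the y coordinate of its "feet" on screen.
    record Sprite(String name, Layer layer, int screenY) {}

    // Sorts first by layer, then within a layer from top to bottom of the
    // screen, so sprites lower on the screen are painted later (in front).
    static List<Sprite> paintOrder(List<Sprite> sprites) {
        List<Sprite> sorted = new ArrayList<>(sprites);
        sorted.sort(Comparator.comparing(Sprite::layer)
                .thenComparingInt(Sprite::screenY));
        return sorted;
    }

    public static void main(String[] args) {
        List<Sprite> scene = List.of(
                new Sprite("hero", Layer.WORLD, 120),
                new Sprite("fog", Layer.FOREGROUND, 0),
                new Sprite("sky", Layer.BACKGROUND, 0),
                new Sprite("tree", Layer.WORLD, 150),
                new Sprite("orc", Layer.WORLD, 80));
        // prints sky, orc, hero, tree, fog (one per line)
        for (Sprite s : paintOrder(scene)) {
            System.out.println(s.name()); // stand-in for g.drawImage(...)
        }
    }
}
```

Because the tree stands lower on the screen than the hero, it is painted after him and covers him where they overlap.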
The background is probably an image or several images. In addition, there will be some kind of “obstacle” information that records where you can walk and where you can't.
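A minimal sketch of such obstacle information, assuming it is stored as a rectangular grid of tiles that lines up with the background image. `ObstacleMap`, `isWalkableTile` and `isWalkablePixel` are hypothetical names, and using `'#'` for a blocked tile is an arbitrary choice; many games load this data from a map file instead of hard-coding it.

```java
// Obstacle grid kept next to the background image (hypothetical design).
public class ObstacleMap {
    private final boolean[][] blocked; // blocked[row][col]
    private final int tileSize;        // size of one tile in pixels

    ObstacleMap(String[] rows, int tileSize) {
        this.tileSize = tileSize;
        blocked = new boolean[rows.length][];
        for (int r = 0; r < rows.length; r++) {
            blocked[r] = new boolean[rows[r].length()];
            for (int c = 0; c < rows[r].length(); c++) {
                blocked[r][c] = rows[r].charAt(c) == '#'; // '#' marks an obstacle
            }
        }
    }

    // True if the tile lies inside the map and nothing blocks it.
    boolean isWalkableTile(int col, int row) {
        return row >= 0 && row < blocked.length
                && col >= 0 && col < blocked[row].length
                && !blocked[row][col];
    }

    // Converts a pixel position on the background image to a tile and checks it.
    // floorDiv keeps negative pixel positions outside the map.
    boolean isWalkablePixel(int x, int y) {
        return isWalkableTile(Math.floorDiv(x, tileSize), Math.floorDiv(y, tileSize));
    }

    public static void main(String[] args) {
        ObstacleMap map = new ObstacleMap(new String[] {
                "....",
                ".##.",
                "...."
        }, 32);
        System.out.println(map.isWalkablePixel(10, 10)); // true: open floor
        System.out.println(map.isWalkablePixel(40, 40)); // false: tile (1,1) is '#'
        System.out.println(map.isWalkablePixel(-5, 10)); // false: outside the map
    }
}
```

A movement step would then only be accepted if the target position is walkable.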
A typical isometric rendering loop would look like this:
for each row, from top to bottom
    for each tile in the row, from left to right
        render the current texture on the ground
        render any objects on the ground
        render any entities on the field
        render anything in the foreground
This way, an object lower on the screen can partially or fully overdraw an object higher on the screen, which is how one thing appears to be behind another.
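The rendering loop can be sketched in Java as follows. It assumes a "staggered" isometric map (odd rows shifted half a tile to the right), because there each map row is also a screen row, so "top to bottom, left to right" maps directly onto array indices; other isometric layouts use different tile-to-screen formulas. `Tile`, `render`, `TILE_W`/`TILE_H` and the string log are hypothetical; each log entry stands in for a `Graphics.drawImage` call.

```java
import java.util.ArrayList;
import java.util.List;

// Per-tile isometric rendering loop on a staggered map (hypothetical names).
public class IsoRenderer {
    static final int TILE_W = 64; // width of a diamond tile in pixels
    static final int TILE_H = 32; // height of a diamond tile in pixels

    // What sits on one tile; null means "nothing on this layer".
    record Tile(String ground, String object, String entity, String foreground) {}

    // Screen position of the top-left corner of tile (col, row).
    static int screenX(int col, int row) {
        return col * TILE_W + (row % 2 == 1 ? TILE_W / 2 : 0); // odd rows shifted
    }

    static int screenY(int row) {
        return row * TILE_H / 2; // rows overlap by half a tile height
    }

    // Rows from top to bottom, tiles from left to right, all layers of a tile at once.
    static List<String> render(Tile[][] map) {
        List<String> drawCalls = new ArrayList<>();
        for (int row = 0; row < map.length; row++) {          // top to bottom
            for (int col = 0; col < map[row].length; col++) { // left to right
                Tile t = map[row][col];
                int x = screenX(col, row), y = screenY(row);
                draw(drawCalls, t.ground(), x, y);     // ground texture
                draw(drawCalls, t.object(), x, y);     // objects on the ground
                draw(drawCalls, t.entity(), x, y);     // entities on the field
                draw(drawCalls, t.foreground(), x, y); // foreground
            }
        }
        return drawCalls;
    }

    private static void draw(List<String> log, String sprite, int x, int y) {
        if (sprite != null) {
            log.add(sprite + "@" + x + "," + y); // stand-in for g.drawImage(...)
        }
    }

    public static void main(String[] args) {
        Tile[][] map = {
            { new Tile("grass", "tree", null, null), new Tile("grass", null, null, null) },
            { new Tile("sand", null, "hero", null), new Tile("sand", null, null, "fog") }
        };
        render(map).forEach(System.out::println);
    }
}
```

Drawing all layers of one tile before moving on is what keeps the depth order correct: if all objects were drawn before all entities, a hero standing behind a tree would still be painted over it.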
-JAW