I suggest improving performance by balancing two things you have to weigh up against each other:
-time to convert data
-time to copy data
When you draw more surface than is actually seen (and you do, since the rectangular tiles overlap here), a backbuffer-and-copy strategy may work better. I found that in my own game. The reason is that your images (paletted GIFs, for example) have to be converted to the screen mode, which is mostly 24/32bpp and can use several different data layouts (bitmap type, e.g. INT_RGB, BYTE, etc.).
So there are three different approaches you can use:
-draw directly to a flip-buffer (as you do, if it is available)
pro: less to draw
con: every tile, and therefore more than the actually visible surface, is converted every time it is drawn
-draw to a backbuffer that exactly matches the image bitmap mode (e.g. paletted mode with a global palette for GIFs)
pro: conversion is done only once, for exactly the whole visible surface
con: the whole surface is converted even if less is drawn (e.g. no scrolling, only moving sprites); for GIFs a global palette is needed
-convert all images first
pro: no conversion at runtime at all
con: heavy memory footprint
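To make the second option a bit more concrete, here is a rough Java sketch of a paletted backbuffer. It assumes Java 2D and that all tiles share one global palette; the class name PalettedBackBuffer and its methods are made up for illustration. Whether the tile-to-buffer draw really skips conversion depends on the Java 2D implementation, so treat it as a starting point to measure, not a guarantee.

```java
import java.awt.Graphics;
import java.awt.Graphics2D;
import java.awt.image.BufferedImage;
import java.awt.image.IndexColorModel;

// Hypothetical helper: a backbuffer that uses the same paletted format as the tiles.
final class PalettedBackBuffer {
    private final BufferedImage buffer;

    // globalPalette is assumed to be the one palette shared by all tile images.
    PalettedBackBuffer(int width, int height, IndexColorModel globalPalette) {
        buffer = new BufferedImage(width, height,
                BufferedImage.TYPE_BYTE_INDEXED, globalPalette);
    }

    // Draw one tile into the backbuffer (paletted source, paletted destination).
    void drawTile(BufferedImage tile, int x, int y) {
        Graphics2D g = buffer.createGraphics();
        try {
            g.drawImage(tile, x, y, null);
        } finally {
            g.dispose();
        }
    }

    // Copy the finished frame to the screen; the conversion to the
    // screen format happens here, once per frame.
    void present(Graphics screen) {
        screen.drawImage(buffer, 0, 0, null);
    }

    BufferedImage image() {
        return buffer;
    }
}
```

Per frame you would call drawTile for everything that needs drawing and then present once.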
For me the last option (converting all images first) worked best.
In fact I copy every loaded image to a same-sized bitmap with the exact properties of the output screen, and use these duplicates for drawing. If all works, no conversion is done at draw time.
This even accelerates the alpha-channel blits I used a lot, though these are always many times slower than a direct blit.
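As a minimal sketch of that copy step, assuming Java 2D: GraphicsConfiguration.createCompatibleImage gives you a bitmap in the screen's own format, and you draw the loaded image into it once. The class name CompatibleImages and the method toCompatible are made up for illustration, and the fallback to INT_RGB/INT_ARGB when no screen is available is my own assumption, not part of the approach described above.

```java
import java.awt.Graphics2D;
import java.awt.GraphicsConfiguration;
import java.awt.GraphicsEnvironment;
import java.awt.Transparency;
import java.awt.image.BufferedImage;

// Hypothetical helper: convert a loaded image to the screen's format once, up front.
final class CompatibleImages {
    static BufferedImage toCompatible(BufferedImage src) {
        int w = src.getWidth();
        int h = src.getHeight();
        // OPAQUE, BITMASK or TRANSLUCENT, taken from the source image.
        int transparency = src.getColorModel().getTransparency();

        BufferedImage dst;
        if (GraphicsEnvironment.isHeadless()) {
            // Assumption: with no screen to match, fall back to a common 32-bit layout.
            int type = (transparency == Transparency.OPAQUE)
                    ? BufferedImage.TYPE_INT_RGB
                    : BufferedImage.TYPE_INT_ARGB;
            dst = new BufferedImage(w, h, type);
        } else {
            GraphicsConfiguration gc = GraphicsEnvironment
                    .getLocalGraphicsEnvironment()
                    .getDefaultScreenDevice()
                    .getDefaultConfiguration();
            // Same size as the source, pixel format chosen to match the screen.
            dst = gc.createCompatibleImage(w, h, transparency);
        }

        // The one-time conversion: draw the source into the new bitmap.
        Graphics2D g = dst.createGraphics();
        try {
            g.drawImage(src, 0, 0, null);
        } finally {
            g.dispose();
        }
        return dst;
    }
}
```

You would call toCompatible once per image right after loading and keep only the returned copy for drawing, which is where the extra memory goes.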
Hope that helps
Paul