Are spiral searches or diamond searches close to what you want? Which algorithms are you referring to: block-matching or adaptive block-area algorithms?
Look here, pal: unless you describe the profile of your application, and especially the type of input streams (fast-changing or not, high-contrast or not), you won't get much help.
BTW, I have designed two FSMD architectures for motion estimation in VHDL, one of them featuring a complex non-cached memory architecture, so I am qualified to discuss this stuff. Also, TSS (three-step search) is not that bad for low motion.
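
To make the TSS remark concrete, here is a rough software sketch of a three-step search for one block. It is only an illustration, not one of the VHDL designs mentioned above: the function names, the 8x8 block size, the initial step of 4 (a search range of about +/-7 pixels) and the use of SAD as the matching cost are all assumptions picked for the example.

```python
def sad(cur, ref, bx, by, dx, dy, n):
    """Sum of absolute differences between the n x n block at (bx, by) in
    `cur` and the block displaced by (dx, dy) in `ref` (hypothetical helper)."""
    total = 0
    for y in range(n):
        for x in range(n):
            total += abs(cur[by + y][bx + x] - ref[by + dy + y][bx + dx + x])
    return total


def three_step_search(cur, ref, bx, by, n=8, step=4):
    """Return (dx, dy, cost) for the block at (bx, by).

    Frames are lists of rows of integer luma samples. The block size `n`
    and the initial `step` are assumed defaults, not values from the thread.
    """
    height, width = len(ref), len(ref[0])
    best_dx, best_dy = 0, 0
    best_cost = sad(cur, ref, bx, by, 0, 0, n)
    while step >= 1:
        cx, cy = best_dx, best_dy  # centre of this refinement step
        for dy in (cy - step, cy, cy + step):
            for dx in (cx - step, cx, cx + step):
                # Skip candidates whose block would fall outside the reference frame.
                if bx + dx < 0 or by + dy < 0:
                    continue
                if bx + dx + n > width or by + dy + n > height:
                    continue
                cost = sad(cur, ref, bx, by, dx, dy, n)
                if cost < best_cost:
                    best_cost, best_dx, best_dy = cost, dx, dy
        step //= 2  # 4 -> 2 -> 1: the three steps
    return best_dx, best_dy, best_cost
```

With small displacements the coarse first step usually keeps the centre and the last step picks up the one-pixel offset. With larger motion or awkward content, the early coarse steps can lock onto a wrong candidate that the finer steps cannot undo, which is why the profile of the input streams matters.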
the_penetrator©