My general experience is that you seldom gain much from optimising such things. Hit the algorithm first, as this is almost always where the big wins are; only if the profiler then shows that this code is a major hotspot should you even think about optimising it.
If, and only if, you still need to tune this tiny snippet after thinking carefully about the algorithm, it is time to hit the processor manuals to see what pipeline and parallel execution tricks you can play. The surrounding code is likely to still be in the pipeline, so you will need to consider at least a few instructions either side (probably the loop preamble and postamble; this is in a tight loop, correct?).
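
To illustrate the "algorithm first" point, here is a rough sketch in Python. The problem (counting pairs that sum to a target) and the function names are made up for illustration and have nothing to do with your actual snippet. The idea is simply that replacing a quadratic loop with a linear one usually dwarfs anything instruction-level tuning of the inner loop could give you, and that you should measure rather than guess.

```python
import timeit
from collections import Counter


def count_pairs_quadratic(values, target):
    """Count index pairs i < j with values[i] + values[j] == target, in O(n^2)."""
    count = 0
    n = len(values)
    for i in range(n):
        for j in range(i + 1, n):
            if values[i] + values[j] == target:
                count += 1
    return count


def count_pairs_linear(values, target):
    """Same result in O(n) by remembering how often each value has been seen."""
    seen = Counter()
    count = 0
    for v in values:
        count += seen[target - v]  # each earlier complement forms one pair
        seen[v] += 1
    return count


if __name__ == "__main__":
    # Hypothetical workload, chosen only so the difference is visible.
    data = list(range(2000))
    for fn in (count_pairs_quadratic, count_pairs_linear):
        elapsed = timeit.timeit(lambda: fn(data, 1999), number=1)
        print(f"{fn.__name__}: {elapsed:.4f} s")
```

Exact timings will depend on your machine, so none are quoted here. Only once a comparison like this shows the better algorithm is still too slow, and a profiler points at the inner loop, is it worth reaching for the processor manuals.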