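As described in the first post, it is not Hamming weight, at least not directly. It would be based on the Hamming weight of -x ^ ~x, which sets every bit up to and including the lowest set bit of x, so its weight is one more than the number of trailing zeros. (e.g. x = 10110000, ~x = 01001111, -x = 01010000, -x ^ ~x = 00011111, weight 5, so shift by 4.) Using ~x & (x - 1) instead gives 00001111 directly.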
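To sanity-check the mask arithmetic, here is a small Python sketch on 8-bit values. The width and the names (WIDTH, trailing_mask, shift_count) are my own choices for illustration, not anything from the earlier posts. Note that the XOR mask has one more set bit than x has trailing zeros, hence the "- 1".

```python
WIDTH = 8                      # assumed word width for the example
MASK = (1 << WIDTH) - 1

def trailing_mask(x):
    """Return (-x ^ ~x) truncated to WIDTH bits."""
    return ((-x) ^ (~x)) & MASK

def shift_count(x):
    """Trailing-zero count of a non-zero x, from the mask's Hamming weight."""
    return bin(trailing_mask(x)).count("1") - 1

x = 0b10110000
print(format(trailing_mask(x), "08b"))      # 00011111
print(shift_count(x))                       # 4
print(format(x >> shift_count(x), "08b"))   # 00001011
```

This only makes sense for non-zero x; x = 0 would need its own case.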
I can think of three implementations. The first is a direct logic description, similar to the while loop or casex above. The while loop looks like it matches my suggestion, which is a little different from the original post.
The second would be -x ^ ~x, plus an adder tree, plus a barrel shifter.
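A rough software model of that second option might look like the sketch below. It assumes an 8-bit word with a power-of-two width, and the function names (popcount_tree, barrel_shift_right, normalize) are hypothetical. The pairwise summing mimics an adder tree and the staged shift mimics a barrel shifter; it is not meant as synthesizable code.

```python
WIDTH = 8                      # assumed word width, must be a power of two
MASK = (1 << WIDTH) - 1

def popcount_tree(v):
    """Sum the bits pairwise, level by level, like an adder tree."""
    counts = [(v >> i) & 1 for i in range(WIDTH)]
    while len(counts) > 1:
        counts = [counts[i] + counts[i + 1] for i in range(0, len(counts), 2)]
    return counts[0]

def barrel_shift_right(v, amount):
    """Shift right in log2(WIDTH) stages, one stage per bit of the amount."""
    stage = 0
    while (1 << stage) < WIDTH:
        if (amount >> stage) & 1:
            v >>= (1 << stage)
        stage += 1
    return v

def normalize(x):
    """Shift out the trailing zeros of a non-zero x."""
    mask = ((-x) ^ (~x)) & MASK          # ones up to and including lowest set bit
    amount = popcount_tree(mask) - 1     # weight is trailing zeros + 1
    return barrel_shift_right(x, amount)

print(format(normalize(0b10110000), "08b"))   # 00001011
```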
The last I could think of would be -x & x, plus a bit-reversal (free), plus a multiplication, plus another bit-reversal. E.g. x = 10110000, r = reversed x = 00001101, m = -x & x = 00010000, r*m = 11010000, reversed = 00001011.
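The reverse/multiply/reverse trick can be checked the same way. This is a minimal sketch assuming an 8-bit word and a product truncated to the same width; bit_reverse and normalize_mul are names I made up for the example. In hardware the reversal is just wiring, here it is a loop.

```python
WIDTH = 8                      # assumed word width for the example
MASK = (1 << WIDTH) - 1

def bit_reverse(v):
    """Reverse the bit order of a WIDTH-bit value."""
    out = 0
    for i in range(WIDTH):
        out = (out << 1) | ((v >> i) & 1)
    return out

def normalize_mul(x):
    """Shift out the trailing zeros of x using a multiply instead of a shifter."""
    r = bit_reverse(x)             # trailing zeros of x become leading zeros
    m = x & -x                     # isolates the lowest set bit, a power of two
    product = (r * m) & MASK       # left shift by the trailing-zero count
    return bit_reverse(product)    # undo the reversal

print(format(normalize_mul(0b10110000), "08b"))   # 00001011
```

Multiplying by the one-hot m is the same as a left shift by the trailing-zero count, and because the reversed value has exactly that many leading zeros, nothing is lost when the product is truncated.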
The first will be mostly LUT based, which might not be that fast. The second is an adder+LUT, an adder tree, and a 32:1 mux. It will at least make use of the fast carry chain for some of the logic. The last is an adder+LUT and a multiplier.
The last might work out best if the FPGA has hardware support for the addition and a multiplier of the correct size. It is a fairly heavy operation in any case.