Are you not using any kind of encoding, or are you doing the encoding outside the transceiver?
The way you have the transceiver set up, it looks like you're trying to use it the way you would an LVDS pair. I noticed it's set to use the local reference clock, so you absolutely have to oversample the input. If I recall correctly, that mode is exclusively for running the symbol rate below the ~97.7 Mbaud raw symbol rate (no encoding, 32-bit at 3.125 Gbps).
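Just to illustrate what oversampling buys you, here's a toy sketch (Python, with my own names and a made-up 5x ratio; the actual GT oversampling block works differently, so treat this as illustration only):

```python
# Toy oversampling recovery: sample the line at 5x the bit rate and
# majority-vote each group of 5 samples into one bit. Assumes the
# groups are already phase-aligned to the bit boundaries.

OVERSAMPLE = 5  # samples per bit (hypothetical ratio)

def recover_bits(samples):
    """Majority-vote each group of OVERSAMPLE line samples into one bit."""
    bits = []
    for i in range(0, len(samples) - OVERSAMPLE + 1, OVERSAMPLE):
        group = samples[i:i + OVERSAMPLE]
        bits.append(1 if sum(group) > OVERSAMPLE // 2 else 0)
    return bits

# Example: the bit pattern 1,0,1,1 sampled 5x, with one noisy sample
# in the second and fourth bit periods that the vote rejects.
line = [1,1,1,1,1,  0,0,1,0,0,  1,1,1,1,1,  1,1,0,1,1]
print(recover_bits(line))  # -> [1, 0, 1, 1]
```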
As configured, you'll have to perform some sort of external alignment, since as far as I can see you aren't doing it with any kind of encoding. Also, if your reference clocks don't come from an identical source, you will have drift between your transmitter and receiver, because you have the transceiver set up with a direct connection instead of using the internal FIFOs.
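The external alignment would amount to something like this sketch: sliding over the raw bit stream at the receiver looking for a known training pattern to find the word boundary. I've used the 10-bit K28.5 comma as the pattern (more on K-codes below); in your scheme it would be whatever training word you choose to transmit.

```python
# Toy word alignment: search the raw bit stream for a known pattern
# to locate the word boundary. Pattern and names are illustrative.

K28_5 = [0,0,1,1,1,1,1,0,1,0]  # K28.5 comma, one running-disparity form

def find_alignment(bits, pattern=K28_5):
    """Return the bit offset of the first occurrence of `pattern`, or None."""
    n = len(pattern)
    for offset in range(len(bits) - n + 1):
        if bits[offset:offset + n] == pattern:
            return offset
    return None

stream = [1,0,1] + K28_5 + [0,1,1,0,1,0,1,1,0,0]  # comma starts 3 bits in
print(find_alignment(stream))  # -> 3
```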
Have you worked with multi-gigabit transceivers before? What you're trying to implement is non-optimal and probably more suitable for non-GT pins: if you don't plan on using any kind of symbol-based transmission, you should use LVDS pairs and send your data and a clock over two pairs of pins. The whole point of using the transceivers is to send the data and the transmission clock over the same pair of lines. You do that by embedding clock information into the data (8b10b or 64b66b encoding); the encoding guarantees enough bit transitions for the receiver to perform clock recovery. And since the encoding allows more symbols than there are possible data values, special symbols called K-codes are defined, which allow for synchronization and transmission of control information.
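To make that concrete, here's a rough sketch of the property the encoding buys you (two hand-picked code groups, not a real 8b10b encoder, so again just illustration):

```python
# Why 8b10b helps clock recovery: every 10-bit code group keeps the run
# length bounded (a transition at least every 5 bits across the stream)
# and the running disparity bounded (each group's disparity is 0 or +/-2).
# Two example code groups, not the full 256-entry encoding table.

EXAMPLES = {
    "D21.5": [1,0,1,0,1,0,1,0,1,0],  # perfectly balanced data code group
    "K28.5": [0,0,1,1,1,1,1,0,1,0],  # comma K-code (RD- form), used for sync
}

def transitions(bits):
    """Count the 0->1 and 1->0 edges the receiver's CDR can lock onto."""
    return sum(a != b for a, b in zip(bits, bits[1:]))

def disparity(bits):
    """Ones minus zeros; 8b10b keeps this bounded via running disparity."""
    return bits.count(1) - bits.count(0)

for name, code in EXAMPLES.items():
    print(name, "transitions:", transitions(code), "disparity:", disparity(code))

# Raw, unencoded data has no such guarantee: a long run of 0x00 bytes
# produces zero transitions and the receiver's CDR loses lock.
```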
I'm not really sure why you went this route. You seem to be using the transceiver in a way that guarantees you'll have problems with data synchronization and reference clock frequency drift. If this is for a product, you should stick with an actual standard protocol of some sort that uses the transceiver's built-in resources for synchronization and clock recovery instead of inventing your own method.
Regards