So the PCI core expects the data to be ready while the ENA signal it drives is still high, so that it can sample the data on the next rising edge of the clock? That seems highly unusual for a synchronous design.
If it is absolutely necessary, you could either invert the port A clock or clock port A from CLKA180, a clock phase-shifted by 180 degrees relative to the CLKA in your example.
In that case, the BRAM samples ENA on the rising edge of CLKA180, which corresponds to the falling edge of CLKA, and the data is present on the DOA output in time for the next rising edge of CLKA.
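
To make the timing concrete, here is a rough half-cycle model in Python. It is only a sketch under simplifying assumptions: ideal edges with no setup, hold, or clock-to-out delays, a one-cycle synchronous read, and a made-up memory content and address. The names `MEM` and `core_sample` are hypothetical and not taken from any vendor library.

```python
# Hypothetical BRAM contents, for illustration only.
MEM = [10, 20, 30, 40]

def core_sample(bram_uses_clka180: bool):
    """Return what the core sees on DOA at the CLKA rising edge after it raised ENA.

    Time advances in half periods of CLKA:
    even steps are rising edges of CLKA, odd steps are falling edges
    (the falling edge of CLKA is the rising edge of CLKA180).
    """
    ena, addr, doa = 0, 0, None
    sampled = None
    for t in range(3):  # half-cycle steps 0, 1, 2
        clka_rising = (t % 2 == 0)
        # The BRAM port is clocked either by CLKA or by CLKA180.
        bram_edge = (not clka_rising) if bram_uses_clka180 else clka_rising

        # Every register sees the values from just before the edge.
        new_ena, new_addr, new_doa = ena, addr, doa
        if bram_edge and ena:
            new_doa = MEM[addr]        # synchronous read into DOA
        if clka_rising and t == 0:
            new_ena, new_addr = 1, 2   # core drives ENA and an address
        if clka_rising and t == 2:
            sampled = doa              # core samples DOA one CLKA cycle later
        ena, addr, doa = new_ena, new_addr, new_doa
    return sampled

print(core_sample(bram_uses_clka180=False))  # BRAM on CLKA: data not there yet
print(core_sample(bram_uses_clka180=True))   # BRAM on CLKA180: data is there
```

With the BRAM on CLKA, the read happens on the same edge on which the core samples, so the core still sees the old DOA value. With CLKA180, the read happens half a period earlier. Keep in mind that in real hardware this leaves only half a clock period for each of the two paths (ENA and address into the BRAM, DOA back to the core), so the timing constraints need to be checked.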