A couple of quick comments on what I've read here ...
First, I agree you should try to use COREGEN.
I'm personally not familiar with instantiating a FIFO through generics for synthesis, though I think this does happen inside the instantiation "wrapper" that COREGEN generates, so that simulation uses the common FIFO behavioral model instead of the primitives netlist from COREGEN.
Note that you can run COREGEN in batch mode to modify the FIFO parameters and regenerate the core, if this helps with your original concern about COREGEN.
Go to Xilinx support and search for "coregen command" and you'll find some leads on how to do this.
Note that the response to the one titled "Coregen, command line usage" lists some good caveats to be aware of.
BTW, those ISE Language Templates are really just helpful code snippets (not complete IP) for common types of RTL operations/functions.
A FIFO, by contrast, is a higher-level system made up of multiple functions (e.g. a 2-port RAM *plus* a controller).
Finally, FWIW, regarding why you were running into erroneous data with your original RTL implementation ...
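You appear to have used the NON-Gray-coded version of the OpenCores FIFO design (i.e. the pointers are incremented as plain binary counts and not as a Gray code).
Because of this, you are likely running into fundamental CDC issues when re-sampling the write/read pointers across to the other clock: a binary increment can change several bits at once, so the other clock domain can capture a mix of old and new bits and see a pointer value that never actually existed. A Gray-coded pointer changes only one bit per increment, so a sample taken mid-transition is either the old value or the new one.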
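To make the difference concrete, here is a small Python sketch (just a software model, not RTL and not taken from the OpenCores design) that counts how many pointer bits change on each increment for a plain binary counter versus a Gray-coded one. The 4-bit width and the helper names are my own assumptions for illustration.

```python
def bin_to_gray(n: int) -> int:
    """Convert a binary count to its reflected Gray code."""
    return n ^ (n >> 1)


def gray_to_bin(g: int) -> int:
    """Convert a reflected Gray code back to the binary count."""
    n = 0
    while g:
        n ^= g
        g >>= 1
    return n


def bits_changed(a: int, b: int) -> int:
    """Number of bit positions that differ between a and b."""
    return bin(a ^ b).count("1")


WIDTH = 4            # assumed pointer width, for illustration only
DEPTH = 1 << WIDTH   # number of pointer values, including wrap-around

# Bits that toggle on each increment (including the wrap back to 0).
binary_flips = [bits_changed(i, (i + 1) % DEPTH) for i in range(DEPTH)]
gray_flips = [
    bits_changed(bin_to_gray(i), bin_to_gray((i + 1) % DEPTH))
    for i in range(DEPTH)
]

print("binary pointer, bits toggled per increment:", binary_flips)
print("Gray pointer,   bits toggled per increment:", gray_flips)
```

Every step where the binary list shows more than one toggled bit is a point where a sample in the other clock domain could catch some bits before the change and some after it. The Gray list shows a single toggled bit on every step, which is the property the pointer synchronizers rely on.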
If you must pursue your own RTL approach, I'd also suggest scanning through the following papers ...
-> http://www.sunburst-design.com/papers/CummingsSNUG2002SJ_FIFO1.pdf
-> http://www.sunburst-design.com/papers/CummingsSNUG2002SJ_FIFO2.pdf