You will have to look at the load the buffer is driving: the net itself plus the input pins of the connected instances. Each buffer has a specified maximum fanout and a maximum capacitance it can drive; consult your technology library for these values. Then count the number of loads and calculate the total capacitance of the affected net.
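As a rough sketch, the net load is just the sum of the connected input pin capacitances plus the wire capacitance, and the fanout is the number of driven pins. The values below are made up for illustration; real numbers come from your technology library and parasitic extraction.

```python
def net_load(pin_caps_ff, wire_cap_ff):
    """Return (fanout, total capacitance in fF) for a net.

    pin_caps_ff: input pin capacitances of the driven instances (fF)
    wire_cap_ff: extracted (or estimated) wire capacitance (fF)
    """
    fanout = len(pin_caps_ff)
    total_cap = sum(pin_caps_ff) + wire_cap_ff
    return fanout, total_cap

# Example: four driven pins plus 5 fF of wire capacitance
fanout, cap = net_load([2.1, 2.1, 3.4, 1.8], wire_cap_ff=5.0)
```

These two numbers (fanout and total capacitance) are what you compare against each buffer's limits in the next step.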
First, eliminate any buffers that cannot handle the number of loads or the capacitance on the particular net.
Next, select a buffer (or buffers) based on your design goals. For instance, if your goal is low power, avoid the big drivers unless absolutely necessary. Finally, make sure the total delay of the buffer (or combination of buffers) is between 1 ns and 3 ns; otherwise you will still have timing violations.
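The filter-then-select procedure above can be sketched as follows. The buffer names and their fanout, capacitance, delay, and power figures are entirely hypothetical, and the design goal here is assumed to be minimum power:

```python
# Hypothetical buffer cells: (name, max_fanout, max_cap_ff, delay_ns, rel_power)
BUFFERS = [
    ("BUFX1", 4,  20.0, 2.5, 1.0),
    ("BUFX2", 8,  40.0, 1.8, 2.0),
    ("BUFX4", 16, 80.0, 1.2, 4.0),
]

def pick_buffer(fanout, cap_ff, min_delay=1.0, max_delay=3.0):
    # Step 1: drop buffers whose drive limits the net would exceed,
    # and enforce the required delay window (1 ns to 3 ns).
    candidates = [
        b for b in BUFFERS
        if fanout <= b[1] and cap_ff <= b[2]
        and min_delay <= b[3] <= max_delay
    ]
    # Step 2: apply the design goal -- here, pick the lowest-power survivor.
    return min(candidates, key=lambda b: b[4], default=None)

choice = pick_buffer(fanout=6, cap_ff=30.0)
```

With these numbers, BUFX1 is rejected on fanout, and BUFX2 wins over BUFX4 on power. A different goal (say, minimum delay) would only change the `key` used in the final selection.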
If multiple sizes of the instance driving the net with the timing violation are available, you may not need a buffer at all. Look at the delays, maximum fanouts, and maximum capacitances of these instances, and eliminate those that do not satisfy the requirements. But how do you choose between inserting a buffer and simply resizing the instance? Again, it comes back to your design goals. Do you want low gate count? Low power? Small area? Choose the implementation based on what you are optimizing for.
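The final trade-off can be framed as picking the cheapest workable fix under whichever metric you are optimizing. The two options and their costs below are invented purely to illustrate the decision; in practice the numbers would come from your library and your flow's reports.

```python
# Hypothetical fixes that both meet timing: (description, extra_gates, power, area)
OPTIONS = [
    ("resize driver to X4", 0, 3.5, 2.0),  # no new instance, but bigger cell
    ("insert BUFX2",        1, 2.0, 1.5),  # one extra gate, lower power/area
]

def best_option(goal):
    """Pick the option minimizing the chosen metric: 'gates', 'power', or 'area'."""
    key = {
        "gates": lambda o: o[1],
        "power": lambda o: o[2],
        "area":  lambda o: o[3],
    }[goal]
    return min(OPTIONS, key=key)[0]
```

With these made-up costs, optimizing for gate count favors resizing the driver, while optimizing for power favors inserting the buffer, which is exactly the kind of goal-dependent split described above.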