step1: Create an FIR filter in MATLAB/Octave. You can use a windowing method or a design routine like firpm/remez/firls/etc.
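If you don't have MATLAB/Octave handy, here is a rough Python sketch of the windowing method from step1: a Hamming-windowed sinc lowpass. The function name `windowed_sinc_lowpass`, the 101 taps, and the 40Hz cutoff are made-up illustration values, not a recommendation for your signal. This is only a sketch of the idea, not a replacement for firpm/remez/firls.

```python
import math

def windowed_sinc_lowpass(num_taps, cutoff_hz, fs_hz):
    """Hamming-windowed sinc lowpass, normalized to unity gain at DC."""
    fc = cutoff_hz / fs_hz            # cutoff in cycles per sample
    mid = (num_taps - 1) / 2.0        # center of the (symmetric) impulse response
    taps = []
    for n in range(num_taps):
        x = n - mid
        # Ideal lowpass impulse response: sin(2*pi*fc*x) / (pi*x), with the x=0 limit 2*fc
        ideal = 2.0 * fc if x == 0 else math.sin(2.0 * math.pi * fc * x) / (math.pi * x)
        # Hamming window tapers the truncated sinc to reduce stopband ripple
        window = 0.54 - 0.46 * math.cos(2.0 * math.pi * n / (num_taps - 1))
        taps.append(ideal * window)
    total = sum(taps)
    return [t / total for t in taps]  # scale so the coefficients sum to 1

# Assumed example values: 101 taps, 40 Hz cutoff, 1000 samples per second
taps = windowed_sinc_lowpass(num_taps=101, cutoff_hz=40.0, fs_hz=1000.0)
print(len(taps), sum(taps))
```

The coefficients come out symmetric (linear phase), which is what you normally want for this kind of filter.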
step2: Model the filter with quantized coefficients. This will help you determine how many bits the multipliers need.
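One way to do the modeling in step2, sketched in Python: round the coefficients to signed integers of a given width, then compare the frequency response of the quantized filter against the floating-point one. The 9-tap coefficient list below is a made-up placeholder (substitute your real taps), and `quantize` and `max_response_error_db` are hypothetical helper names, not functions from any library.

```python
import cmath
import math

def quantize(taps, bits):
    """Round taps to signed integers of `bits` total bits; return (ints, step size)."""
    scale = 2 ** (bits - 1) - 1              # largest positive signed value
    peak = max(abs(t) for t in taps)         # scale so the biggest tap uses the full range
    return [round(t / peak * scale) for t in taps], peak / scale

def max_response_error_db(taps, bits, points=256):
    """Worst-case |H(w) - Hq(w)| over 0..pi, in dB."""
    q, step = quantize(taps, bits)
    worst = 0.0
    for k in range(points):
        w = math.pi * k / points
        h = sum(t * cmath.exp(-1j * w * n) for n, t in enumerate(taps))
        hq = sum(c * step * cmath.exp(-1j * w * n) for n, c in enumerate(q))
        worst = max(worst, abs(h - hq))
    return 20.0 * math.log10(worst) if worst > 0 else float("-inf")

# Placeholder coefficients for illustration only
taps = [-0.01, 0.0, 0.06, 0.25, 0.4, 0.25, 0.06, 0.0, -0.01]

for bits in (8, 12, 16):
    print(bits, "bits: worst response error", max_response_error_db(taps, bits), "dB")
```

If the error at some width sits comfortably below the stopband attenuation you need, that width is probably enough; if not, add bits and try again.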
step3: Use the sample rate, filter length, and multiplier size to determine how many multipliers are needed in the FPGA. E.g., if you have 1000 samples per second and a 100 tap filter with 16b coefficients (each fitting in one 18x18 multiplier), then you need 100*1000 = 100,000 multiplications per second. A single multiplier driven by a state machine could handle that while running at 100kHz (very easy timing). A 19200 tap filter in a 192000 sps system would need 3.69E9 multiplies per second. That would take around 16 multipliers in parallel, each running a little over 230MHz. Likewise, if 32b coefs are needed, each tap could take more than one multiplier, which means even more of them. At that point, it is probably better to look into decimating the data, as it is unlikely a heart rate has a 90kHz bandwidth requirement...
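The arithmetic in step3 can be sketched as a small Python helper. `multiplier_budget` is a hypothetical name, and the 240MHz clock ceiling in the second example is an assumed figure picked for illustration; use whatever your device and timing closure actually support. `mults_per_tap` is also an assumption, there for wide coefficients that need more than one hardware multiplier per tap.

```python
import math

def multiplier_budget(sample_rate_hz, num_taps, max_clock_hz, mults_per_tap=1):
    """Return (multiplies per second, multipliers needed, rate each one must run at)."""
    mults_per_sec = sample_rate_hz * num_taps * mults_per_tap
    # Enough multipliers that none has to run faster than the clock ceiling
    n_mults = math.ceil(mults_per_sec / max_clock_hz)
    rate_per_mult_hz = mults_per_sec / n_mults
    return mults_per_sec, n_mults, rate_per_mult_hz

# 1000 sps, 100 taps: one multiplier at 100 kHz is enough
small = multiplier_budget(1000, 100, max_clock_hz=100_000)
print(small)   # (100000, 1, 100000.0)

# 192000 sps, 19200 taps, assumed 240 MHz ceiling
big = multiplier_budget(192_000, 19_200, max_clock_hz=240_000_000)
print(big)     # (3686400000, 16, 230400000.0)
```

Running the numbers like this before writing any HDL tells you quickly whether the design is a single-multiplier state machine or something that needs decimation first.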
step4: Post more information, or begin writing the code for the design.