Good!
I think you were 'over-inventing' the wheel. To produce a sine wave through a DAC you have to load each point of the waveform in turn (256 points for 8-bit resolution). At a 4 kHz tone the DAC needs a new value 256 * 4,000 = 1,024,000 times per second. Presumably the MCU is doing other things as well, so that would eat up a lot of processor time. If you instead produce the data by programming a timer to output a square wave on a pin, you only have to set it up once per bit. In some MCUs, even small PICs, you can use the signal modulator to switch frequency from the UART output, so FSK comes directly out of the pin with no software involved at all.
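To put rough numbers on that comparison, here is a minimal sketch in plain C. The function names are made up for illustration, and the 1 MHz timer clock used in the usage note is an assumption, not something taken from any particular MCU.

```c
#include <stdint.h>

/* DAC approach: one write per table point, for every cycle of the tone. */
uint32_t dac_writes_per_second(uint32_t points_per_cycle, uint32_t tone_hz)
{
    return points_per_cycle * tone_hz;
}

/* Square-wave approach: the pin changes state twice per cycle,
 * and the timer hardware does that without any CPU involvement. */
uint32_t square_toggles_per_second(uint32_t tone_hz)
{
    return 2u * tone_hz;
}

/* Timer ticks per half cycle, rounded to the nearest tick.
 * This is the value software would load into the timer once per bit. */
uint32_t half_period_ticks(uint32_t timer_clock_hz, uint32_t tone_hz)
{
    return (timer_clock_hz + tone_hz) / (2u * tone_hz);
}
```

With 256 points and a 4 kHz tone, dac_writes_per_second() gives 1,024,000, against 8,000 hardware toggles for the square wave. With an assumed 1 MHz timer clock, half_period_ticks() gives 125 ticks for 4 kHz.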
At the receiving end the shape of the waveform doesn't matter, only the frequency does. Clean up the signal if necessary with a Schmitt trigger input (some MCUs already have that facility on the input pin), then use a timer to measure the frequency. With interrupts it only takes a few lines of code and almost no processing overhead.
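As a rough illustration of the timer measurement, the sketch below decides the bit from the time between two rising edges. Everything specific in it is an assumption: a free-running 16-bit timer ticking at 1 MHz, 4 kHz for a logic 1 and 2 kHz for a logic 0, and the function names. On real hardware on_rising_edge() would be called from the capture or pin-change interrupt with the latched timer value.

```c
#include <stdint.h>

#define TIMER_CLOCK_HZ 1000000u /* assumed 1 MHz timer tick */
#define MARK_HZ        4000u    /* assumed tone for logic 1 */
#define SPACE_HZ       2000u    /* assumed tone for logic 0 */

/* Decision threshold halfway between the two expected periods, in ticks. */
#define PERIOD_THRESHOLD \
    ((TIMER_CLOCK_HZ / MARK_HZ + TIMER_CLOCK_HZ / SPACE_HZ) / 2u)

static uint16_t last_capture;        /* timer value at the previous edge */
static volatile uint8_t current_bit; /* latest decoded bit */

/* Hypothetical hook: call this from the edge interrupt with the
 * timer value captured at the rising edge. */
void on_rising_edge(uint16_t capture)
{
    /* Unsigned 16-bit subtraction stays correct across timer rollover. */
    uint16_t period = (uint16_t)(capture - last_capture);
    last_capture = capture;

    /* Shorter period means the higher (mark) frequency. */
    current_bit = (period < PERIOD_THRESHOLD) ? 1u : 0u;
}

uint8_t fsk_current_bit(void)
{
    return current_bit;
}
```

The interrupt does one subtraction and one compare per cycle of the incoming tone. A practical receiver would probably also reject periods far outside both expected values and average over a few cycles, which is left out here.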
Brian.