The legality of encrypted communication over a public channel varies from one country to another, so check with your national licensing authority.
Because the range of transmittable audio frequencies is limited, you have to use an encoding method that keeps the result within the audio frequency range. Some systems use 'chop and resequence' algorithms that rearrange the sound samples to make the speech unintelligible; the receiver has to store them and put them back in the correct order to make it understandable again. Another popular system is spectral inversion (a.k.a. Donald Duck mode): you swap high and low frequencies at the transmitter, then reverse the swap at the receiver to get normal speech back. None of these methods is trivial; you need fairly advanced electronics to make them work.
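To make the two schemes above a bit more concrete, here is a rough Python sketch of both, working on a list of already-digitized audio samples. This is only an illustration under my own assumptions, not how any particular commercial scrambler does it: the function names, the `key` parameter, and the use of a seeded shuffle to pick the block order are all made up for the example. The inversion shown is the simplest digital form (negating every other sample), which mirrors the spectrum around a quarter of the sample rate.

```python
import random


def invert_spectrum(samples):
    """Flip the spectrum by negating every other sample.

    Multiplying by +1, -1, +1, ... shifts the spectrum by half the sample
    rate, so a tone at f comes out at (sample_rate / 2 - f). Applying the
    same function a second time restores the original samples.
    """
    return [s if i % 2 == 0 else -s for i, s in enumerate(samples)]


def _block_order(num_blocks, key):
    # Hypothetical key schedule: a seeded shuffle that both ends can repeat.
    order = list(range(num_blocks))
    random.Random(key).shuffle(order)
    return order


def _split(samples, block_len):
    if len(samples) % block_len != 0:
        raise ValueError("sample count must be a multiple of block_len")
    return [samples[i:i + block_len] for i in range(0, len(samples), block_len)]


def scramble_blocks(samples, block_len, key):
    """Chop the samples into blocks and send them in a key-dependent order."""
    blocks = _split(samples, block_len)
    order = _block_order(len(blocks), key)
    return [s for j in order for s in blocks[j]]


def unscramble_blocks(samples, block_len, key):
    """Store the received blocks and put them back in their original order."""
    blocks = _split(samples, block_len)
    order = _block_order(len(blocks), key)
    restored = [None] * len(blocks)
    for position, original_index in enumerate(order):
        restored[original_index] = blocks[position]
    return [s for block in restored for s in block]


if __name__ == "__main__":
    audio = list(range(12))  # stand-in for real audio samples
    sent = scramble_blocks(audio, block_len=3, key=1234)
    print("scrambled:", sent)
    print("restored: ", unscramble_blocks(sent, block_len=3, key=1234))
    print("inverted twice:", invert_spectrum(invert_spectrum(audio)))
```

Note that the resequencing receiver has to buffer a whole frame of blocks before it can play anything, which is where the storage requirement (and some delay) comes from.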
Brian.