Haven't worked with FFT. Don't know where you might find program code.
In this application you want to ignore false triggers. But you also want to be sure you'll catch that one real scream. Not easy.
A scream can be characterized as a sustained pitch for longer than, say, a few tenths of a second.
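One way to make "sustained pitch" testable is to chop the audio into short frames, estimate the pitch of each frame, and only fire when the pitch stays roughly constant long enough. Here's a rough Python sketch of that, assuming the audio is already a list of float samples. The function names are made up for illustration. The 20 ms frame, 5% tolerance, 200 to 2000 Hz search range and 0.3 s minimum are guesses to tune, not measured properties of screams. Autocorrelation is just one simple pitch estimator, and it doesn't need an FFT at all.

```python
import math


def estimate_pitch(frame, sample_rate, fmin=200.0, fmax=2000.0):
    """Estimate the fundamental frequency of one frame by autocorrelation.

    Returns the frequency (Hz) of the lag with the strongest positive
    correlation between fmin and fmax, or None if no lag correlates
    positively (e.g. a silent frame).
    """
    min_lag = max(1, int(sample_rate / fmax))
    max_lag = min(int(sample_rate / fmin), len(frame) - 1)
    best_lag, best_corr = None, 0.0
    for lag in range(min_lag, max_lag + 1):
        corr = sum(frame[i] * frame[i + lag] for i in range(len(frame) - lag))
        if corr > best_corr:
            best_lag, best_corr = lag, corr
    return sample_rate / best_lag if best_lag else None


def sustained_pitch(samples, sample_rate, min_duration=0.3,
                    frame_ms=20, tolerance=0.05):
    """True if consecutive frames hold roughly the same pitch for min_duration s."""
    frame_len = int(sample_rate * frame_ms / 1000)
    needed = max(1, round(min_duration * 1000 / frame_ms))
    run, ref = 0, None
    for start in range(0, len(samples) - frame_len + 1, frame_len):
        f0 = estimate_pitch(samples[start:start + frame_len], sample_rate)
        if f0 is None:
            run, ref = 0, None          # silence breaks the run
        elif ref is not None and abs(f0 - ref) <= tolerance * ref:
            run += 1                    # same pitch continues
        else:
            run, ref = 1, f0            # a new pitch starts a new run
        if run >= needed:
            return True
    return False


sr = 8000
tone = [math.sin(2 * math.pi * 1000 * n / sr) for n in range(sr // 2)]  # 0.5 s at 1 kHz
print(sustained_pitch(tone, sr))                       # True
print(sustained_pitch(tone[:800] + [0.0] * 3200, sr))  # False: only 0.1 s of tone
```

Note this alone would happily trigger on a held sung note, which is exactly the problem with singing.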
But then your device needs to be able to filter out singing, because that resembles a scream too.
Or someone screaming on TV.
Or children when they have those lovable screaming contests.
Or a woman squealing with delight when you give her a $1,000 necklace.
Which adds up to: Do you have a way to detect someone's mental state based on how they vocalize?
This is not meant to discourage you. It's a worthy project, even if only to demonstrate the concept.
As for FFT, it can contribute only certain information: which frequencies are present in a short stretch of sound, and how strong each one is. It's just one part of devising an algorithm that will distinguish a scream from other sustained waveforms.
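To show what that information looks like: an FFT turns a block of samples into frequency bins with magnitudes, and from those you can read off the dominant pitch and any overtones. It won't tell you by itself whether the sound is a scream. Below is a rough pure-Python sketch using a textbook radix-2 FFT written out for illustration only; on a real device you'd use an existing optimized FFT library. It assumes the block length is a power of two, and the Hann window and the `dominant_frequency` helper are my own choices.

```python
import cmath
import math


def fft(x):
    """Recursive radix-2 Cooley-Tukey FFT; len(x) must be a power of two."""
    n = len(x)
    if n == 1:
        return [complex(x[0])]
    even = fft(x[0::2])
    odd = fft(x[1::2])
    out = [0j] * n
    for k in range(n // 2):
        t = cmath.exp(-2j * math.pi * k / n) * odd[k]
        out[k] = even[k] + t
        out[k + n // 2] = even[k] - t
    return out


def dominant_frequency(block, sample_rate):
    """Centre frequency (Hz) of the strongest non-DC bin in the block."""
    n = len(block)
    # Hann window reduces spectral leakage from the block edges
    windowed = [s * (0.5 - 0.5 * math.cos(2 * math.pi * i / (n - 1)))
                for i, s in enumerate(block)]
    spectrum = fft(windowed)
    mags = [abs(c) for c in spectrum[1:n // 2]]  # positive frequencies, skip DC
    k = mags.index(max(mags)) + 1
    return k * sample_rate / n


sr = 8000
block = [math.sin(2 * math.pi * 1000 * n / sr) for n in range(256)]
print(dominant_frequency(block, sr))  # 1000.0
```

The resolution is sample_rate / block length (31.25 Hz here), so longer blocks resolve pitch more finely but react more slowly.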
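It isn't hard to make your device ignore typical speech. Words are more like transient waveforms than sustained ones. A fast-falling peak detector will filter out speech, because its output drops back down in the short gaps between words and so never stays high long enough to count as sustained.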
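Here's one reading of that as a rough Python sketch: a peak detector that jumps up instantly to each sample's magnitude and decays with a short time constant, followed by a check that the envelope stays above a threshold for a few tenths of a second. The 10 ms decay, the 0.3 threshold (on samples scaled to ±1) and the 0.3 s duration are placeholder values, and the "speech" in the usage lines is just a synthetic burst-and-gap pattern, not recorded speech.

```python
import math


def peak_envelope(samples, sample_rate, decay_ms=10.0):
    """Fast-falling peak detector.

    Rises instantly to |sample|, then decays exponentially with time
    constant decay_ms, so it drops back down in the gaps between words.
    """
    alpha = math.exp(-1.0 / (sample_rate * decay_ms / 1000.0))
    env, out = 0.0, []
    for s in samples:
        mag = abs(s)
        env = mag if mag > env else env * alpha
        out.append(env)
    return out


def is_sustained(samples, sample_rate, threshold=0.3,
                 min_duration=0.3, decay_ms=10.0):
    """True if the envelope stays above threshold for min_duration seconds."""
    needed = round(min_duration * sample_rate)
    run = 0
    for env in peak_envelope(samples, sample_rate, decay_ms):
        run = run + 1 if env >= threshold else 0
        if run >= needed:
            return True
    return False


sr = 8000
tone = [math.sin(2 * math.pi * 440 * n / sr) for n in range(sr // 2)]  # 0.5 s held note
burst = tone[:sr // 10] + [0.0] * (sr // 20)  # 0.1 s "word" + 50 ms gap
print(is_sustained(tone, sr))       # True
print(is_sustained(burst * 5, sr))  # False
```

A slow-falling detector (say, a decay of hundreds of milliseconds) would bridge the gaps and let speech through, which is why the fast fall matters.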
You want to check into all possible characteristics of the vocal inflections of a scream. For example: attack and decay envelope; presence and intensity of certain overtones; pitch change; tremolo in amplitude (quavering); etc.
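A couple of those characteristics can be measured from a simple level envelope. Below is a rough Python sketch of two of them, attack time and amplitude tremolo, computed from per-frame RMS levels. The 10 ms frame, the 90% attack point and the peak-to-trough depth measure are my own illustrative choices, not established scream criteria, and the test signals are synthetic tones.

```python
import math


def frame_rms(samples, frame_len):
    """RMS level of each non-overlapping frame (a coarse amplitude envelope)."""
    return [math.sqrt(sum(s * s for s in samples[i:i + frame_len]) / frame_len)
            for i in range(0, len(samples) - frame_len + 1, frame_len)]


def attack_time(levels, frame_s, fraction=0.9):
    """Seconds from the first frame until the level first reaches fraction * peak."""
    peak = max(levels)
    for i, level in enumerate(levels):
        if level >= fraction * peak:
            return i * frame_s


def tremolo_depth(levels):
    """Half the peak-to-trough swing relative to the mean level (0 = steady)."""
    mean = sum(levels) / len(levels)
    return (max(levels) - min(levels)) / (2 * mean) if mean > 0 else 0.0


sr, frame_len = 8000, 80                     # 10 ms frames
t = [n / sr for n in range(sr // 2)]         # 0.5 s time axis
steady = [math.sin(2 * math.pi * 1000 * x) for x in t]
quaver = [(1 + 0.5 * math.sin(2 * math.pi * 6 * x)) * s for x, s in zip(t, steady)]
ramped = [min(x / 0.1, 1.0) * s for x, s in zip(t, steady)]  # 100 ms fade-in

print(tremolo_depth(frame_rms(steady, frame_len)))  # close to 0
print(tremolo_depth(frame_rms(quaver, frame_len)))  # roughly 0.5
print(attack_time(frame_rms(ramped, frame_len), frame_len / sr))  # about 0.09
```

Pitch change could be handled the same way by tracking a per-frame pitch estimate instead of the RMS level.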