this would be a problem of "deconvolution". This is not always well posed. For examples, if x1[n] = 0 or more generally if any bin of fft(x1) = 0 then it becomes much more difficult.
The process in other cases would be to determine an x3[n] with the property that x3[n] * x2[n] = 1 when n = 0, and 0 otherwise. (where * is circular convolution for this problem).
This is a bit easier in the FFT domain -- the fft of the impulse function is simply a constant real value at all bins. you can easily compute X3_k = q / X2_k. (q is a constant, X2_k is the k'th bin of the fft of x2, X3_k is the k'th bin of the fft of x3). You then take the inverse FFT of X3 to get x3[n] (though you might want to wait).
Now you can take y[n] and do a circular convolution with x3. this would give x1 * x2 * x3 = x1 * (x2 *x3) = x1 * (impulse) = x1.
There are clearly some numerical issues if the fft of x2 has a large dynamic range, or has 0's in any bin. The general topic of "deconvolution" has approximation and numerical methods that improve accuracy for these cases. (eg, an approximation would simply set 0 values to some low value to at least allow computation to be done)