I would go with your first answer too. In a current mirror, both transistors have the same effective gate voltage, so it is (arguably) simpler to use gm = 2 I / Veff. This would be my argument:
A transistor has a noise current PSD of 4kT r gm (with r the thermal noise coefficient, usually written as gamma), which means that the secondary side contributes
Is^2(f) = 4kT r gms = 4kT r (2 * 5 * I / Veff)
The noise current on the input would be
Ip^2(f) = 4kT r gmp = 4 kT r (2 * I / Veff)
and on the gates, the noise voltage PSD would be
Vg^2(f) = 4kT r gmp / gmp^2 (assuming gds << gm) = 4kT r / gmp
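This is transferred to the output through gms^2:
Vg^2(f) * gms^2 = 4kT r gms^2 / gmp = 4kT r gms * (gms/gmp) = Is^2(f) * (2*5*I/Veff) / (2*I/Veff) = 5 Is^2(f)
So the primary's noise, once mirrored to the output, is 5 times the secondary's own noise.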
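As a quick numerical sanity check of the ratio above, here is a minimal Python sketch. The bias current, Veff, temperature, and the value of r (2/3, the usual long-channel figure) are assumptions picked for illustration and are not from the question. The function name mirror_noise is likewise hypothetical. It only reproduces the low-frequency algebra, with gds << gm and no capacitances.

```python
import math

K_BOLTZMANN = 1.380649e-23  # Boltzmann constant in J/K


def mirror_noise(i_primary, v_eff, ratio=5.0, r=2.0 / 3.0, temp_k=300.0):
    """Low-frequency noise current PSDs (A^2/Hz) of a 1:ratio current mirror.

    Assumes gm = 2 I / Veff for both devices and gds << gm.
    All default values are illustrative assumptions.
    """
    four_ktr = 4.0 * K_BOLTZMANN * temp_k * r
    gm_p = 2.0 * i_primary / v_eff            # primary (diode-connected) device
    gm_s = 2.0 * ratio * i_primary / v_eff    # secondary (output) device

    is2 = four_ktr * gm_s          # secondary's own noise current PSD
    ip2 = four_ktr * gm_p          # primary's noise current PSD
    vg2 = ip2 / gm_p ** 2          # noise voltage PSD on the shared gate node
    mirrored = vg2 * gm_s ** 2     # primary noise as seen at the output

    return {"secondary": is2, "primary": ip2, "gate": vg2, "mirrored": mirrored}


result = mirror_noise(i_primary=100e-6, v_eff=0.2)
print("mirrored / secondary =", result["mirrored"] / result["secondary"])
```

The printed ratio should come out at the mirror ratio (5 here) regardless of the chosen I, Veff, or r, since they all cancel. If the two sources are treated as uncorrelated, the total output PSD would then be the sum, i.e. 6 Is^2(f).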
However, this hypothetical question does not mention any bandwidth limitation. The primary transistor, for example, could see a larger capacitive load than the secondary, and so on.