Good questions, Loucy. (OK to call me "Jim".)
Actually, you can try it on SonnetLite. Please do so. The FFT takes some memory and SonnetLite is limited to 16 MB. So, with almost no circuit, I decreased the cell size by specifying 1000 cells per box side. (1500x1500 cells required > 16 MB.) You can do this too. Here are the SonnetLite timing statistics:
Frequency: 1 GHz
Circuit requires 5 subsections and 12 MB of memory.
Subsections by level and type:
Level 0:
Staircase: 5
Waveguide mode time: 0.156 seconds.
Fourier transform time: 1 second.
Coupling time: 0 seconds.
Loss time: 0 seconds.
Matrix fill time: 1 second.
Matrix solve time: 0 seconds.
De-embed left box wall:
First de-embedding standard, left box wall: 40 mils length, 18 subsections, about 8 MB.
Time: 0.704 seconds.
Second de-embedding standard, left box wall: 80 mils length, 35 subsections, about 12 MB.
Time: 1 second.
Total time per frequency: 3 seconds.
In other words, a 1000x1000 FFT requires 1 second. (There were only X-directed subsections. If there were both X and Y, there would be 3 FFTs, but no increase in memory.) The calculation of waveguide modes took 0.156 seconds. For a 1000x1000 FFT, Sonnet (by default) calculates 2000x2000 = 4000000 modes in that 0.156 seconds. Pretty fast, huh?
If you increase the number of modes (for example, by specifying -c3), the only thing you increase is the 0.156 seconds of waveguide mode time. Not a big deal. The FFT remains at 1000x1000 and FFT time remains unchanged. I get the impression from your comments that the FFT size for EM3DS increases in this case. If true, that is very inefficient!
What does require time and memory is having lots of subsections. For SonnetLite, matrix solve time is also almost nothing, because the memory limit keeps the subsection count small. For full Sonnet, 1000 subsections means a 1000x1000 matrix (this is independent of the FFT size and time), and fill time and solve time combined are just a few seconds. When you get up to 20000 subsections, the fill+solve time can be 20 minutes or so (3 GHz PC). If you invoke conformal meshing (on which we have a patent), the subsection count goes down by a factor of 10 to 100, but the matrix fill time goes up a bit. When the subsection count goes down by a factor of 10, the matrix solve time goes down by a factor of 10**3 = 1000. This is pretty serious stuff, don't you agree?
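As a rough illustration of the scaling argument above, here is a minimal Python sketch. It assumes a dense N x N matrix whose direct solve cost grows as N**3 and whose storage grows as N**2. The function names are hypothetical, and the sketch ignores the "fill time goes up a bit" effect, so treat the numbers as order-of-magnitude only.

```python
def solve_speedup(reduction_factor: int) -> int:
    """Solve-time speedup when the subsection count drops by reduction_factor.

    Assumes a dense direct solve whose cost grows as N**3.
    """
    return reduction_factor ** 3


def matrix_storage_reduction(reduction_factor: int) -> int:
    """Reduction in matrix entries, since an N x N matrix has N**2 entries."""
    return reduction_factor ** 2


if __name__ == "__main__":
    subsections_before = 20000  # the large case mentioned in the post
    for reduction in (10, 100):  # the range quoted for conformal meshing
        subsections_after = subsections_before // reduction
        print(
            f"{subsections_before} -> {subsections_after} subsections: "
            f"solve about {solve_speedup(reduction)}x faster, "
            f"matrix about {matrix_storage_reduction(reduction)}x smaller"
        )
```

For a factor-of-10 reduction this reproduces the 10**3 = 1000 figure. Actual timings depend on the solver and the machine, so a benchmark is still the real test.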
Basically, Sonnet development has been focused exclusively on our planar shielded FFT tool for nearly 23 years. Newcomers simply are not going to catch up to our technology. If this "asymptotic estimator" figures out the tail of the Green's function infinitely faster than we do and they reduce the time for the rest of the Green's function to zero, they will gain 0.156 seconds over Sonnet on a 1000x1000 cell substrate. That is useful only for marketing and only if users never find out how insignificant it really is.
Bottom line: If you have both tools, don't take anybody's word for anything (including mine). Think a little bit about it and then run some benchmarks.
---This just in. I tried the -c3 option. Now 3000x3000 = 9000000 modes are calculated. Mode calculation time is now 0.235 seconds, FFT time is 3 seconds. The FFT time includes inserting all 9000000 modes into the 1000x1000 FFT matrix.
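To put the mode counts and timings quoted in this post side by side, here is a small Python sketch. The `mode_count` helper and its `multiplier` parameter are hypothetical. They only encode the two data points reported here (2000x2000 modes by default and 3000x3000 with -c3, both for 1000 cells per side) and are not a statement of how the -c option is defined in general.

```python
def mode_count(cells_per_side: int, multiplier: int = 2) -> int:
    """Number of waveguide modes for a square cell grid.

    Assumption: modes per side = multiplier * cells_per_side, with
    multiplier 2 for the default run and 3 for the -c3 run in the post.
    """
    return (multiplier * cells_per_side) ** 2


# (label, number of modes, reported waveguide mode time in seconds)
runs = [
    ("default", mode_count(1000, 2), 0.156),
    ("-c3", mode_count(1000, 3), 0.235),
]

if __name__ == "__main__":
    for label, modes, seconds in runs:
        rate = modes / seconds / 1e6  # millions of modes per second
        print(f"{label}: {modes} modes in {seconds} s, about {rate:.1f} million modes/s")
```

On these figures, going from 4000000 to 9000000 modes adds only 0.079 seconds of mode calculation time.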