There is a very simple reason for that: changing L changes the device properties significantly, while W can be increased just by putting several identical devices in parallel.
Also, you would usually want to use a shorter channel anyway, because if you're going for long channels, why not just use an older, cheaper process? That is not to say you would never use longer channels. You would, but you wouldn't rely on them as a design principle in a process that gives you access to shorter-channel devices: a more advanced node costs more, and by choosing it you have already paid that price.
So there is already a consideration strong-arming designers toward shorter channels, and it is well known that even if your reference currents and everything else are supplied by 2 × Lmin devices, your device properties will change as soon as you change the current density.
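To make the multiplier-versus-length argument concrete, here is a minimal Python sketch of a 1:4 current mirror built both ways. It assumes a first-order square-law model with channel-length modulation, where λ scales as 1/L and Vth has a hypothetical roll-off near Lmin. Every constant and every function name (vth, drain_current, solve_diode_vgs) is illustrative and not taken from any real PDK or simulator. The reference is built from 2 × Lmin unit devices, and the output Vds is set equal to the reference Vgs so that only the L-dependence of the device shows up.

```python
# Hypothetical first-order MOSFET model: square law + channel-length modulation,
# with Vth and lambda both made to depend on L. All constants are illustrative.
KP = 200e-6         # process transconductance k' (A/V^2), assumed
LMIN = 0.1e-6       # minimum channel length (m), assumed
VTH_LONG = 0.45     # long-channel threshold voltage (V), assumed
DVTH = 0.04         # hypothetical Vth roll-off coefficient (V)
LAMBDA_L = 0.02e-6  # lambda * L product (m/V), assumed: lambda ~ 1/L


def vth(L):
    """Hypothetical threshold voltage that drops as L approaches LMIN."""
    return VTH_LONG - DVTH * LMIN / L


def drain_current(vgs, vds, W, L):
    """Saturation drain current of one device; zero at or below threshold."""
    vov = vgs - vth(L)
    if vov <= 0:
        return 0.0
    lam = LAMBDA_L / L
    return 0.5 * KP * (W / L) * vov ** 2 * (1 + lam * vds)


def solve_diode_vgs(i_ref, W, L, m=1):
    """Bisect for the Vgs of m parallel diode-connected devices carrying i_ref."""
    lo, hi = vth(L), vth(L) + 2.0
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if m * drain_current(mid, mid, W, L) < i_ref:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


I_REF = 20e-6
W_UNIT = 1e-6
L_UNIT = 2 * LMIN  # reference built from 2 * Lmin unit devices

# 1:4 by multiplier: 4 unit devices in the reference, 1 unit device in the output.
vgs_w = solve_diode_vgs(I_REF, W_UNIT, L_UNIT, m=4)
ratio_w = drain_current(vgs_w, vgs_w, W_UNIT, L_UNIT) / I_REF

# 1:4 by length: 1 unit device in the reference, a 4x longer output device.
vgs_l = solve_diode_vgs(I_REF, W_UNIT, L_UNIT, m=1)
ratio_l = drain_current(vgs_l, vgs_l, W_UNIT, 4 * L_UNIT) / I_REF

print(f"multiplier: {ratio_w:.4f}, length: {ratio_l:.4f} (ideal 0.25)")
```

With the multiplier, the output unit device sees exactly the same bias and current density as each reference unit, so the ratio lands on 0.25 whatever the model constants are. With the 4× longer device, the shifted Vth and smaller λ pull the ratio away from 0.25. How far it moves depends entirely on the assumed constants; that it moves at all does not.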
Considering all this, scaling things by scaling L is just really bad practice. But again, that is not to say it's never done: I actually did it in a current-trimming circuit where I needed a very small current and had already hit the minimum width I could use. It's just not a very good approach when there are alternatives.
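For a trimming leg like the one described above, a first-order starting point for the length is the square-law mirror ratio solved for L. The sketch below assumes both devices share Vgs, see the same Vds, and ignores short-channel effects, so the result is only a first guess to be refined in simulation. The function name trim_length and the example numbers are hypothetical.

```python
def trim_length(i_target, i_ref, w_ref, l_ref, w_min):
    """Hypothetical helper: first-order output length for a mirror leg at minimum width.

    Square-law mirror with shared Vgs and equal Vds:
        I_out / I_ref = (w_min / L_out) / (w_ref / l_ref)
    solved for L_out. The L-dependence of Vth and lambda is ignored.
    """
    return w_min * l_ref * i_ref / (w_ref * i_target)


# Example: 10 uA reference on a 2 um / 0.2 um device, 50 nA target, 0.2 um minimum width.
l_out = trim_length(i_target=50e-9, i_ref=10e-6, w_ref=2e-6, l_ref=0.2e-6, w_min=0.2e-6)
print(f"L_out ~ {l_out * 1e6:.2f} um")
```

Under these assumptions the output device comes out around 4 µm long, i.e. 20 × the reference length, which is exactly the regime where its Vth and output resistance no longer track the short reference device.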