Introduction
Kaplan et al.'s 2020 paper "Scaling Laws for Neural Language Models" popularized the idea that loss scales as a power law with model size, dataset size, and the amount of compute: in other words, models get better with larger models, more (quality) data, and more compute. Belief in this view, and its observation in practice, is one of the clear drivers behind massive infrastructure investments and the anticipation of continued improvement in models, perhaps even, one day, human-like intelligence.
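For readers who want the shape rather than the prose, the paper's headline result is a family of power laws of roughly this form (the exponents and constants are fit empirically in the paper; what follows is only the functional form, quoted from memory):

$$
L(N) \approx \left(\frac{N_c}{N}\right)^{\alpha_N}, \qquad
L(D) \approx \left(\frac{D_c}{D}\right)^{\alpha_D}, \qquad
L(C) \approx \left(\frac{C_c}{C}\right)^{\alpha_C}
$$

where $L$ is the loss, $N$ the number of model parameters, $D$ the dataset size, and $C$ the compute budget.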
While I and others have easy access to such papers, and can read them, it is sometimes helpful to distill the formal notation, experimental data, and length of a paper into something a little simpler to carry around every day. For me, one of those simple ideas is that complex information spaces, with a large number of combinations, can be error prone. If there are more combinations than there are ways of expressing them, ambiguity is introduced: one code gets mapped to many different combinations. Ambiguity leads to suboptimal or undesired outcomes, for example hallucinations. Determinism can be out of reach.
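To make that pigeonhole-style point concrete, here is a minimal sketch. It is my own toy illustration with made-up sizes, not anything from the scaling-laws paper: a large space of combinations is forced through a much smaller space of codes, so every code ends up standing for many combinations at once.

```python
from collections import defaultdict

NUM_COMBINATIONS = 1000   # things we might want to express
NUM_CODES = 8             # distinct codes available to express them

# A crude many-to-one encoding: more combinations than codes,
# so collisions are unavoidable.
code_to_combinations = defaultdict(list)
for combination in range(NUM_COMBINATIONS):
    code = combination % NUM_CODES
    code_to_combinations[code].append(combination)

# Each code now stands for ~125 different combinations. Given only the code,
# you cannot recover which combination was meant; that gap is the ambiguity.
for code, combos in sorted(code_to_combinations.items()):
    print(f"code {code}: {len(combos)} combinations collide here")
```

The numbers are arbitrary; the point is only that whenever the expressible codes are fewer than the combinations they must cover, some ambiguity is baked in.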
While many want much better outcomes than LLMs provide today, whether in (LLM) agent form or not, those same dissatisfied people still acknowledge how much models have improved - improvements rooted in many things, including scaling laws. We get more, and it whets our appetite for even more.
The rest of this text expands on disambiguation with a simple analogy. The analogy has flaws, but for me it is a starting point for explaining, to those who do not read research papers, one of the reasons LLMs have improved.