Generative AI

How It Works

Introductions and Some Technical Stuff

Few of us possess an actual understanding of how modern technology works at a schematic or mechanical level. Rather, while we never quite grasp how it works, we become comfortable once we know that it works.

Calculators. Computers. Elevators. Cars. Airplanes. Dishwashers. ATMs. Microwaves. TV. Internet. Mobile. Cloud. Email. Cameras. For most of us, our knowledge of how the technology underpinning our daily lives functions is de minimis.

The new and surprising can be unsettling, especially if a breakthrough seems to presage some form of major disruption. It is natural to be curious about what is going on under the hood. While this is not the place to turn if you want to deeply understand parameters or loss functions, we have ourselves felt this compulsion to develop a sounder understanding of GenAI. What follows are resources we’ve consulted to raise our baseline comprehension in order to better identify signal in the noise.

For us, a necessary, but not sufficient, point of departure is that the underlying models are probabilistic rather than deterministic. That is, when working with only the raw models—as was the case with the original ChatGPT—outputs are based on probabilities, not any source of truth.
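
To make the distinction concrete, here is a minimal sketch, with invented answers and weights, contrasting a deterministic lookup against probabilistic sampling:

```python
import random

# Deterministic: the same question always returns the same answer,
# retrieved from a source of truth.
facts = {"first person on the Moon": "Neil Armstrong"}
print(facts["first person on the Moon"])  # always "Neil Armstrong"

# Probabilistic: the answer is sampled from a distribution of likely
# continuations. The candidates and weights below are invented for
# illustration; nothing here consults a source of truth.
candidates = ["Neil Armstrong", "Buzz Aldrin", "an astronaut"]
weights = [0.95, 0.03, 0.02]
print(random.choices(candidates, weights=weights)[0])  # usually, not always, "Neil Armstrong"
```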

A large language model, for example, is a model of language based on a large amount of data. This enables text prediction on steroids. As Stephen Wolfram opens his ur-explainer What is ChatGPT Doing…and Why Does It Work?, “It’s just adding one word at a time.”
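
As a toy illustration of that one-word-at-a-time loop, consider the sketch below. The next-word table and its weights are invented for illustration; a real model derives billions of such statistics from enormous volumes of text:

```python
import random

# A toy "model": for a given current word, the possible next words and
# their probabilities (all numbers invented for illustration).
next_word = {
    "was":  (["Neil", "Buzz"], [0.95, 0.05]),
    "Neil": (["Armstrong"], [1.0]),
    "Buzz": (["Aldrin"], [1.0]),
}

text = "The first person to walk on the Moon was".split()

# "Just adding one word at a time": repeatedly sample the next word,
# conditioned (in this toy case) only on the current last word.
while text[-1] in next_word:
    words, weights = next_word[text[-1]]
    text.append(random.choices(words, weights=weights)[0])

print(" ".join(text))
```

A real LLM conditions on the entire preceding sequence rather than a single word, but the loop is the same: predict, append, repeat.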

Or to adapt a passage from Professor Murray Shanahan*:  

Suppose we give an LLM the prompt “Who was the first person to walk on the Moon?”, and suppose it responds with “Neil Armstrong”.

What are we really asking here?

From our perspective, we are asking the literal question (“Who was the first person to walk on the Moon?”) in an attempt to elicit an accurate, factual answer (“Neil Armstrong”).

From the perspective of the model (which, again, is just a model), our prompt translates to:

Given the statistical distribution of words in the vast public corpus of (English) text, what words are most likely to follow the sequence “The first person to walk on the Moon was… ”?

That the model responds with “Neil Armstrong” is due to the regular co-occurrence of the two elements—i.e., the words “Neil Armstrong” are statistically most likely to follow the word sequence “The first person to walk on the Moon was”.
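
You can observe this co-occurrence directly with a small open model. The sketch below uses GPT-2 via the Hugging Face transformers library, purely as a stand-in for larger models, to print the tokens the model deems most likely to come next:

```python
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")

prompt = "The first person to walk on the Moon was"
inputs = tokenizer(prompt, return_tensors="pt")

with torch.no_grad():
    logits = model(**inputs).logits  # shape: (1, sequence_length, vocab_size)

# The model's probability distribution over the very next token.
probs = torch.softmax(logits[0, -1], dim=-1)
top = torch.topk(probs, k=5)
for p, token_id in zip(top.values, top.indices):
    print(f"{tokenizer.decode(token_id.item())!r}: {p.item():.3f}")
```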

The probabilistic nature of these models is what makes them so adaptable. We now have the architecture (transformers), the compute power (Moore’s Law), and the data volumes (the internet) to build giant models that can be adapted relatively quickly to all sorts of different use cases. Before this moment, we were largely building massive IF/THEN deterministic workflows to achieve similar results—often not as good, and almost always far more labor intensive.
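
For contrast, here is a caricature of that IF/THEN style (rules invented for illustration). Every phrasing must be anticipated by hand, and anything unanticipated falls through:

```python
def answer(question: str) -> str:
    q = question.lower()
    # Each rule must be written, tested, and maintained by hand.
    if "first" in q and "moon" in q:
        return "Neil Armstrong"
    if "capital" in q and "france" in q:
        return "Paris"
    return "Sorry, I don't understand the question."

print(answer("Who was the first person to walk on the Moon?"))  # matches a rule
print(answer("Who initially set foot on the lunar surface?"))   # falls through
```

A probabilistic model, by contrast, generalizes across phrasings it was never explicitly programmed to handle.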

On the other hand, the probabilistic nature of these models is also what makes them hallucinate (next section).

Be warned: any analysis that concludes with the observation that the models are probabilistic, prone to hallucination, and therefore not to be relied upon is materially incomplete. First, it assumes that perfect accuracy is the sole standard. In reality, perfection is rarely the standard, and those who presume human infallibility are deluding themselves. Second, it assumes end users will only ever interact with the models in their raw form, completely ignoring the nascent but rapidly maturing application layer.*

For more technical follows, we recommend Simon Willison and Andrew Ng.
