How the AI Behind ChatGPT Actually Works
The advent of AI systems known as large language models (LLMs), such as OpenAI's ChatGPT, is often seen as the beginning of a new technological era. These models could indeed have a significant impact on how we live and work in the future.
However, they didn't appear overnight and actually have a much longer history than most people realize. In fact, many of the techniques these systems rely on have been part of our everyday technology for years.
LLMs are a type of language model, which is a mathematical representation of language based on probabilities. If you've used predictive text on a mobile phone or asked a voice assistant a question, you've likely interacted with a language model. But what exactly do these models do, and how are they created?
Language models aim to predict the likelihood of a specific sequence of words occurring. This is where probabilities come in. For example, a strong language model for English would give a high probability to a well-formed sentence like "the old black cat slept soundly" and a low probability to a random word sequence such as "library a" or "the quantum some".
Most language models can also generate plausible text by reversing this process. Predictive text on your smartphone, for instance, uses language models to predict how you might want to finish a sentence as you type.
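To make this concrete, here is a minimal Python sketch of those two roles - scoring word sequences and suggesting a likely next word - using a tiny hand-picked table of probabilities rather than a real trained model (every name and number below is invented for illustration):

```python
# Toy "language model": hand-assigned probabilities for the word that
# follows a given context. A real model learns these values from data;
# the values here are made up purely for illustration.
next_word_probs = {
    "the old black": {"cat": 0.6, "dog": 0.3, "quantum": 0.001},
    "the cat":       {"slept": 0.5, "sat": 0.4, "library": 0.01},
}

def score(context, word):
    """Probability the model assigns to `word` following `context`."""
    return next_word_probs.get(context, {}).get(word, 1e-6)

def suggest(context):
    """Predictive-text style generation: pick the most likely next word."""
    options = next_word_probs.get(context, {})
    return max(options, key=options.get) if options else None

print(score("the old black", "cat"))      # high: a plausible continuation
print(score("the old black", "quantum"))  # low: an implausible one
print(suggest("the cat"))                 # -> "slept"
```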
The first approach to creating language models was introduced by Claude Shannon in 1951 while he was working at Bell Labs. His method focused on n-grams - sequences of words like "old black" or "cat slept soundly". The likelihood of n-grams appearing in text was estimated by finding examples in existing documents. These probabilities were then combined to compute the overall probability of longer word sequences, such as full sentences.
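As a rough sketch of the n-gram idea (the tiny corpus and the function names below are invented for illustration, not Shannon's actual data), bigram probabilities can be estimated by counting how often pairs of words appear in example text, and those estimates can then be multiplied together to score a whole sentence:

```python
from collections import Counter

# A tiny "corpus" of example text standing in for real training documents.
corpus = "the old black cat slept soundly . the old dog slept ."
words = corpus.split()

# Count how often each word and each adjacent word pair (bigram) occurs.
unigram_counts = Counter(words)
bigram_counts = Counter(zip(words, words[1:]))

def bigram_prob(prev, word):
    """P(word | prev), estimated from counts in the corpus."""
    if unigram_counts[prev] == 0:
        return 1e-6  # unseen context: fall back to a tiny probability
    return bigram_counts[(prev, word)] / unigram_counts[prev]

def sentence_prob(sentence):
    """Combine bigram probabilities to score a longer word sequence."""
    tokens = sentence.split()
    prob = 1.0
    for prev, word in zip(tokens, tokens[1:]):
        prob *= max(bigram_prob(prev, word), 1e-6)
    return prob

print(sentence_prob("the old black cat slept soundly"))  # relatively high
print(sentence_prob("the quantum some"))                 # very low
```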
Neural Networks

Estimating probabilities for n-grams becomes increasingly difficult as the sequence length increases. For example, calculating probabilities for four-word sequences (4-grams) is harder than for two-word sequences (bi-grams). As a result, early language models often relied on shorter n-grams.
However, these models struggled to account for relationships between words that were farther apart in a sentence. This made it difficult to generate sentences where the beginning and end logically connected.
To solve this issue, researchers developed language models based on neural networks - AI systems inspired by the human brain. These models can capture relationships between words, even if they are not close together. Neural networks use large sets of numerical values known as parameters to represent these connections. For the model to function properly, these parameters must be set correctly.
The neural network learns the appropriate values for these parameters by analyzing a large number of example documents, much like how n-gram models learn probabilities by studying existing texts. During this training phase, the neural network looks at the training data and learns to predict the next word based on the words that have come before it.
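As a very rough sketch of that training idea (not the architecture behind ChatGPT, and simplified to look at only the single previous word), the NumPy snippet below treats one weight matrix as the network's parameters and repeatedly nudges those values so the model assigns more probability to the word that actually came next in the training text:

```python
import numpy as np

# Tiny training text; a real model would see billions of words.
words = "the old black cat slept soundly".split()
vocab = sorted(set(words))
idx = {w: i for i, w in enumerate(vocab)}
V = len(vocab)

# The model's "parameters": one score per (previous word, next word) pair.
rng = np.random.default_rng(0)
W = rng.normal(scale=0.1, size=(V, V))

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

# Training: repeatedly predict the next word and adjust the parameters
# in the direction that makes the true next word more probable.
lr = 0.5
for step in range(200):
    for prev, nxt in zip(words, words[1:]):
        p = softmax(W[idx[prev]])   # predicted next-word distribution
        grad = p.copy()
        grad[idx[nxt]] -= 1.0       # gradient of the cross-entropy loss
        W[idx[prev]] -= lr * grad   # update the parameters

# After training, "cat" should be the most likely word to follow "black".
probs = softmax(W[idx["black"]])
print(vocab[int(probs.argmax())])
```

Real neural language models use many layers and billions of parameters, and condition on long stretches of preceding text rather than a single word, but training follows the same pattern of predict, compare, and adjust.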
While these models perform well, they also have limitations. Although neural networks can, in theory, recognize relationships between distant words, they tend to focus more on those that are closer together in practice.
Additionally, the words in training documents must be processed one by one to adjust the network's parameters. This sequential processing slows down the training process.