The technology behind GenAI is complex; however, a few simplified analogies can offer insight into how these algorithms work.
Analogy: GenAI as a Word Predictor
At its core, GenAI generates new content by predicting what word, pixel, or audio wave should come next in a sentence, image, or audio file. Imagine presenting in front of five random people and asking them to fill in the following blanks:
"Knock, knock, who ______."
Responses: "is there?" x5.
Comment: Everyone agrees because they all have the same basic experience of hearing this prompt and response many times before.
"The capital of Canada is _______?"
Responses: "Ottawa"x2 "Toronto"x1 "Vancouver"x1 "Do not know"x1
Comment: Here, answers are split because only 2 out of 5 people know that the capital is Ottawa, while the remainder either guess a Canadian city with a greater population or simply do not know the answer.
"_____ make the best pets."
Responses: "Dogs" x3 "Cats"x1 "Lizards"x1.
Comment: This question asks for an opinion and thus has no "correct" response. Nonetheless, the most common response is predictable since it reflects a majority opinion. Notice, also, that because the question is open-ended, it can be answered at different levels of generality: with a specific kind of pet (dogs) or with a broader category of pets (lizards).
Just like the human responses above, GenAI will attempt to complete sentences and fill in blanks based on the billions of data points in its repository. And just like humans, GenAI will reflect the most probable, common, or easily accessible information--not necessarily the correct answer. The reasons for this mirror the reasons people give incorrect information: misunderstanding or confusion about the facts, reflection of an opinion that cannot be independently verified, or answers with varying levels of generality.
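To make this concrete, here is a minimal sketch of next-word prediction by frequency counting. The tiny corpus and the function name are invented for illustration; real GenAI models train on vastly larger datasets with far more sophisticated statistics, but the principle of picking a probable continuation is the same.

```python
# Toy illustration (not a real GenAI model): predict the next word by
# counting which word most often follows the current one in a corpus.
from collections import Counter, defaultdict

corpus = (
    "knock knock who is there . "
    "the capital of canada is ottawa . "
    "dogs make the best pets . "
    "cats make the best pets . "
    "dogs make the best pets ."
).split()

# Count how often each word follows each other word (bigram counts).
following = defaultdict(Counter)
for current_word, next_word in zip(corpus, corpus[1:]):
    following[current_word][next_word] += 1

def predict_next(word):
    """Return the most frequent follower of `word`, like filling in a blank."""
    candidates = following.get(word)
    return candidates.most_common(1)[0][0] if candidates else None

print(predict_next("best"))  # 'pets' -- the most common continuation
print(predict_next("the"))   # 'best' -- reflects majority, not truth
```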
Analogy: A Sandwich-Making Algorithm
We have seen how GenAI acts like a sentence predictor. How does it know what word to predict next? How does it know the answers to basic questions? Machine learning is often discussed in the context of GenAI. It is technically a subset of AI that relies on computers learning from data and improving their performance on a task without being explicitly programmed. It works by using algorithms to identify patterns in data, make predictions, or take actions based on those patterns, refining its results through experience. Machine learning is one way that GenAI learns how to answer questions. There is a semantic debate about whether GenAI really learns anything or simply simulates a probabilistic response based on a complex algorithm with no real understanding of what it is doing. For our purposes, these two possibilities are two ways to explain the same phenomenon.
Imagine trying to teach a robot how to make sandwiches. At first, it has no clue—it just piles random ingredients together: pickles, chocolate, and mustard. It is a total disaster. But here’s the thing—every time the robot makes a sandwich, a human tester takes a bite and scores it. “Too salty.” “Slimy texture.” The robot takes that feedback and adjusts its weights—think of these as how much importance it gives to each ingredient. Maybe it starts realizing that mustard should not dominate 70% of the flavor profile. It might eventually realize that any sandwich with pickles probably should not also include chocolate.
Now apply this to generative AI. Instead of mustard and pickles, the AI is learning from language, images, or sounds. During training, it’s fed a giant dataset and tries to generate outputs—text, art, music. At first, it’s a mess. But it keeps tweaking its weights until it starts generating meaningful content.
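The following toy sketch captures the feedback loop in the sandwich analogy. The ingredient names, the tester's hidden preferences, and the tweak sizes are all invented; real training uses calculus rather than random tweaks, but the adjust-and-check rhythm is the same.

```python
# Toy sketch of feedback-driven weight adjustment (hill climbing).
import random

# The tester's hidden preferences -- the robot never sees these directly.
preferred = {"bread": 0.6, "mustard": 0.1, "pickles": 0.3}

# The robot's current weights: how much importance it gives each ingredient.
weights = {"bread": 0.2, "mustard": 0.7, "pickles": 0.1}

def taste_score(w):
    """The tester's rating: higher when weights are closer to the preference."""
    return -sum((w[k] - preferred[k]) ** 2 for k in w)

for _ in range(2000):
    ingredient = random.choice(list(weights))
    tweak = random.uniform(-0.05, 0.05)
    before = taste_score(weights)
    weights[ingredient] += tweak        # try a slightly different recipe
    if taste_score(weights) < before:   # the tester liked it less...
        weights[ingredient] -= tweak    # ...so undo the change

print({k: round(v, 2) for k, v in weights.items()})
# After many sandwiches, mustard's weight has dropped toward what the tester likes.
```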
Summary
We have learned that GenAI acts like a word predictor, calculating the probability of the most likely next word given the words assembled so far and the billions of other sentences it has seen. We have also learned that one way the algorithm learns to select its targets is through feedback from human testers on each of its proposed responses. Next, we will take a look at the underlying technology and methods necessary to make GenAI work.
Machine Learning Basics
Recall that machine learning (ML) is a broad term describing algorithms that perform a specific task better over time (with experience). Consider a spam filter that labels each incoming email as either Spam or Not Spam. Such a filter could be created with ML by uploading thousands of example emails that human reviewers have labeled as spam or not. Let's suppose that the presence of the phrase "buy now" in an email is the single best predictor of spam. Such a model would operate as follows: if an email contains "buy now," label it Spam; otherwise, label it Not Spam.
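Here is a minimal sketch of that rule, assuming the single "buy now" predictor described above (the sample emails are invented):

```python
def baseline_spam_filter(email_text: str) -> str:
    """Label an email Spam if it contains the phrase 'buy now'."""
    return "Spam" if "buy now" in email_text.lower() else "Not Spam"

print(baseline_spam_filter("Buy now and save 90%!"))        # Spam
print(baseline_spam_filter("Meeting moved to 3pm today."))  # Not Spam
```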
A baseline is a very basic model that works reasonably well and is very easy to test, like the one described above. The benefit of identifying a baseline is that you can empirically show how much better your final model is compared to the baseline. You can also evaluate whether incremental changes to a model have made it better or worse, depending on how each version performs compared to the baseline.
Datasets in ML
Suppose you are creating an ML algorithm and you have a total of 1,000 data points. How many of those data points should you use to build the algorithm? If you use too few, you risk the algorithm not being sophisticated enough to deal with difficult cases. But if you use too many, you will have no way to test whether the algorithm is accurate.
Imagine you are training a model to predict whether it will rain given the humidity level. You have three data points:
100% -> rain;
80% -> rain;
60% -> no rain;
Suppose you ask for a rain prediction given 80% humidity. The model is very likely to predict rain. But we will not know whether it does so because it already knows about a time when 80% humidity resulted in rain, or because it is making a genuine prediction from all of the data. If you use all three data points in training, there is no way to know whether the model is actually predicting rainfall or simply regurgitating the only data points it has. By contrast, if you train the model only on the 100% and 60% data points, then any answer about 80% must be a prediction by the model and not direct recall. Since the purpose of ML is to train a model that offers predictive insight, a model should never be trained on all of the available data.
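Here is a toy sketch of that setup. The midpoint-threshold rule is an invented stand-in for a real model; the point is that the 80% case is held out of training, so the answer must be a prediction rather than recall.

```python
# Train on two of the three data points; hold out the third for testing.
train = [(1.00, True), (0.60, False)]   # (humidity, did it rain?)
held_out = (0.80, True)                 # reserved for testing

# Learn the simplest possible rule: a threshold halfway between the
# driest "rain" example and the wettest "no rain" example.
rain_points = [h for h, rained in train if rained]
dry_points = [h for h, rained in train if not rained]
threshold = (min(rain_points) + max(dry_points)) / 2   # 0.80

def predict(humidity):
    return humidity >= threshold

humidity, actual = held_out
print(f"Predicted rain at {humidity:.0%}: {predict(humidity)}, actual: {actual}")
```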
One of the biggest threats to accurate ML assessment is the model inadvertently learning the answers to the test (this is called data leakage). Thus, before training your model with any data, you must divide the dataset into at least two parts: training data and testing data. (For greater control over modeling, you may also reserve a third set of data called a validation set, which can be used to choose the best of several competing models using specific designer parameters. However, this practice will not be discussed here.)
For medium-sized datasets, a rule of thumb is that 80% of the data should be used for training while 20% should be reserved for testing. As the amount of data increases (say, past a million data points), the percentage reserved for testing can decrease to 1 or 2%. Once you decide how much of your data to set aside for testing (say 20%), you have to make sure that your algorithm sets aside the same 20% of the data each time the algorithm is run. Otherwise, your model will eventually be exposed to all of the data and you will once again have no way to test the model's accuracy. There are two ways to ensure that the training and testing data do not get mixed up. First, you can manually remove the testing data from the dataset and isolate it from the algorithm until testing. Second, you can use a fixed random seed in your algorithm to ensure that the same 20% of the data is set aside for testing. Note, however, that these solutions only work if the dataset is static; otherwise, you will likely need a system to separate training from testing data.
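The sketch below shows the second approach, assuming scikit-learn is available; the fixed random_state value plays the role of the seed, so the same 20% is held out on every run (the data and labels are invented placeholders).

```python
# A reproducible 80/20 split: the fixed random_state guarantees the
# same 20% is held out for testing every time the script runs.
from sklearn.model_selection import train_test_split

data = list(range(1000))        # stand-in for 1,000 data points
labels = [x % 2 for x in data]  # stand-in labels

X_train, X_test, y_train, y_test = train_test_split(
    data, labels, test_size=0.20, random_state=42  # same split on every run
)
print(len(X_train), len(X_test))  # 800 200
```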
Deep learning is a specific subtype of machine learning that involves neural networks, where nodes are connected in layers serving the following functions:
Input Layer: Takes in the raw data, which in our spam example is the emails.
Hidden Layers: Process data through interconnected nodes that apply weights, biases, and activation functions. For example, these layers will consider how likely it is that an email is spam if it includes the phrase "buy now" or if it comes from an unknown sender.
Output Layer: Produces the final result: predicts whether any given email is spam or not.
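A minimal sketch of these three layers for the spam example follows. The features, weights, and layer sizes are invented for illustration; in a real network, the weights would be learned during training rather than hand-set.

```python
# Toy forward pass through a network with one hidden layer.
import numpy as np

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

# Input layer: the email reduced to two features:
# [contains "buy now"?, from an unknown sender?]
email_features = np.array([1.0, 1.0])

# Hidden layer: 3 nodes, each applying weights, a bias, and an activation.
W_hidden = np.array([[0.9, 0.4], [0.2, 0.8], [0.5, 0.5]])
b_hidden = np.array([-0.1, -0.2, -0.1])
hidden = sigmoid(W_hidden @ email_features + b_hidden)

# Output layer: a single node producing the spam probability.
W_out = np.array([1.2, 0.7, 0.9])
b_out = -1.0
spam_probability = sigmoid(W_out @ hidden + b_out)

print(f"P(spam) = {spam_probability:.2f}")
```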
When ML was first conceived, technological limitations made deep learning impracticable. Accordingly, any model having more than three hidden layers was considered "deep." Although this definition is still found in the literature, modern networks can feature hundreds of layers. Today, machine learning and deep learning are sometimes used interchangeably--reflecting the reality that the difference is one of degree rather than kind.
Analogy: Neural Network Error Correction as an Experienced Baseball Pitcher
Suppose that you are practicing pitching a baseball. The first throw falls short of the catcher. The second throw sails over his head. How do you approach your third throw? Most people would instinctively aim to throw the ball farther than the first time but not as far as the second. This is, in effect, automatic error correction to account for previous results.
Can an algorithm auto-correct errors in the same way? It turns out that once an algorithm learns what the answer should be, it can use complex math to go back and adjust its internal weights to get closer to the desired output. In other words, the algorithm can auto-correct mistakes (and train itself) in a way similar to a pitcher training herself. This process is called backpropagation. It involves calculating the gradient of a loss function, which is a fancy way of saying that the algorithm measures how far off each result was and in which direction it must adjust to reduce the error. This process is how neural networks auto-correct themselves.
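Here is a toy version of that process for a single "throw strength" parameter. The physics, distances, and learning rate are invented; real backpropagation applies the same correct-by-gradient step to every weight in the network at once.

```python
# Toy gradient descent: correcting a pitcher's throw strength.
target_distance = 18.4           # metres to the catcher (assumed)

def distance(throw_strength):
    return 2.0 * throw_strength  # toy physics: distance grows with strength

throw_strength = 5.0             # first throw falls short
learning_rate = 0.05

for pitch in range(10):
    error = distance(throw_strength) - target_distance  # signed miss
    gradient = 2 * error * 2.0          # d(squared error)/d(strength)
    throw_strength -= learning_rate * gradient  # correct the next throw
    print(f"pitch {pitch + 1}: missed by {error:+.2f} m")
# Each throw's miss shrinks as the corrections home in on the catcher.
```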