ChatGPT is OpenAI’s game-changing AI chatbot, and it continues to amaze the internet. Unlike most new technologies, it has taken almost no time to find its way into nearly every area of our digital lives.
Few tech innovations have attracted as much interest as ChatGPT has in such a short time. It never seems to run out of cool tricks—every day, we learn about exciting new things we didn’t know it could do.
But how is ChatGPT able to do the things it can do? How does ChatGPT work?
How Was ChatGPT Built?
To understand how ChatGPT works, it’s worth looking at its origins and the brain behind the cutting-edge AI chatbot.
Firstly, as magical as ChatGPT may seem, it was built by human ingenuity, just like every worthwhile software technology out there. ChatGPT was created by OpenAI, the revolutionary AI research and development company behind other powerful AI tools like DALL-E, InstructGPT, and Codex. We’ve previously answered some questions you might have about ChatGPT, so do take a look.
While ChatGPT went viral towards the end of 2022, most of the underlying technology that powers it has been around for much longer, albeit with far less publicity. The ChatGPT model is built on top of GPT-3 (or, more specifically, GPT-3.5). GPT stands for “Generative Pre-trained Transformer.”
GPT-3 is the third iteration of the GPT line of AI models, preceded by GPT-2 and the original GPT. Earlier iterations are useful in their own right, but GPT-3 and the fine-tuned GPT-3.5 are far more powerful. Most of what ChatGPT can do comes from the underlying GPT-3 technology.
What Is GPT?
So we’ve established that ChatGPT is built on the third generation of the GPT model. But what’s GPT anyway?
Let’s start by unpacking the acronyms in an easy-to-digest and non-technical manner.
- The “Generative” in GPT represents its ability to generate natural human language text.
- The “Pre-trained” represents the fact that the model has already been trained on a large but fixed dataset. Much like you’d read a book, or maybe several, before being asked to answer questions about them.
- The “Transformer” represents the underlying neural network architecture, called a transformer, that powers GPT.
Now, putting it all together, Generative Pre-trained Transformer (GPT) is a language model that has been trained using data from the internet with the aim of generating human language text when presented with a prompt. So, we’ve repeatedly said that GPT was trained, but how was it trained?
How Was ChatGPT Trained?
ChatGPT itself was not trained from the ground up. Instead, it is a fine-tuned version of GPT-3.5, which is itself a refinement of GPT-3. The base GPT-3 model was trained on a massive amount of data collected from the internet. Think of Wikipedia, Twitter, and Reddit: it was fed human text scraped from all corners of the internet.
If you’re wondering how the training works, it happened in stages. First, the base GPT model learned language patterns from a huge dataset of text scraped from the internet. ChatGPT was then fine-tuned using a combination of supervised learning and Reinforcement Learning from Human Feedback (RLHF). In the supervised stage, human trainers provided example responses for the model to imitate. The reinforcement learning stage is where it was trained to produce better responses that align with what humans would accept as being both human-like and correct.
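At its core, that first stage is next-word prediction learned from raw text. The sketch below is a drastically simplified stand-in for it: a word-pair counter rather than a neural network, "trained" on a two-sentence internet invented purely for illustration.

```python
from collections import Counter, defaultdict

# A tiny stand-in for internet-scale training text (invented for
# illustration; the real model trains on vastly more data).
corpus = (
    "the capital of portugal is lisbon . "
    "the capital of france is paris ."
).split()

# "Training": count which word follows which. A real GPT model learns
# far richer patterns with a transformer network, but the principle of
# learning continuations from text is the same.
following = defaultdict(Counter)
for word, nxt in zip(corpus, corpus[1:]):
    following[word][nxt] += 1

def predict_next(word: str) -> str:
    """Predict the continuation seen most often during 'training'."""
    return following[word].most_common(1)[0][0]

print(predict_next("capital"))  # "of" is the only word ever seen after it
```

Feed the counter more text and its predictions get richer; that intuition, scaled up enormously, is what pre-training is about.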
Training With Supervised Learning
To better understand how supervised and reinforcement learning applies to ChatGPT, imagine a scenario where a student is being taught to write an essay by a teacher. Supervised learning would be the equivalent of the teacher giving the student hundreds of essays to read. The goal here is for the student to learn how an essay should be written by getting used to the tone, vocabulary, and structure of hundreds of essays.
However, there will be good and bad essays among those hundreds. Because the student learned from both, they might occasionally write a bad essay of their own. So, when asked to write an essay, the student might produce a copy that isn’t acceptable or good enough for the teacher. This is where reinforcement learning comes in.
Training With Reinforcement Learning
Once the teacher establishes that the student understands the general rules of essay writing, the teacher would give the student frequent essay-writing homework. The teacher would then provide feedback on each assignment, telling the student what they did well and what they could improve. The student uses this feedback to guide subsequent essays, improving over time.
This is similar to the reinforcement learning stage of training the GPT model. After being fed a massive amount of text scraped from the internet, the model can answer questions. However, its accuracy is not going to be good enough. Human trainers ask the model a question and provide feedback on which answer is more appropriate for each question.
The model uses this feedback to improve its ability to answer questions more accurately and more like a human would. This is how ChatGPT can generate human-sounding responses that are coherent, engaging, and generally accurate.
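In code terms, the feedback stage can be caricatured as ranking candidate answers and turning the ranking into a reward. Everything below (the answers, the scores) is invented purely to illustrate the idea; it is not OpenAI’s actual training pipeline.

```python
# Two candidate answers the model might produce for one prompt
# (made up for illustration).
candidates = [
    "Portugal capital city big.",
    "The capital of Portugal is Lisbon.",
]

# A human trainer ranks them from best to worst. The ranking becomes a
# numeric reward that the training process pushes the model to maximize.
human_ranking = [
    "The capital of Portugal is Lisbon.",  # ranked best
    "Portugal capital city big.",          # ranked worst
]
reward = {
    answer: len(human_ranking) - rank
    for rank, answer in enumerate(human_ranking)
}

# After many rounds of this, the model favors high-reward answers.
best_answer = max(candidates, key=reward.get)
print(best_answer)
```

The real pipeline trains a separate reward model from thousands of such human rankings and then optimizes the chatbot against it, but the core idea is the same: better-ranked answers earn higher rewards.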
How Is ChatGPT Able to Answer Questions?
So, you visit the ChatGPT website and sign in. You prompt ChatGPT: “write a rap song in the style of Snoop Dogg.” It responds with lyrics to a rap song that looks strikingly similar to what Snoop Dogg would write. How is this possible?
Well, the “magic” behind ChatGPT all ties neatly into its training.
After covering every inch of your Physics 101 textbook, there’s a good chance you’ll be able to answer any question from it that’s thrown at you. Why? Because you’ve read it, and you’ve learned it. It’s the same thing with ChatGPT—it learns. And as human civilization has shown, with enough training, solving almost any problem is possible.
While you can probably manage hundreds of books in your lifetime, ChatGPT has already consumed a huge chunk of the internet. That’s a massive wealth of information. Somewhere in there are the lyrics to Snoop Dogg’s numerous songs. So, of course, ChatGPT must have consumed them (remember, it’s pre-trained) and recognized the patterns in Snoop Dogg’s lyrics. It then uses its “knowledge” of those patterns to “predict” lyrics to a song akin to one Snoop Dogg would write.
The emphasis here is on “predict.” ChatGPT doesn’t answer questions the way we do as humans. For example, when faced with a question like “What is the capital of Portugal?” you could answer “Lisbon” and state it as a fact. ChatGPT, however, doesn’t answer with 100% certainty. Instead, it tries to predict the most likely answer given the data it consumed during training.
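To make the difference between predicting and knowing concrete, picture the model’s output as a spread of probabilities over possible next words. The numbers below are invented for illustration; a real model scores tens of thousands of candidate tokens at every step.

```python
# Hypothetical probabilities a model might assign to the word that
# follows "The capital of Portugal is" (values invented for illustration).
next_word_probs = {
    "Lisbon": 0.92,
    "Porto": 0.05,
    "Madrid": 0.02,
    "Paris": 0.01,
}

# The model answers with the most probable word: very confident here,
# but still a statistical prediction, never a looked-up fact.
answer = max(next_word_probs, key=next_word_probs.get)
print(answer)
```

Notice that a wrong answer which happened to score highest would be stated just as fluently, which is why confident-sounding responses can still be incorrect.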
ChatGPT’s Approach to Answering Questions
To better understand the concept of predicting responses, imagine ChatGPT to be a detective tasked with solving a murder. The detective is presented with evidence, but they don’t know who committed the murder and how it happened. However, with enough evidence, the detective can “predict” with great accuracy who is responsible for the murder and how the crime was committed.
After consuming data from the internet, ChatGPT discards the original text and instead stores the patterns it has learned as connections in its neural network. These patterns are like pieces of evidence that ChatGPT analyzes when it attempts to respond to a prompt.
So, in theory, ChatGPT is like a very good detective. It doesn’t know for sure what the facts of an answer should be, but it tries, with impressive accuracy, to predict a logical sequence of human language text that would most appropriately answer the question. This is how you get answers to your questions.
And this is also why some of those answers look very convincing but are awfully wrong.
ChatGPT: Answers Like a Human, Thinks Like a Machine
The underlying technical details of ChatGPT are complex. However, from a rudimentary standpoint, it works by learning and reproducing what it has learned when prompted, just like we do as humans.
As ChatGPT evolves through research, the way it works might change. However, its foundational working principles will remain the same for a while, at least until a disruptive new technology comes along.