What is GPT-4o and what does it mean for education?

Sami Belhareth | LinkedIn

May 29, 2024

GPT-4o text on top of a collection of pictures and notes representing the multimodality of this AI's input and output.

What is GPT-4o and what makes it exciting for education?

GPT-4o has brought about some exciting changes. It can take in a combination of text, audio, images, and video and output a combination of text, audio, and images. 4o is a truly multimodal generative AI.

This is unique and important because before, models like GPT-4 could only take in text and images and could only output text. All of the other features in the ChatGPT app — generating images, speech responses, and voice transcription — are all built using separate tools and connected to the GPT model. But now, all of those features are natively built in. This enables you to give the AI a more complete, accurate, and contextualized understanding of your inputs.

Because the model can now natively see video and hear audio, it can pick up on nuances in the way you speak and understand a sequence of events going on in a video — such as a person stepping into the frame and stepping back out.

It is also much faster than previous models. It can respond to audio in roughly the same amount of time a human can. It can see images and videos in real-time and respond to them.

Realistic voice support, faster photo analysis, and faster outputs. All of these put together give you an agent that feels and sounds more like a human than ever before. And, there are some interesting use cases with education, such as this demo video that OpenAI published featuring Khan Academy’s Sal Khan:

For anyone who has been bullish on the future of AI in education, this is exciting. An AI model that can understand what is happening on a student’s screen and provide feedback in real time opens a lot of doors for personalized learning. Imagine what a tutor could do with one-on-one personalized attention with a student, 24 hours a day, 7 days a week. Now imagine that but instead of a human, it is a cell phone with a capable AI model on it.

With GPT-4o, AI tutors are going to have more and more human-like capabilities. There are going to be new ways for the AI tutor to look and feel like a real human working with you and your students in ways we could never have anticipated.

But there is a problem. You might notice that Sal Khan had to first tell the AI to not give away the answer. AI models will usually default to that. They are designed to be people pleasers. Very smart people pleasers make for great cheating tools.

This has concerned teachers, and rightfully so.

Teachers need a way to make sure AI acts like a teacher and not like a worker. What I mean by that is: the AI needs to focus on teaching and guiding the student, not just giving the answer on demand.

Thankfully, with some smart prompting, these AI models can be used to guide students through the learning process instead of doing their work for them. It can encourage critical thinking, help them practice recalling things, and do it all without giving them the answer. It can be infinitely patient, understanding, and adaptable without being a cheating tool.

We then have two problems to tackle: learning and prompting.

It is also much faster than previous models. It can respond to audio in roughly the same amount of time a human can. It can see images and videos in real-time and respond to them.

This has concerned teachers, and rightfully so.

Teachers need a way to make sure AI acts like a teacher and not like a worker. What I mean by that is: the AI needs to focus on teaching and guiding the student, not just giving the answer on demand.

We then have two problems to tackle: learning and prompting.

It is also much faster than previous models. It can respond to audio in roughly the same amount of time a human can. It can see images and videos in real-time and respond to them.

This has concerned teachers, and rightfully so.

Teachers need a way to make sure AI acts like a teacher and not like a worker. What I mean by that is: the AI needs to focus on teaching and guiding the student, not just giving the answer on demand.

We then have two problems to tackle: learning and prompting.

The learning problem

Much of the learning process or learning curriculum is designed to simply assess students' knowledge. In some cases, it's things that students don't really want to do. They're not actively engaged or excited about the material. They simply do it as a box to check off.

That incentivizes students to find a tool that can do it for them because this is knowledge that they have no interest in learning and have no intention of retaining after they've been assessed on it.

And so the question then becomes somewhat twofold. How can you get students to want to learn the material? How do you motivate them intrinsically to want to do so? The latter is perhaps the bigger question.

Teachers can find innovative and exciting ways to motivate students based on their knowledge of the student, based on their personal interaction with the student. That personal touch can go a long way in changing students from just wanting to check off a box to wanting to truly learn the material.

Created with Dalle 3: “A teacher motivating a student to learn about physics by catering to his personal interest in rockets as shown by his space shuttle shirt.”

AI can be there to help them through those steps but ultimately the AI alone can't figure out what it is that motivates a student or how to get them to want to learn the material rather than shortcut it. And that's why we have what we have at Flint.

We accept that we will not be the ones to figure out what it is that motivates each student. But we trust that each teacher or educator reading this can design a chatbot that works best for them.

The prompting problem

The second question then becomes: how can we make it to where AI can help them in the process of truly learning the material? Or at the very least not help them with the cheating or shortcutting process?

The models may seem smart, but they need guidance on how to behave with students.

We then ask ourselves: how do I give the model instructions and how do I make sure it follows them?

There is still the potential for AI models to not follow instructions. Currently, models try to predict the next word that they think makes the most sense based on data they’ve seen before, but sometimes the data has hidden patterns that lead to unexpected behavior. Thankfully, due to fears of more powerful AI models eventually being used intentionally or unintentionally for harmful causes, AI groups and developers are trying to make sure that AI models follow instructions and eliminate implicit bias.

Interestingly, companies like OpenAI are becoming more invested in covering the use case of tutoring specifically with how they build in controls for AI models.

Below is a screenshot from OpenAI’s most recent model specification document, or model spec. This document outlines the development goals for the models they are publishing—including the new GPT-4o, which is now used by ChatGPT. Specifically highlighted here is the use case of tutoring or education, and we can see the companies building these AI models are focused on making sure the AI follows instructions.

In the example, the user would be the student, the assistant would be the AI tutor, and the developer would be the teacher — who prompts the AI model beforehand.

Screenshot of OpenAI's model specification document from May 8th 2024 showing a tutoring use case for their AI.

This is great if you are using a sandbox environment like Flint. On top of the guardrails and customization that Flint allows teachers to add, it seems that AI models themselves are moving in the direction of being more receptive to prompting that fits an educational use case. We can be more certain that you can instruct the model to follow specific instructions before handing it off to a student and the AI model should adapt based on the prompt to do what you — as a teacher — think is right.

Moving forward

We at Flint plan to always use the best AI model or a combination of models, and will try to use them in as many ways as possible. We are excited to have switched to using GPT-4o with our tutors recently, which has made it much easier for teachers to revise tutors, and much quicker for students to get help from AI.

We plan to roll out some exciting new updates over the coming months to leverage GPT-4o’s new features.