This week, four Microsoft engineers from the company’s German division organized an event dedicated to the revolution that Large Language Models (LLMs) like GPT pose for businesses. As part of that conference, they surprised attendees with details of the eagerly awaited new version of OpenAI’s model.
GPT-4. When GPT-3 appeared in 2020, it did so as a private beta, which limited the model’s chances to demonstrate its capabilities, but in 2022 the arrival of ChatGPT – based on an iteration of GPT-3 – changed everything. We have been talking for months about what to expect from GPT-4, and according to Heise Online, the CTO of Microsoft Germany, Andreas Braun, said that this engine will arrive next week.
Kosmos-1. The arrival of GPT-4 seemed especially imminent after Microsoft announced at the beginning of March the launch of Kosmos-1, a Multimodal Large Language Model (MLLM) that responds not only to text prompts but also to images. In a way this makes it behave like Google Lens: it is capable of extracting information and context from an image.
Bigger, better. One of the characteristics clearly expected from GPT-4 is that it will be larger than GPT-3. While GPT-3 has 175 billion parameters, it has been claimed that GPT-4 will have 100 trillion, a figure that Sam Altman, CEO of OpenAI, dismissed as “complete nonsense”. Even so, what seems certain is that it will be bigger, which should allow it to handle more complex situations and generate even more human-like responses.
Multimodal? This is one of the great innovations of GPT-4 – if not the biggest: a multimodal model that, as we already saw with Kosmos-1, will allow input from diverse sources or “modalities” such as text (which is what ChatGPT uses), images, video, spoken audio or other formats.
Give me the data, I’ll do the analysis. These are models that use deep learning and natural language processing to understand the relationships and correlations between these different types of data. By combining multiple “modalities”, an artificial intelligence model can improve its accuracy and deliver more sophisticated data analysis.
An example: video. An immediate practical application of these models is video. With GPT-4 it would theoretically be possible to feed in a video and its associated audio so that the engine could understand the conversation, including the emotions of those taking part in it. It could also recognize objects (or people) and extract information. One could, for instance, obtain a summary of a film or a YouTube video the same way we now get summaries of meetings.
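Purely as an illustration of what such a request could look like, here is a minimal Python sketch that bundles a sampled video frame, an audio transcript and a text question into a single payload. The field names and the “gpt-4” identifier are assumptions of ours, since no public multimodal GPT-4 interface had been published at the time of writing.

```python
import base64
import json


def build_multimodal_prompt(question: str, frame_bytes: bytes, transcript: str) -> dict:
    """Bundle a text question, one video frame and an audio transcript into a
    single request payload.

    The payload shape and the "gpt-4" identifier are illustrative assumptions:
    no public multimodal GPT-4 interface existed when this article was written.
    """
    frame_b64 = base64.b64encode(frame_bytes).decode("ascii")
    return {
        "model": "gpt-4",  # hypothetical identifier
        "inputs": [
            {"type": "text", "content": question},
            {"type": "image", "encoding": "base64", "content": frame_b64},
            {"type": "text", "content": f"Audio transcript:\n{transcript}"},
        ],
    }


if __name__ == "__main__":
    payload = build_multimodal_prompt(
        question="Summarize this scene and describe the speakers' mood.",
        frame_bytes=b"\xff\xd8\xff placeholder bytes",  # in practice, a frame sampled from the video
        transcript="Speaker 1: We can't keep missing these deadlines...",
    )
    print(json.dumps(payload, indent=2)[:300])
```

In practice one would sample several frames and attach the full transcript, but the point is simply that heterogeneous inputs travel together in one prompt.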
Saving time. One of the Microsoft engineers explained how this type of engine could be of great help in customer service centers, where GPT-4 could transcribe calls and then summarize them, a task that human agents normally have to do themselves. According to his estimates, this could save 500 hours of work a day for a Microsoft client in the Netherlands that receives 30,000 calls a day (roughly one minute of agent time per call). The prototype was created in two hours, a developer spent a couple of weeks on it, and the result was apparently a success.
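As a rough sketch of that workflow, with both model calls stubbed out because the article does not say which speech-to-text or summarization APIs the prototype actually used, the pipeline reduces to “transcribe, then summarize” applied per call:

```python
from dataclasses import dataclass


@dataclass
class CallRecord:
    call_id: str
    audio_path: str


def transcribe(audio_path: str) -> str:
    """Placeholder for a speech-to-text step (the article does not name the tool)."""
    return f"<transcript of {audio_path}>"


def summarize(transcript: str) -> str:
    """Placeholder for an LLM summarization step."""
    return f"<summary of: {transcript}>"


def process_calls(calls: list[CallRecord]) -> dict[str, str]:
    """Produce one summary per call, the task agents would otherwise do by hand."""
    return {call.call_id: summarize(transcribe(call.audio_path)) for call in calls}


if __name__ == "__main__":
    batch = [CallRecord("c-001", "calls/c-001.wav"), CallRecord("c-002", "calls/c-002.wav")]
    print(process_calls(batch))
    # Back-of-envelope check of the figure quoted above: 500 hours saved across
    # 30,000 daily calls works out to about one minute of agent time per call.
    print(500 * 60 / 30_000, "minutes saved per call")
```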
GPT-4 will keep making mistakes. Although the new model will undoubtedly be more powerful, Microsoft wanted to make it clear that this artificial intelligence will not always answer correctly, and its answers will need to be validated.
Just in case, let’s be cautious. Expectations for GPT-4 are enormous, and in fact even Sam Altman, CEO of OpenAI, made it clear a few weeks ago that the industry and users should lower those expectations, because “people are begging to be disappointed, and they will be”.
In Xataka | “I couldn’t go to sleep watching it grow so much”: we talked with the creator of Abbreviame, a viral bot based on ChatGPT