GPT-4's good exam grades are less impressive than we have been told

If GPT-4 were a student, it would be one of the most brilliant. OpenAI itself evaluated the model's abilities with a series of exams designed for humans, and it achieved spectacular results in them. It would in fact place among the top 10% of test takers, but some say that this does not actually mean much.

What happened. OpenAI submitted GPT-4 to academic exams of various kinds, such as the Uniform Bar Exam, the most common test for becoming a lawyer in the US, or the LSAT, the test used for admission to law schools such as Columbia Law School. The model also took the GRE Quantitative test, which measures the ability to reason about and understand mathematical concepts. In almost all of them its score was exceptional, which seemed to make GPT-4 superior to most human students. A recent study by two researchers reveals that there are problems with that perception.

Data contamination. To begin with, the researchers verified that GPT-4 knew answers from memory… as long as its memory reached that far. The data the model was trained on is known to date from before September 2021. When it was given programming problems published before that date, it answered well, but it could not solve any of the problems published afterwards, even when they were simple.

The researchers describe this problem as “data contamination”. They also point out that even changing small details in how a problem is worded can confuse ChatGPT, which was a mediocre student, and probably GPT-4 as well, something that would not happen with a human.
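The before-and-after comparison described above can be sketched in a few lines of Python. This is only an illustrative sketch, not the researchers' actual code: the `Result` record, the function names, and the choice of 1 September 2021 as the exact cutoff day are assumptions made for the example, and no real benchmark data is included.

```python
from dataclasses import dataclass
from datetime import date
from typing import List, Optional, Tuple

# Assumption: the article only says "before September 2021";
# the first day of that month is used here as the boundary.
TRAINING_CUTOFF = date(2021, 9, 1)


@dataclass
class Result:
    """One evaluated problem (hypothetical record layout)."""
    problem_id: str   # hypothetical identifier
    published: date   # when the problem first appeared online
    solved: bool      # whether the model's answer was accepted


def split_by_cutoff(
    results: List[Result], cutoff: date = TRAINING_CUTOFF
) -> Tuple[List[Result], List[Result]]:
    """Separate problems published before the cutoff from later ones."""
    before = [r for r in results if r.published < cutoff]
    after = [r for r in results if r.published >= cutoff]
    return before, after


def solve_rate(results: List[Result]) -> Optional[float]:
    """Fraction of problems solved, or None for an empty list."""
    if not results:
        return None
    return sum(r.solved for r in results) / len(results)


def contamination_gap(
    results: List[Result], cutoff: date = TRAINING_CUTOFF
) -> Optional[float]:
    """Solve rate before the cutoff minus solve rate after it.

    A large positive gap on problems of similar difficulty hints at
    memorization, but does not prove it on its own.
    """
    before, after = split_by_cutoff(results, cutoff)
    rate_before, rate_after = solve_rate(before), solve_rate(after)
    if rate_before is None or rate_after is None:
        return None
    return rate_before - rate_after
```

Calling `contamination_gap` on a list of `Result` records returns a number between -1 and 1. The comparison is only meaningful if the problems on both sides of the cutoff are of comparable difficulty, which is why the article stresses that the later problems were simple ones.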

These are exams for humans, not for machines. “Memorization is a spectrum”, the authors explained. Even if a model like GPT-4 has not seen an exact problem in its training data, “it is inevitable that it has seen quite similar examples, simply because of the size of the training corpus”. This allows the model to “use a much shallower level of reasoning”. For these experts, language models do not have the reasoning ability that the humans who sit these exams need and later apply in the real world.

Comparisons are odious. Exams such as the bar exam “put too much emphasis on subject-matter knowledge and too little on real-world skills, which are much harder to measure in a standardized way”. In other words, these exams not only emphasize the wrong thing, they also “overemphasize precisely what language models do well”. For the authors of the study, the choice of these tests to evaluate GPT-4 is “unfortunate”.

Quality, not quantity. For the researchers, what is needed are qualitative studies, not quantitative ones. Although they acknowledge that GPT-4 “is really exciting and can solve many problems for professionals”, such as automating routine tasks, evaluations based on exams like those used by OpenAI can be misleading.

