Recent reports have raised concerns about the diminishing capabilities of ChatGPT, the AI chatbot developed by OpenAI. Users have expressed dissatisfaction with the quality of its responses, suggesting a potential decline in accuracy.
To address these concerns, researchers at Stanford University ran a series of tests designed to evaluate ChatGPT's performance objectively and determine whether the reported decline was real or merely rumor.
The tests compared two model versions, GPT-3.5 and GPT-4, across a range of tasks: mathematical problem-solving, answering sensitive questions, generating software code, and visual reasoning.
The findings reveal notable drift in ChatGPT's performance over time. On one mathematics task (judging whether a given number is prime), GPT-4's accuracy fell from 97.6% in March 2023 to just 2.4% three months later, in June 2023.
In contrast, GPT-3.5 improved on the same task over the same period, rising from 7.4% to 86.8%. The inconsistency was not limited to mathematics; the study also measured shifts in code generation and other areas.
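The accuracy figures above come from scoring model answers against ground truth. A minimal sketch of that kind of evaluation, assuming a simple yes/no answer format and a hypothetical `answers` mapping (the study's actual prompts and scoring harness may differ):

```python
# Hedged sketch: score yes/no answers about primality against ground truth.
# The answer format and the example answers are assumptions for illustration,
# not the study's exact data.

def is_prime(n: int) -> bool:
    """Ground-truth primality check by trial division."""
    if n < 2:
        return False
    i = 2
    while i * i <= n:
        if n % i == 0:
            return False
        i += 1
    return True

def score(answers: dict[int, str]) -> float:
    """Fraction of yes/no answers that match the ground truth."""
    correct = sum(
        (ans.strip().lower() == "yes") == is_prime(n)
        for n, ans in answers.items()
    )
    return correct / len(answers)

# Hypothetical model answers to "Is <n> a prime number?"
answers = {17077: "yes", 20024: "no", 9941: "no"}
print(score(answers))  # 17077 and 9941 are prime, so 2 of 3 answers are correct
```

Aggregating such per-question scores over a fixed question set is what makes a drop like 97.6% to 2.4% directly measurable between model snapshots.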
Unfortunately, the exact cause of this inconsistency remains unclear. Because OpenAI discloses little about how and when ChatGPT's underlying models are updated, researchers have limited means of pinpointing the root cause.
In summary, ChatGPT's capabilities appear to drift over time rather than uniformly decline, but the opacity of OpenAI's update process makes it difficult to determine precisely why.