AI's Quest for World Domination Halted by... Math Problems?

Key Takeaways
  • OpenAI's ChatGPT is experiencing a decline in performance, struggling with accuracy over time
  • GPT-4's accuracy at identifying prime numbers dropped sharply between March and June, while GPT-3.5 improved; code generation and the handling of sensitive questions were also affected
  • The researchers call for ongoing oversight of AI model quality and suggest a monitoring framework for heavy users; separately, OpenAI plans a dedicated team to address the risks of superintelligent AI systems

AI Chatbot ChatGPT's Dwindling Performance

Despite the thrilling advances in artificial intelligence (AI) powered chatbots, OpenAI's much-admired ChatGPT appears to be suffering an unexpected downswing in performance. The reason for the deterioration has left researchers at Stanford and UC Berkeley scratching their heads.

A detailed study published on July 18 found that the newer ChatGPT models were losing their edge over time, giving less accurate responses to an identical set of questions over a span of a few months.

Researchers Lingjiao Chen, Matei Zaharia, and James Zou ran rigorous tests on two models, GPT-3.5 and GPT-4. The models were evaluated on a range of tasks, including mathematical problem-solving, generating fresh lines of code, and handling sensitive prompts.

In an interesting twist, the study found that GPT-4, initially boasting a 97.6% accuracy rate in prime number identification in March, saw a massive drop to a measly 2.4% by June. Astonishingly, its predecessor, GPT-3.5, showcased improvement in the same task during this period.

Both models also declined in their ability to generate new lines of code between March and June. Their handling of sensitive questions changed as well: where the bots had previously explained why they could not answer certain queries related to ethnicity and gender, by June they had adopted a curt approach, merely apologizing and refusing to entertain such questions.

The researchers highlighted that "The behavior of the 'same' [large language model] service can change significantly within a relatively brief period." They stressed the need for continuous oversight of AI model quality.

For individuals and companies that rely heavily on these LLM services, the researchers recommended putting a continuous monitoring framework in place to ensure output quality stays consistent.
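To make that recommendation concrete, here is a minimal Python sketch of what such a monitoring set-up could look like. It assumes a small fixed benchmark of prompts with known answers (prime-number questions, similar in spirit to those in the study) and a hypothetical query_model() function standing in for whichever LLM API is being monitored; it illustrates the idea rather than reproducing the researchers' actual harness.

```python
from datetime import date

# Hypothetical fixed benchmark: prompts with known correct answers.
# Re-using the same questions every run is what makes drift measurable.
BENCHMARK = [
    ("Is 17077 a prime number? Answer yes or no.", "yes"),
    ("Is 20024 a prime number? Answer yes or no.", "no"),
]

def query_model(prompt: str) -> str:
    """Placeholder for a real LLM API call to the service being monitored."""
    raise NotImplementedError("wire this up to the LLM service you depend on")

def run_snapshot(query=query_model):
    """Run the fixed benchmark once and record accuracy for today's date."""
    correct = 0
    for prompt, expected in BENCHMARK:
        answer = query(prompt).strip().lower()
        correct += int(answer.startswith(expected))
    return {"date": date.today().isoformat(),
            "accuracy": correct / len(BENCHMARK)}

if __name__ == "__main__":
    # Demo with a trivially correct stand-in model so the sketch runs end to end.
    print(run_snapshot(lambda p: "yes" if "17077" in p else "no"))
```

Re-running run_snapshot() on a schedule, say monthly, and comparing the stored accuracies would be enough to surface the kind of drift the study documents, such as GPT-4's drop on prime-number questions between March and June.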

In a related development, OpenAI announced on June 6 that it intends to assemble a dedicated team to curb the risks of a potentially superintelligent AI system, which it anticipates could emerge within this decade.

AI technologies are akin to a roller coaster ride, with thrilling peaks and surprising lows. As AI models continue to evolve, issues like these present critical opportunities to understand and address their weaknesses and to build even more reliable systems. It's all part of the ride.

