(Refiles to fix formatting, no change to story content)

By Krystal Hu and Anna Tong

(Reuters) – Artificial intelligence companies like OpenAI are seeking to overcome unexpected delays and challenges in the pursuit of ever-larger language models by developing training techniques that use more human-like ways for algorithms to “think.”

A dozen AI scientists, researchers and investors told Reuters they believe the techniques, which are behind OpenAI’s recently released o1 model, could reshape the AI arms race and have implications for the types of resources that AI companies have an insatiable demand for, from energy to types of chips.

OpenAI declined to comment for this story.

After the release of the viral chatbot ChatGPT two years ago, tech companies, whose valuations have benefited greatly from the AI boom, publicly argued that “augmenting” current models by adding more data and computing power would consistently lead to improved AI models.

But today, some of the most prominent AI scientists are speaking out about the limits of this “bigger is better” philosophy.

Ilya Sutskever, co-founder of AI labs Safe Superintelligence (SSI) and OpenAI, recently told Reuters that results from scaling up pre-training, the phase of training an AI model that uses a large amount of unlabeled data to understand linguistic patterns and structures, have reached a plateau.

Sutskever is widely recognized as an early advocate of achieving massive advances in generative AI through the use of more data and computing power in pre-training, an approach that eventually gave rise to ChatGPT. Sutskever left OpenAI earlier this year to found SSI.

“The 2010s were the era of scale, now we are back in the age of wonder and discovery. Everyone is looking for the next thing,” Sutskever said. “It’s more important than ever to scale the right thing.”

Sutskever declined to share more details about how his team is approaching the problem, saying only that SSI is working on an alternative approach to ramping up pre-training.

Behind the scenes, researchers at leading AI labs have experienced delays and disappointing results in the race to release a large language model that outperforms OpenAI’s nearly two-year-old GPT-4 model, according to three sources familiar with private matters.

So-called “training cycles” for large models can cost tens of millions of dollars because they run hundreds of chips simultaneously. They are more susceptible to hardware-related failures given the complexity of the system; researchers may not know how the models will perform until the cycle is complete, which can take months.