Thursday, 13 June 2024

The Global AI Race Intensifies: Major Players Unveil Next-Gen Models and Strategies


A worldwide game has begun, one in which failing to enter means elimination.

At the end of 2022, the somewhat awkwardly named ChatGPT ignited a global wave of conversational AI. News that major players such as Microsoft, Amazon, and Google were entering the fray sent ripples through the tech field like stones thrown into water.

Microsoft, as the major backer of OpenAI, the company behind ChatGPT, was the first to place heavy bets, announcing this week that it would integrate ChatGPT capabilities across all of its product lines. And as talk of ChatGPT replacing search engines gained momentum, Google could not sit idle, announcing on February 7th the launch of its own conversational bot, “Bard.”

On the same day, Baidu announced “Wenxin Yiyan” (ERNIE Bot), a ChatGPT-like product built on its own Wenxin large model. Later that afternoon, 360, the second-place player in China’s search engine market, followed suit, revealing that it was already using such products internally and planned to release a ChatGPT-like demo soon. Two hours after the news broke, its stock price hit the daily limit up.

Fresh news keeps coming. In the early hours of February 8th Beijing time, Microsoft announced at a media briefing a new version of its Bing search engine powered by ChatGPT technology.

Behind the flurry of official announcements, it’s easy to see that almost every major player chasing ChatGPT is mentioning the concept of “large models.”

In a short official announcement, Baidu devoted a paragraph to its four-layer AI architecture, centered on the Wenxin large model. Google CEO Sundar Pichai likewise stated that its conversational bot “Bard” is powered by the large model LaMDA.

ChatGPT and large models are two sides of the same coin. On the surface, ChatGPT is a conversational bot that can chat, answer questions, and write poems and essays. Fundamentally, though, it is an application built on an AI large model; without the capabilities large models provide, today’s globally booming ChatGPT might never have come into being.

Behind the Rise: ChatGPT’s Pandora’s Box Opened by Large Models

ChatGPT’s seemingly omniscient, “knows everything under the sun” performance rests on large models born from massive data; large models are what enable it to understand and use human language and to hold nearly natural conversations and interactions.

Massive data is the foundation of large models. As the name suggests, these are models with billions of parameters, generated by extracting and learning knowledge from billions of text or image samples. ChatGPT, produced by adjusting OpenAI’s GPT-3 model, has 175 billion parameters.

This brings about unimaginable breakthroughs—based on large amounts of text data (including web pages, books, news, etc.), ChatGPT has gained the ability to answer questions on different topics. Coupled with the diversity of learning methods, ChatGPT can provide divergent answers to questions.

Large models are not a new phenomenon; the industry was already discussing them around 2015. Behind their emergence, however, lies a revolution in how artificial intelligence is deployed in practice.

As one of the most important components of artificial intelligence, machine learning has long depended on data to reach production: large amounts of data are needed to train models so that computer systems can learn from them.

In short, the more up-to-date data available, the more material machine learning has to learn from, and the higher the chances of accurate, intelligent results.

This also means that in the past, when data was scarce, the development of machine learning was held back. With the spread of PCs and the mobile internet, data volume, the foundation of machine learning, has grown exponentially. One striking consequence: from 1950 to 2018, model parameter counts grew by seven orders of magnitude; in the four years after 2018, they grew by another five, surpassing one hundred billion.

In other words, when there is sufficient data, machine learning has the potential for further upgrades, a possibility that existed as early as 2018.

However, data alone is not enough; using it comes at a rising cost. The larger the amount of data in the machine learning process, the higher the cost of data labeling, data cleaning, and manual tuning. High-quality labeled data is hard to obtain, which drags down the cost-effectiveness of the entire process.

To address this, the way machine learning is put into production has also changed.

Today, machine learning is mainly divided into three learning methods: supervised learning, unsupervised learning, and semi-supervised learning. Large models are closely related to unsupervised learning and semi-supervised learning.

Previously, the mainstream method of building machine learning was supervised learning. This involves collecting data first and then feeding the model a set of input and output combinations that have been verified by humans through strong manual intervention/supervision, allowing the model to learn through imitation.

“In the labeling and cleaning stages, I input a set of data to the machine and provide feedback on whether the learning results are correct or incorrect, allowing it to find the correlation between parameters and optimize them,” said a product manager who has been involved in algorithm optimization.

Unsupervised learning requires no labeling: the training data contains only inputs, with no human-provided correct outputs, letting the model discover relationships within the data on its own.

Semi-supervised learning is somewhere between the two. In this learning method, the model attempts to extract information from unlabeled data to improve its predictive ability and also uses labeled data to verify its predictions.

In other words, compared to traditional supervised learning, unsupervised learning and semi-supervised learning save more costs and reduce the dependence on high-quality labeled data.
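The three paradigms above can be sketched in a few lines of code. This is a toy illustration on a one-dimensional dataset, not a real training pipeline; the function names and the simple threshold “model” are invented for the example.

```python
def supervised_fit(pairs):
    """Supervised: every input comes with a human-verified label.
    Learn a threshold separating the two labeled classes."""
    lo = max(x for x, y in pairs if y == 0)
    hi = min(x for x, y in pairs if y == 1)
    return (lo + hi) / 2

def unsupervised_fit(xs, iters=10):
    """Unsupervised: no labels at all. A 2-means clustering finds
    structure in the raw inputs on its own."""
    c0, c1 = min(xs), max(xs)
    for _ in range(iters):
        a = [x for x in xs if abs(x - c0) <= abs(x - c1)]
        b = [x for x in xs if abs(x - c0) > abs(x - c1)]
        c0, c1 = sum(a) / len(a), sum(b) / len(b)
    return (c0 + c1) / 2  # boundary between the two clusters

def semi_supervised_fit(pairs, unlabeled):
    """Semi-supervised: a few labels guide the model, and unlabeled
    data is pseudo-labeled with that model and folded back in."""
    t = supervised_fit(pairs)                      # fit on the few labels
    pseudo = [(x, int(x > t)) for x in unlabeled]  # pseudo-label the rest
    return supervised_fit(pairs + pseudo)          # refit on everything

labeled = [(0.1, 0), (0.2, 0), (0.9, 1), (1.0, 1)]
unlabeled = [0.15, 0.3, 0.7, 0.85]
print(supervised_fit(labeled))        # uses labels only
print(unsupervised_fit([x for x, _ in labeled] + unlabeled))
print(semi_supervised_fit(labeled, unlabeled))
```

Note how the semi-supervised variant touches only four labeled points yet still exploits all eight inputs, which is exactly the cost saving described above.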

“If there were no unsupervised learning, large models might never have been trainable,” an AI expert who straddles academia and industry said recently.

Of course, lowering data-processing costs is not the most important factor behind ChatGPT’s emergence.

Under supervised learning, the data humans curate for training usually comes from domain-specific datasets and is not large in overall quantity. A model trained in one domain therefore struggles to adapt when applied to another, a problem known as “poor generalization.”

For example, a model that performs well on a question-and-answer dataset may produce unsatisfactory results when applied to reading comprehension.

The emergence of large models goes a long way toward fixing “poor generalization,” making models far more versatile.

This is because large models are trained on massive, publicly available Internet data rather than on small domain-specific datasets. Such an approach is more likely to produce general-purpose base models applicable across many scenarios, which is an important reason ChatGPT can answer such a wide range of questions.

In conclusion, the arrival of large models in production is a milestone for machine learning, and the key that opened ChatGPT’s Pandora’s box.

The GPT Series: A “Self-Revolution” in Deploying Large Models


Looking back at the iterations of ChatGPT, one can see a history of self-upgrades for large models. In this process, OpenAI underwent at least three “self-revolutions” in terms of technical routes.

As mentioned earlier, ChatGPT descends from OpenAI’s third-generation large model, GPT-3, and was created by fine-tuning GPT-3.5.

As the names suggest, OpenAI had previously released GPT-1, GPT-2, and GPT-3, and each generation was deployed differently.

The first-generation generative pre-trained model, GPT-1, launched in 2018. GPT-1 used a semi-supervised recipe: it first learned from a large amount of unlabeled data via unsupervised pre-training (reportedly a month on 8 GPUs), then underwent supervised fine-tuning.

The advantage of this approach is that only fine-tuning is needed to enhance the model’s capabilities, reducing the demand for resources and labeled data.
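GPT-1’s two-phase recipe, unsupervised pre-training followed by supervised fine-tuning, can be sketched with a stand-in model. Here a simple bigram count model plays the role of the Transformer, and the corpus and task examples are made up; only the shape of the pipeline mirrors the description above.

```python
# Minimal sketch of the pretrain-then-fine-tune pipeline. A bigram
# count model stands in for the real network; the two phases, not the
# model, are the point.
from collections import Counter, defaultdict

def pretrain(corpus):
    """Unsupervised phase: learn next-word statistics from raw,
    unlabeled text (a bigram analogue of next-token prediction)."""
    bigrams = defaultdict(Counter)
    for sentence in corpus:
        words = sentence.split()
        for a, b in zip(words, words[1:]):
            bigrams[a][b] += 1
    return bigrams

def fine_tune(bigrams, labeled_examples):
    """Supervised phase: a small labeled dataset nudges the pretrained
    statistics toward the target task (here, via weighted extra counts)."""
    for sentence, weight in labeled_examples:
        words = sentence.split()
        for a, b in zip(words, words[1:]):
            bigrams[a][b] += weight
    return bigrams

def predict_next(bigrams, word):
    return bigrams[word].most_common(1)[0][0]

raw_corpus = ["the cat sat on the mat", "the dog sat on the rug"]
model = pretrain(raw_corpus)
# Before fine-tuning, "the" is followed by cat/dog/mat/rug equally often;
# a little task data is enough to change the preferred continuation.
model = fine_tune(model, [("the model answers questions", 5)])
print(predict_next(model, "the"))  # "model"
```

The fine-tuning phase touches only a handful of labeled sentences, which is the resource saving the paragraph above describes.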

The problem was that GPT-1’s scale was limited: its parameter count of just over 100 million looks meager beside today’s hundred-billion-parameter models. As a result, its overall picture of the world was neither comprehensive nor accurate enough, its generalization was still lacking, and it performed poorly on some tasks.

A year after the launch of GPT-1, GPT-2 was officially unveiled. Its underlying architecture was unchanged from its predecessor’s, but its dataset grew to 40GB of text across 8 million documents, and its parameter count jumped to 1.5 billion.

Research has shown that GPT-2, with its explosion of parameters, generated text that was almost as convincing as articles from The New York Times. This made more people realize the value of large models under unsupervised learning.

On an annual update cadence, GPT-3 arrived as expected in 2020. It reached 175 billion parameters and was trained on more thematically diverse text. Compared with GPT-2, the new version could answer questions, write papers, summarize texts, translate languages, and generate computer code.

It’s worth noting that at this point, GPT-3 still followed the path of unsupervised learning and large parameter amounts. However, by 2022, there was a significant change.

That year, building on GPT-3, OpenAI introduced InstructGPT, which it described as a fine-tuned version of GPT-3 that produces fewer harmful, untruthful, and biased outputs. Apart from the difference in training data volume, there was little difference between ChatGPT and InstructGPT.

The question arises: Why can InstructGPT and ChatGPT further enhance intelligence and optimize people’s interactive experience?

The reason behind this is that OpenAI, in the models released in 2022, began to value human-annotated data and reinforcement learning—specifically reinforcement learning from human feedback (RLHF). According to reports, OpenAI used a small amount of manually labeled data to construct a reward model this time.
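A reward model of the kind described can be sketched in miniature: given human preference pairs (preferred answer, rejected answer), learn a scoring function that ranks the preferred one higher. The bag-of-words features and perceptron-style update below are simplifications invented for illustration, not OpenAI’s actual method.

```python
# Toy reward model in the spirit of RLHF's reward-modeling stage.
from collections import defaultdict

def features(text):
    """Bag-of-words feature counts for a piece of text."""
    f = defaultdict(int)
    for w in text.split():
        f[w] += 1
    return f

def score(weights, text):
    return sum(weights[w] * c for w, c in features(text).items())

def train_reward_model(preference_pairs, epochs=10):
    """Each pair is (preferred, rejected), as ranked by a human labeler.
    Whenever the model ranks a pair the wrong way, shift weight mass
    toward the preferred answer's words."""
    weights = defaultdict(float)
    for _ in range(epochs):
        for preferred, rejected in preference_pairs:
            if score(weights, preferred) <= score(weights, rejected):
                for w, c in features(preferred).items():
                    weights[w] += c
                for w, c in features(rejected).items():
                    weights[w] -= c
    return weights

pairs = [
    ("the capital of France is Paris", "I am not sure maybe ask someone"),
    ("water boils at 100 C at sea level", "water is wet I guess"),
]
rm = train_reward_model(pairs)
good, bad = pairs[0]
print(score(rm, good) > score(rm, bad))  # True
```

Once such a scorer exists, it can grade new model outputs without a human in the loop, which is what makes the later reinforcement-learning stage possible.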

At first glance, the characteristic of large models under unsupervised learning is their large data volume and reduction of dependence on data labeling and manual input—this is the core of GPT-2 and GPT-3.

However, the route taken by InstructGPT and ChatGPT seems to temporarily return to the human-centered approach.

This change seems drastic, but it’s actually an adjustment to make AI products more user-friendly. Breaking down the logic behind it, the training of ChatGPT relies on the foundation of GPT-3.5 large models, but the introduction of human-annotated data and reinforcement learning allows the large models to better understand the meaning of information and make self-judgments—bringing them closer to the ideal performance of artificial intelligence.

In other words, in the past, under unsupervised learning, the model was given input without providing correct output, allowing it to “freely develop” based on a large amount of data, possessing the basic qualities of artificial intelligence.

However, at this point, incorporating human feedback on the learning results of large models helps the model better understand the information of the input itself and the information of its own output, making it more user-friendly. In specific scenarios, ChatGPT enhanced by human feedback can improve its ability to understand user query intents (input) and the quality of its own answers (output).

To achieve better results, OpenAI reportedly hired some 40 doctoral-level annotators to work on human feedback.

The seemingly contradictory approach of abandoning and then returning to human labor in artificial intelligence has also been recognized by many industry professionals.

For example, He Xiaodong, vice president of JD Group, recently told the media that compared to the extensive use of unsupervised deep learning algorithms in the past, the algorithms and training processes behind the ChatGPT model are more innovative. Without human data selection, even if the model parameters are increased tenfold, it would be difficult to achieve today’s results.

“In a sense, this is actually a correction of the past blind pursuit of large (parameters) and unsupervised learning.”

Of course, even with a renewed emphasis on human feedback, it does not mean that OpenAI has completely abandoned its previous principles. Analysis indicates that ChatGPT’s training is mainly divided into three stages, with human feedback being very important in the first two stages, but in the final stage, ChatGPT only needs to learn from the feedback model trained in the second stage without strong human involvement.

Whether it’s GPT-1, 2, 3, InstructGPT, or ChatGPT, the five-year journey of model iteration by OpenAI seems to be a self-reform as well.

It also shows that this company treats no single technique as dogma. Whether unsupervised, self-supervised, or semi-supervised learning, the point was never to build large models for their own sake, but to make AI smarter.

Big Companies Are Cornering Large Models, but “Model Refinement” Isn’t the End Goal

Even as the capabilities of large models take center stage with the rise of ChatGPT, the controversies in the industry are still apparent.

From a business-model perspective, as large models become more general-purpose, many more companies can build on them, making light adjustments tailored to their own businesses. In theory, everyone wins: downstream companies save heavily on model-training costs, while the companies releasing large models can charge for access.
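This “call a provider’s model and adjust lightly” pattern might look like the sketch below. The endpoint, model name, and payload fields are illustrative placeholders, not any real provider’s API; the point is that the downstream company tunes a prompt and a few parameters instead of training a model. The request is built but not sent.

```python
# Hypothetical sketch: assembling a request to a third-party large-model
# API. The URL and field names are invented for illustration.
import json

def build_chat_request(prompt, model="example-large-model",
                       temperature=0.7, max_tokens=256):
    """Assemble the JSON body a downstream company might POST to a
    (hypothetical) model provider's endpoint."""
    return {
        "url": "https://api.example-provider.com/v1/chat",  # placeholder
        "body": json.dumps({
            "model": model,
            "messages": [{"role": "user", "content": prompt}],
            "temperature": temperature,
            "max_tokens": max_tokens,
        }),
    }

req = build_chat_request("Summarize our Q3 sales report in one paragraph.")
print(req["url"])
```

Everything business-specific lives in the prompt and a couple of knobs, which is precisely why the approach is cheap, and also why, as the critics below note, deeper optimization is out of the caller’s hands.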

However, this approach is currently being criticized by some industry insiders.

The founder of an AIGC company told Latest that this issue is not just about money and costs; the key is that there are many limitations to calling third-party large models, which will affect their own business.

“For example, it’s difficult to optimize for certain application scenarios.” He gave an example: a product combining voice simulation and image simulation needs comprehensive capabilities from the model provider; if even one technical piece falls short, the product performs poorly.

Criticisms aside, some companies in the industry are trying to reduce the cost of deploying large models through algorithm optimization.

However, fundamentally, large models undoubtedly are a business that naturally suits giants—the cost investment alone shows this.

Breaking down the process, building a large model requires sufficient data processing, computing, and networking capabilities.

Taking the upstream data processing process as an example, unsupervised learning can solve some of the costs of data labeling, but the previous costs of data collection and data cleaning are still difficult to reduce. Moreover, these tasks often require human intervention and are difficult to fully automate.

Looking at computing and networking, the training tasks of large models often require hundreds or even thousands of GPU cards’ computing power. This means that in addition to computing power, when there are many server nodes and a large demand for inter-server communication, network bandwidth performance also becomes a bottleneck for GPU clusters, and high-performance computing networks become a topic of discussion.

Concrete numbers are more convincing. It has been reported that Stability AI previously spent about $20 million on computation alone. Even if only using large models for fine-tuning and inference, thousands of gigabytes of memory are still required locally. If a company wants to deploy large models to the production line, it would require about 70 personnel to start from scratch. In Western countries, just supporting 70 employees would cost around $20 million.

Large companies are not shy about the high price tags of large models. At the end of last year, the head of the data department at a major Chinese internet company said bluntly that, in his view, it would be highly uneconomical and irrational for medium-sized companies to replicate the large-model path. Even for his own company, with a market value exceeding tens of billions of dollars, the original motivation for developing large models was to serve internal business needs: providing unified support to the various business departments that require AI capabilities and avoiding redundant, overlapping efforts inside the company.

It is therefore relatively sensible for large companies to take on the underlying large models, while small and medium-sized companies choose the models best suited to their businesses and build industry applications on top. In other words, the AI field may replay the competitive landscape of cloud computing.

However, regardless of the country of origin, large models are simultaneously facing a soul-searching question—when the amount of data continues to increase exponentially and the underlying computing power cannot keep up, will the path of large models still be viable?

Perhaps for practitioners, the rise of ChatGPT is only the surface; the deeper insight lies in OpenAI’s “self-iteration” in bringing large models to production.

After all, this company has shown everyone through at least five years of self-struggle that simply “refining” large models is not the goal; making AI truly usable and useful is the ultimate chapter.
