The competition among AI large models is heating up: technical barriers are falling, while business challenges are coming to the fore.
AI Large Models: A Revolution Triggered by an Engineering Problem
Last month, an "animal battle" broke out in the AI community.
On one side is the Llama series of models launched by Meta, which is favored by developers due to its open-source nature. NEC Corporation of Japan quickly developed a Japanese version of ChatGPT based on the Llama paper and code, addressing Japan's technological bottleneck in the field of AI.
On the other side is a large model named Falcon. In May of this year, Falcon-40B was released, overtaking Llama to top the Open LLM Leaderboard, a ranking created by the open-source community Hugging Face to provide a standard for evaluating LLM capabilities. Since then, Llama and Falcon have taken turns at the top of the leaderboard.
In early September, Falcon launched version 180B, topping the charts once again. Interestingly, the developers of Falcon are not a tech company, but rather the Technology Innovation Institute based in Abu Dhabi. UAE officials stated that their participation in the AI race aims to disrupt the existing landscape.
Today, the field of AI has entered a stage where a hundred schools of thought contend: virtually every country and company with sufficient resources is trying to build its own version of ChatGPT. Several players have emerged among the Gulf states alone, and Saudi Arabia recently procured over 3,000 H100 chips for its domestic universities to train LLMs.
An investor lamented: "Back in the day, I looked down on the innovation of internet business models, thinking there were no barriers. I didn't expect that the hard technology big model entrepreneurship would still lead to a battle of hundreds of models..."
Why has what was once considered difficult, high-barrier hard technology evolved into a "one country, one model" situation?
Transformer: The Engine of the AI Revolution
American startups, Chinese tech giants, and Middle Eastern oil tycoons are all investing in large model research, all stemming from a famous paper: "Attention Is All You Need."
In 2017, eight Google scientists introduced the Transformer architecture in that paper, which is now the third most cited article in the history of AI. The emergence of the Transformer triggered the current wave of AI frenzy.
Today's large models, including the globally sensational GPT series, are all built on the Transformer foundation.
Previously, "teaching machines to read" has been a recognized challenge in academia. Unlike image recognition, humans understand context while reading. Early neural networks struggled with long texts, often resulting in issues such as translating "开水间" as "open water room."
In 2014, Google scientist Ilya Sutskever first applied recurrent neural networks (RNNs) to natural language processing, significantly improving the performance of Google Translate. An RNN's "recurrent design" lets the network carry context forward as it reads a sequence.
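To make that recurrent design concrete, here is a minimal, purely illustrative NumPy sketch of a vanilla RNN cell (the sizes, weights, and variable names are arbitrary placeholders, not drawn from any model mentioned in this article):

```python
import numpy as np

# Toy vanilla RNN cell: the hidden state h carries context from earlier tokens.
# Note the strictly sequential loop -- step t cannot start before step t-1 finishes.
rng = np.random.default_rng(0)
d_in, d_hidden, seq_len = 8, 16, 5            # arbitrary toy dimensions

W_xh = rng.normal(size=(d_in, d_hidden))      # input-to-hidden weights
W_hh = rng.normal(size=(d_hidden, d_hidden))  # hidden-to-hidden weights (the "recurrence")
tokens = rng.normal(size=(seq_len, d_in))     # stand-ins for word embeddings

h = np.zeros(d_hidden)
for x_t in tokens:                            # one token at a time, in order
    h = np.tanh(x_t @ W_xh + h @ W_hh)        # new state depends on the previous state
print(h.shape)                                # (16,) -- a context vector for the whole sequence
```

That loop is precisely the bottleneck discussed next: it cannot be parallelized across the sequence.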
However, RNNs have serious flaws: computation is sequential, which makes training inefficient and makes it hard to scale to large numbers of parameters. From 2015 onward, Google scientist Ashish Vaswani and his colleagues worked on alternatives to the RNN, ultimately producing the Transformer.
Compared with the RNN, the Transformer brought two major innovations: first, it replaces recurrence with positional encoding, enabling parallel computation and dramatically improving training efficiency; second, its self-attention mechanism further strengthens contextual understanding.
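A rough sketch of those two ideas, assuming the sinusoidal positional encoding from "Attention Is All You Need" and a deliberately simplified single-head self-attention (real Transformers add learned query/key/value projections and multiple heads):

```python
import numpy as np

def positional_encoding(seq_len, d_model):
    """Sinusoidal positional encoding: injects word order, which attention alone ignores."""
    pos = np.arange(seq_len)[:, None]
    i = np.arange(d_model)[None, :]
    angles = pos / np.power(10000, (2 * (i // 2)) / d_model)
    return np.where(i % 2 == 0, np.sin(angles), np.cos(angles))

def self_attention(X):
    """Simplified single-head scaled dot-product attention (no learned projections)."""
    d_k = X.shape[-1]
    scores = X @ X.T / np.sqrt(d_k)                    # every token attends to every other token
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)     # softmax over the sequence
    return weights @ X

seq_len, d_model = 5, 16
rng = np.random.default_rng(0)
X = rng.normal(size=(seq_len, d_model))                # toy token embeddings
X = X + positional_encoding(seq_len, d_model)          # add order information
out = self_attention(X)                                # whole sequence handled in a few matrix multiplies
print(out.shape)                                       # (5, 16)
```

Because attention reduces to matrix multiplications over the entire sequence, all positions are processed at once rather than one token at a time, which is what makes large-scale parallel training practical.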
The Transformer solved multiple challenges in one stroke and gradually became the mainstream approach in NLP, turning large models from a theoretical research question into a pure engineering problem.
In 2019, OpenAI developed GPT-2 on the Transformer foundation. In response, Google quickly launched the more powerful Meena, which surpassed GPT-2 simply by scaling up parameters and computing power, with no innovation in the underlying algorithm. This left Transformer author Ashish Vaswani amazed at the power of "brute-force stacking."
Since the advent of the Transformer, the pace of innovation in underlying algorithms in academia has slowed. Engineering elements such as data engineering, computing power scale, and model architecture have increasingly become key factors in the AI competition. Any company with a certain level of technical strength can develop large models.
AI expert Andrew Ng believes that AI has become a series of general technological tools, similar to electricity and the internet.
Although OpenAI remains the leader in LLMs, industry analysts believe GPT-4's advantages come primarily from engineering solutions; if those were made public, competitors could replicate them quickly. It is expected that before long, other large tech companies will be able to build models that perform comparably to GPT-4.
Weak Moat
Today, the "Battle of Big Models" has become a reality. Reports show that as of July this year, the number of large models in China has reached 130, surpassing the 114 in the United States. Various myths and legends are no longer sufficient for domestic tech companies to name their products.
Beyond China and the United States, many other countries have also achieved an initial "one country, one model": Japan, the UAE, India, South Korea, and others have launched local large models in quick succession. The scene recalls the dot-com bubble era, when "burning money" was the main form of competition.
The Transformer turned large models into a pure engineering problem: with enough people and money, anyone can build one. But getting in is easy; becoming a giant of the AI era is very hard.
The "Animal Wars" mentioned earlier is a typical case: Falcon, despite ranking higher than Llama, has limited impact on Meta.
When a company open-sources its research results, it is not only sharing the benefits of the technology but also hoping to mobilize collective intelligence. As developers across industries keep using and improving Llama, Meta can fold those improvements back into its own products.
For open-source large models, an active developer community is the core competitive advantage.
Meta established its open-source policy as early as 2015 when it set up its AI laboratory. Zuckerberg is well aware of the importance of "maintaining good public relations." In October, Meta also launched the "AI Creator Incentive" program, funding developers who use Llama 2 to solve social issues.
Today, Meta's Llama series has become the benchmark for open-source LLMs. As of early October, 8 of the top 10 models on the Hugging Face leaderboard were built on Llama 2, and more than 1,500 LLMs have adopted its open-source license.
Improving performance is certainly important, but there is still a significant gap between most LLMs and GPT-4. In the latest AgentBench test, GPT-4 topped the list with a score of 4.41, while the second place, Claude, scored only 2.77, and most open-source LLMs hover around 1 point.
It has been more than half a year since GPT-4 was released, and competitors around the world have yet to catch up, thanks to OpenAI's top-tier team of scientists and its long-accumulated experience in LLM research.
It is evident that the core competitiveness of a large model lies in ecosystem building (open source) or pure reasoning capability (closed source), not in simply piling up parameters.
As the open-source community becomes more active, the performance of various LLMs may converge because everyone is using similar model architectures and datasets.
A more intuitive problem is that, apart from Midjourney, it seems that no other large model has been able to achieve profitability.
Value Anchor
In August of this year, an article titled "OpenAI may go bankrupt by the end of 2024" attracted attention. The main point is: OpenAI is burning through cash too quickly.
The article notes that since building ChatGPT, OpenAI's losses have widened rapidly, reaching roughly $540 million in 2022, leaving it dependent on Microsoft's investment to stay afloat.
This reflects the dilemma commonly faced by large model providers: a serious imbalance between costs and revenues.
High costs mean that the main beneficiaries are currently chip manufacturers like Nvidia and Broadcom.
Nvidia is estimated to have sold more than 300,000 H100 AI chips in the second quarter of this year, a combined weight equivalent to 4.5 Boeing 747s. Its results surged 854%, shocking Wall Street, while the second-hand price of an H100 has been bid up to $40,000 to $50,000 against a manufacturing cost of just over $3,000.
The cost of computing power has become a hindrance to industry development. Sequoia Capital estimates that global tech companies will spend $200 billion annually on building large model infrastructure, but the annual revenue from large models is at most $75 billion, resulting in a gap of at least $125 billion.
With a few exceptions, most software companies have not found a profitable model after incurring huge costs. Even industry leaders like Microsoft and Adobe are facing challenges.
GitHub Copilot, built by Microsoft in collaboration with OpenAI, charges users $10 per month, yet Microsoft reportedly loses about $20 per user on average, and heavy users can cost the company as much as $80 a month. The newly launched Microsoft 365 Copilot, priced at $30, could lose even more.
After launching its Firefly AI tool, Adobe quickly introduced a credits system to keep heavy usage from turning into losses: once a user exceeds the monthly credit allotment, Adobe throttles the service.
Microsoft and Adobe at least have clear business scenarios and large paying user bases. By contrast, most large models, for all their mountains of parameters, still have chat as their main application.
The emergence of OpenAI and ChatGPT has sparked this AI revolution, but the value of training large models at the current stage is in doubt. As competition intensifies and the number of open-source models increases, the space for pure large model providers may further shrink.
The success of the iPhone 4 lies not in the 45nm A4 processor, but in its ability to play "Plants vs. Zombies" and "Angry Birds."