AI drives digital humans to "fly into the homes of ordinary people"

Since it opened in April, the China Pavilion at the 2025 World Expo in Osaka, Japan has drawn large crowds every day. As visitors enter the pavilion, they are greeted by a voice calling out, "Old Sun has arrived!" The voice comes from the multilingual "AI Monkey King" that iFlytek Co., Ltd. (hereinafter "iFlytek") created for the China Pavilion; it speaks Chinese, Japanese, and English, and its appearance and tone replicate the Monkey King from the classic animated film "Havoc in Heaven."

Behind "AI Sun Wukong" is the rapidly developing digital human technology in recent years. As a key connection point for emerging industries such as AI and the metaverse, the role of digital humans in the development of the digital economy is becoming increasingly important. With the in-depth application of large model technology in this field, digital humans are gradually moving from "usable" to "user-friendly," pushing related industries into a new stage of development.

Three major categories of application scenarios have taken shape

Digital humans are digital entities created with modeling and other digital technologies. They have a human-like appearance, voice, and language, can simulate physical movements, possess cognitive abilities, and, supported by large models, can learn, generate content, and interact.

Driven by technology and demand, China's digital human industry ecosystem is becoming increasingly well developed: the scale of applications keeps expanding, and production, operation, and service capabilities across the upstream and downstream of the industry chain are steadily improving. According to data from Tianyancha, the number of digital human-related enterprises in China had reached 1.144 million as of 2024, with more than 174,000 newly registered in the first five months of 2024 alone, demonstrating the market potential and vitality of the digital human industry.

Wu Suoning, a member of the Expert Advisory Committee of the China Internet Association, says that to prevent digital human technology from becoming a mere formality and wasting resources, it is essential to find outlets for its application and push digital human deployments from individual cases to broad adoption.

Driven by applications, the digital human industry is accelerating the construction of a "technology-scene-business" closed-loop ecosystem.

The "China Digital Human Development Report (2024)" (hereinafter referred to as the "Report") released by the China Internet Association analyzes that the application scenarios of digital humans have preliminarily formed three main categories: media digital humans, service digital humans, and industry digital humans. Among them, media digital humans are the most mature form of digital human applications at present. The proportion of scenarios generated around media digital humans can reach 50%, and their realistic images and smooth language expression greatly enhance the interactivity and entertainment of information dissemination.

For example, at the "China Science and Technology Innovation Gala," the "New Year technology show" launched for the first time last year by China Media Group (the Central Radio and Television Station), host Zhang Tengyue co-hosted with an "AI avatar." This "AI host," built on iFlytek's Smart Creation platform, not only has the same voice, expressions, and movements as the real host but can also converse with him naturally, accurately understanding what he says and responding appropriately and quickly, making it hard for the audience to tell the real from the artificial.

The Report shows that, in addition to media digital humans, service digital humans have also been comprehensively upgraded: they now have stronger interactive capabilities, account for about 30% of scenarios, and are widely used in government affairs, e-commerce, finance, and other fields. Industry digital humans have begun to emerge, accounting for about 20% of scenarios, and are gradually playing a role in healthcare, education, and enterprise management.

Expected to become the entry point for AI innovation

Digital humans have roughly gone through three stages: driven first by real people, then by programs, and now by AI.

Early human-driven digital humans could present virtual figures, but they relied mainly on technologies such as computer graphics modeling and motion capture and required real people to supply large amounts of language and motion data. Program-driven digital humans no longer needed real people to supply that data, but because they ran on fixed computer programs they were closer to "digital robots" and could not achieve highly realistic, human-like effects. In recent years, AI-driven digital humans have not only rendered more realistic detail in speech, movement, and expression but have also gradually acquired stronger interaction and cognitive abilities.

"A few years ago, digital humans might have had issues such as mismatched lip movements and expressions, as well as stiff actions. This was due to the digital humans' inadequate understanding of textual semantics, and their expressions and movements mostly relied on limited preset resources, making it impossible to precisely match the textual content." said Gao Jingwen, head of iFlytek's digital human business. With the in-depth application of large model technology in the field of digital humans, the performance of digital human products has reached a new level.

For example, last October iFlytek released a hyper-realistic digital human. Built on a multimodal diffusion generation large model, it can generate body movements in real time according to the rhythm, intonation, and content of speech, breaking through the limits of preset action templates and greatly enhancing the expressiveness of digital humans in dynamic scenes. Tencent's digital human product Zhiying supports "image cloning" and "voice cloning": users need only upload a small number of images, videos, and audio clips to quickly generate a digital avatar of themselves and customize its voice. Alibaba's open-source AI digital human project EchoMimic can give static images lively voices and expressions.
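To make the idea of motion driven by speech rhythm and intonation concrete, here is a minimal sketch, not iFlytek's actual model (whose internals the article does not disclose): it maps frame-wise audio energy to a normalized gesture-amplitude curve, so motion intensity follows the loudness contour of the speech rather than a preset template. The constants, function names, and synthetic audio are assumptions made purely for illustration.

```python
# Minimal, illustrative sketch: drive gesture intensity from speech prosody.
# This is NOT iFlytek's model; every name and constant here is hypothetical.
import numpy as np

SAMPLE_RATE = 16_000   # assumed audio sample rate (Hz)
FRAME_LEN = 640        # 40 ms analysis frames at 16 kHz

def frame_rms(waveform: np.ndarray, frame_len: int = FRAME_LEN) -> np.ndarray:
    """Frame-wise RMS energy, a crude stand-in for prosodic emphasis."""
    n_frames = len(waveform) // frame_len
    frames = waveform[: n_frames * frame_len].reshape(n_frames, frame_len)
    return np.sqrt((frames ** 2).mean(axis=1))

def gesture_amplitude(waveform: np.ndarray) -> np.ndarray:
    """Map energy to a smoothed, normalized 0..1 gesture-amplitude curve."""
    energy = frame_rms(waveform)
    smoothed = np.convolve(energy, np.ones(5) / 5, mode="same")  # de-jitter
    peak = smoothed.max()
    return smoothed / peak if peak > 0 else smoothed

if __name__ == "__main__":
    # Synthetic one-second "speech" burst: quiet start, loud middle, quiet end.
    t = np.linspace(0, 1, SAMPLE_RATE)
    audio = np.exp(-((t - 0.5) ** 2) / 0.02) * np.sin(2 * np.pi * 220 * t)
    amp = gesture_amplitude(audio)
    print(f"{len(amp)} frames, strongest gesture at frame {int(amp.argmax())}")
```

A production system would of course use learned encoders and full-body motion models rather than a hand-built energy curve; the sketch only shows why movement can track the audio without preset templates.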

"In short, large model technology enables digital humans to truly understand semantics and allows them to quickly generate corresponding actions and expressions based on their understanding of the text, thus achieving lifelike representation," said Gao Jingwen.

Shang Bing, chairman of the China Internet Association, believes that digital humans are becoming a dynamic entry point for putting AI into practical use, tightly linked with, embedded in, and integrated into big data, smart terminals, embodied intelligence, and other industries, and that they may become one of the active interactive interfaces of the next generation of the internet. He argues that attention should be paid to the practical deployment of innovative applications, that emerging formats such as digital humans should be explored actively, and that advantages in large-scale application should be built up quickly.

Wu Suoning likewise believes that digital humans are the breakthrough point for AI applications and the entry point for AI innovation: AI-driven digital humans are expected to bring more diverse applications to various industries, and those applications can in turn guide AI down a pragmatic path of development.

Creating a personalized "digital twin"

With the widespread application of digital humans, many scenarios have placed higher demands on digital humans.

"For example, scenarios such as e-commerce live broadcast and customer service Q&A put forward extremely high requirements for the real-time interaction ability of digital humans. Digital humans must not only be able to have real-time conversations with users, but also generate corresponding actions and expressions according to the content of the dialogue, otherwise it will affect the efficiency of business processing and directly affect the user experience. Gao Jingwen introduced that in order to improve the efficiency of the digital human video generation model, the company's team has developed an action representation extraction technology, which converts speech and text input into compact intermediate representations, effectively compressing the video dimension. With the help of this technology, the system can quickly extract key information from the input text and voice like a stenographer, reduce the amount of irrelevant information data, and generate videos accordingly, greatly improving the efficiency of video generation and ensuring the real-time interaction between digital humans and users.

It is also worth noting that although the digital human industry is developing rapidly, it is still in an early stage of growth. Gao Jingwen believes that digital human products are currently rather homogeneous, and that personalization and customization will become important directions for the industry. With the development of generative AI, the threshold and cost of producing digital humans have fallen rapidly while production efficiency and content diversity have improved markedly, making it practical for users to create digital human products tailored to their own characteristics. Today, a single photo or voice recording is enough to generate a highly personalized, hyper-realistic digital human, greatly reducing the preset materials required for customization and streamlining the user's workflow.
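As a small illustration of how little material such a workflow needs, the following hypothetical request structure captures the "one photo plus one recording" input described above; the class and field names are invented and do not correspond to any vendor's real API.

```python
# Hypothetical input structure for a "photo + voice recording" customization
# flow; names are invented for illustration, not a real vendor API.
from dataclasses import dataclass
from pathlib import Path

@dataclass
class AvatarRequest:
    portrait: Path                        # a single photo of the user
    voice_sample: Path                    # a short recording for voice cloning
    display_name: str = "my-digital-twin"

    def validate(self) -> None:
        for f in (self.portrait, self.voice_sample):
            if not f.exists():
                raise FileNotFoundError(f"missing input material: {f}")

# Usage: AvatarRequest(Path("me.jpg"), Path("me.wav")).validate()
```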

Gao Jingwen also acknowledged that although large model technology is bringing digital humans "into the homes of ordinary people," achieving more refined results still requires large amounts of data and interactive training, and the accompanying risks of privacy leakage and data security cannot be ignored.

"In the future, each of us may have a 'digital twin' that can assist us in handling work, answer our questions in life, and become our companion," said Gao Jingwen.

(Source: Science and Technology Daily)

