Technology

China's AI Advances That Are Flying Under The Radar

  • China aims for AI innovation, balancing oversight and market access, rivalling the US.

Karan Kamble | May 01, 2024, 02:53 PM | Updated 03:00 PM IST
The Astribot S1 humanoid robot (videograb)


Achievements in Artificial Intelligence (AI) coming out of the United States of America (US) typically dominate the public discourse. However, China is making similarly big moves in AI. From a powerful large language model (LLM) to a skilled humanoid robot, the East Asian giant has impressive recent releases rivalling those in the US.

SenseNova 5.0 LLM to rival GPT-4 Turbo

At a company event in Shanghai on 24 April, leading Chinese AI software company SenseTime unveiled its latest large model, called “SenseNova 5.0.”

The fifth iteration of a model that debuted only a year ago, SenseNova 5.0 has been trained on more than 10 TB (terabytes) of tokens. Its context window, the amount of input the model can consider at once when processing information, is approximately 200,000 tokens during inference.
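For a rough sense of what that context window figure means in practice, here is a minimal illustrative sketch in Python. It is not SenseTime's code; the whitespace "tokeniser" is a stand-in assumption, and only the roughly 200,000-token figure comes from the article.

```python
# Illustrative only: shows how a fixed context window limits what a model "sees".
# The 200,000-token figure is taken from the article; the whitespace split below
# is a crude stand-in for a real tokeniser and is purely an assumption.

CONTEXT_WINDOW = 200_000  # approximate window reported for SenseNova 5.0


def truncate_to_window(text: str, window: int = CONTEXT_WINDOW) -> list[str]:
    """Keep only the most recent `window` tokens of the input."""
    tokens = text.split()      # naive whitespace "tokenisation"
    return tokens[-window:]    # anything earlier falls outside the window


if __name__ == "__main__":
    document = "word " * 250_000            # a document longer than the window
    kept = truncate_to_window(document)
    print(f"Tokens kept for inference: {len(kept)}")   # 200000
```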


SenseTime’s model is multimodal, which is to say that it works across various types of input and output: text, visual, and audio. It can, for example, read and make sense of high-definition images and even turn images into text. It can also extract complex data from documents and answer questions about them.
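As a purely hypothetical illustration of what such a document-question request might look like from a developer's side, the sketch below defines its own request shape. The `MultimodalQuery` class and `ask_multimodal` stub are assumptions made for illustration, not SenseTime's API.

```python
from dataclasses import dataclass, field


# Hypothetical request structure for a multimodal query: one or more images
# plus a text question, e.g. extracting a figure from a scanned invoice.
@dataclass
class MultimodalQuery:
    prompt: str                                            # the textual question
    image_paths: list[str] = field(default_factory=list)   # visual inputs


def ask_multimodal(query: MultimodalQuery) -> str:
    """Stand-in for a call to a multimodal LLM endpoint (assumed, not real)."""
    # A real client would upload the images and return the model's answer.
    return f"[model answer to {query.prompt!r}, given {len(query.image_paths)} image(s)]"


answer = ask_multimodal(
    MultimodalQuery(
        prompt="What is the total amount on this invoice?",
        image_paths=["invoice_page1.png"],
    )
)
print(answer)
```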

Where SenseNova 5.0 proves to be an ace LLM is in its performance against other models. Its capabilities were reportedly benchmarked against GPT-4 Turbo, OpenAI’s most advanced language model, and were found to surpass it, according to the firm, which is counted among “the four dragons” of China’s computer vision industry.

SenseNova 5.0 also tops the authoritative multimodality benchmark MMBench in graphical and textual perception while putting up high scores in other well-known multimodal rankings like MathVista, AI2D, and ChartQA.

In addition, SenseTime announced an advanced text-to-video platform that generates videos from textual descriptions. The tool offers options to maintain consistency in scene settings and in the look and feel of characters appearing in the video, enabling anyone to “be your own film director.” On that note…

Launch of SenseNova 5.0 at the SenseTime Tech Day event

Vidu text-to-video AI tool, a Sora challenger

OpenAI’s Sora represented quite the leap in being able to create videos based on text input. Recently, Chinese startup Shengshu Technology in collaboration with Tsinghua University launched what could potentially be a Sora challenger. It’s called Vidu.


The demo video clips released were creative, imaginative, and lifelike, with impressive handling of lighting and facial expressions, and multiple camera angles. However, even on a cursory look, the details in the videos do not yet appear to match Sora’s output. (AI video clips generated by Vidu can be watched here.)

Screengrab from a Vidu sample video clip

According to China Daily, the company says Vidu’s capabilities are “very close to” the level of Sora. The AI tool is said to be based on a Universal Vision Transformer (U-ViT) architecture, as opposed to Sora’s diffusion transformer. Work on the U-ViT, which integrates both diffusion and transformer models, was initiated in September 2022.
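For readers curious what “integrating diffusion and transformer models” can look like in code, here is a heavily simplified PyTorch sketch of a transformer backbone acting as the denoiser in one diffusion training step. It is an assumption-based illustration of the general idea, not Shengshu's U-ViT implementation; all sizes and the toy noising schedule are made up for brevity.

```python
import torch
import torch.nn as nn

# Illustrative sketch only: a transformer used as the denoising network of a
# diffusion model, the basic idea behind ViT-style diffusion backbones.
# Dimensions and structure are assumptions chosen for brevity, not Vidu's design.


class TinyTransformerDenoiser(nn.Module):
    def __init__(self, dim=128, depth=4, heads=4):
        super().__init__()
        self.time_embed = nn.Linear(1, dim)       # embed the diffusion timestep
        encoder_layer = nn.TransformerEncoderLayer(
            d_model=dim, nhead=heads, batch_first=True
        )
        self.encoder = nn.TransformerEncoder(encoder_layer, num_layers=depth)
        self.to_noise = nn.Linear(dim, dim)       # predict noise per token

    def forward(self, noisy_tokens, t):
        # noisy_tokens: (batch, num_tokens, dim) patch/frame embeddings with noise
        # t: (batch, 1) diffusion timestep, prepended as an extra token
        t_tok = self.time_embed(t).unsqueeze(1)   # (batch, 1, dim)
        x = torch.cat([t_tok, noisy_tokens], dim=1)
        x = self.encoder(x)
        return self.to_noise(x[:, 1:])            # drop the time token


# One (simplified) denoising training step: predict the noise that was added.
model = TinyTransformerDenoiser()
clean = torch.randn(2, 64, 128)                   # stand-in latent tokens
noise = torch.randn_like(clean)
t = torch.rand(2, 1)                              # random timesteps in [0, 1]
noisy = clean + t.unsqueeze(-1) * noise           # toy noising schedule
loss = nn.functional.mse_loss(model(noisy, t), noise)
loss.backward()
print(f"denoising loss: {loss.item():.4f}")
```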

Vidu creator Shengshu Technology was founded in Beijing in March 2023. Its core team is made up of engineers from the Institute for Artificial Intelligence at Tsinghua University, Alibaba Group Holding, Tencent Holdings, and ByteDance.

Besides video creation, Shengshu has additional tools for the generation of high-quality, creative images and personalised 3D models based on simple text and image input.


Astribot S1, a skilled humanoid robot

Anyone with an interest in humanoid robots would be familiar with Boston Dynamics’ Atlas and Tesla’s Optimus, state-of-the-art robots that have been years in the making. However, a leading Chinese AI robotics company, Stardust Intelligence, recently gave the world an incredible first look at its humanoid robot, Astribot S1.

Only about a year in the making, the S1 can be seen in a demo video stacking cups, organising things on a desk, and carrying out a variety of kitchen tasks, such as slicing a cucumber, flipping bread on a pan, and even opening a bottle of wine. It also presses and folds clothes, waters a plant, writes in calligraphy, and more.

Perhaps the most impressive part comes towards the end of the video, where the S1 is seen imitating a human’s dance moves, indicating the robot’s ability to learn directly from humans by watching and imitating them.
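In robotics, this kind of “watch and learn” is typically framed as imitation learning, with behavioural cloning as its simplest form. The sketch below is a generic, assumption-based illustration that trains a small network to map observed states (for example, tracked human pose features) to demonstrated actions; it is not Stardust's actual pipeline, and all shapes and data are stand-ins.

```python
import torch
import torch.nn as nn

# Minimal behavioural-cloning sketch: learn a policy that maps an observed
# state (e.g. tracked human pose features) to a robot action (e.g. joint
# targets) from demonstration pairs. Dimensions and data are assumed.

STATE_DIM, ACTION_DIM = 32, 14        # e.g. pose features -> two 7-DoF arms

policy = nn.Sequential(
    nn.Linear(STATE_DIM, 128), nn.ReLU(),
    nn.Linear(128, ACTION_DIM),
)
optim = torch.optim.Adam(policy.parameters(), lr=1e-3)

# Fake "demonstrations": recorded (state, action) pairs from a human teacher.
states = torch.randn(512, STATE_DIM)
actions = torch.randn(512, ACTION_DIM)

for epoch in range(5):
    pred = policy(states)
    loss = nn.functional.mse_loss(pred, actions)   # match the demonstrated action
    optim.zero_grad()
    loss.backward()
    optim.step()
    print(f"epoch {epoch}: imitation loss {loss.item():.4f}")
```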

A dancing Astribot S1, learning by imitation

The S1 can move at a top speed of 10 metres per second, compared to 7 metres per second for an adult male. It can carry about 10 kg of weight on its arm, which has seven degrees of freedom.

The company claims that its humanoid robot can learn, think, and work like people, and use tools and equipment to help people complete boring, difficult, or dangerous tasks, while adapting to changes in its environment. The S1 is “closest to human operating performance,” Stardust says.

The name “Astribot” comes from the Latin proverb “Ad astra per aspera,” meaning “a journey through hardship to reach stardust.” Founded in Shenzhen in December 2022, the Astribot maker aims to put its intelligent robots in thousands of households, with a vision to someday empower billions of people with AI robot assistants.


China certainly seems keen on getting more AI innovation out there on the market, while also retaining a degree of oversight and control over the development and popularisation of the critical emerging technology.
