
China's AI Advances That Are Flying Under The Radar

Karan Kamble

May 01, 2024, 02:53 PM | Updated 03:00 PM IST


The Astribot S1 humanoid robot (videograb)
  • China is pushing for AI innovation to rival the US, while balancing market access with state oversight.
  • Achievements in Artificial Intelligence (AI) coming out of the United States of America (US) typically dominate public discourse. However, China is making big moves in AI too. From a powerful large language model (LLM) to a skilled humanoid robot, the East Asian giant has recently unveiled impressive releases that rival those from the US.

    SenseNova 5.0 LLM to rival GPT-4 Turbo

    At a company event in Shanghai on 24 April, leading Chinese AI software company SenseTime unveiled its latest large model, called “SenseNova 5.0.”

    The model, now in its fifth iteration only a year after its debut, has been trained on more than 10 TB (terabytes) of tokens. It has a context window (the amount of information a model can take into account at once when processing a request) of approximately 200,000 tokens during inference.

    “The major advancements in SenseNova 5.0,” the company says, “focus on knowledge, mathematics, reasoning, and coding capabilities.” It is said to have “best-in-class mathematical, coding and reasoning capabilities,” making it well-suited for applications in scientific research, finance, and data analysis.

    SenseTime’s model is multimodal, meaning it works across text, visual, and audio inputs and outputs. It can, for example, read and interpret high-definition images and even convert images into text. It can also extract complex data from documents and answer questions about it.

    Where SenseNova 5.0 really stands out is in its performance against other models. According to SenseTime, which is counted among “the four dragons” of China’s computer vision industry, the model was benchmarked against GPT-4 Turbo, OpenAI’s most advanced language model, and was found to surpass it.

    SenseNova 5.0 also tops the authoritative multimodality benchmark MMBench in graphical and textual perception while putting up high scores in other well-known multimodal rankings like MathVista, AI2D, and ChartQA.

    In addition, SenseTime announced an advanced text-to-video platform that generates videos from textual descriptions. The tool offers options to keep scene settings and the look and feel of characters consistent throughout a video, enabling anyone to “be your own film director.” On that note…

    Launch of SenseNova 5.0 at the SenseTime Tech Day event

    Vidu text-to-video AI tool, a Sora challenger

    OpenAI’s Sora represented quite the leap in being able to create videos based on text input. Recently, Chinese startup Shengshu Technology in collaboration with Tsinghua University launched what could potentially be a Sora challenger. It’s called Vidu.

    Vidu can generate 16-second videos in 1080p resolution from simple text prompts, shorter than the up-to-60-second clips Sora can produce. According to Zhu Jun, the chief scientist at Shengshu, the tool is “imaginative,” “can simulate the physical world,” and can “produce 16-second videos with consistent characters, scenes, and timeline,” while also being able to comprehend “Chinese elements,” the South China Morning Post reported.

    The released demo clips are creative, imaginative, and lifelike, with impressive handling of lighting, facial expressions, and multiple camera views. On a cursory look, however, the details in the videos do not yet appear to be at the level of Sora’s.

    Screengrab from a Vidu sample video clip

    According to China Daily, the company says Vidu’s capabilities are “very close to” the level of Sora. The AI tool is said to be based on a Universal Vision Transformer (U-ViT) architecture, as opposed to Sora’s diffusion transformer. Work on the U-ViT, which integrates both diffusion and transformer models, was initiated in September 2022.

    Vidu creator Shengshu Technology was founded in Beijing in March 2023. Its core team is made up of engineers from the Institute for Artificial Intelligence at Tsinghua University, Alibaba Group Holding, Tencent Holdings, and ByteDance.

    Besides video creation, Shengshu offers tools for generating high-quality, creative images and personalised 3D models from simple text and image inputs.

    Astribot S1 humanoid robot for household chores

    Anyone with an interest in humanoid robots would be familiar with Boston Dynamics’ Atlas and Tesla’s Optimus, state-of-the-art robots that have been years in the making. However, a leading Chinese AI robotics company, Stardust Intelligence, recently gave the world an incredible first look at its humanoid robot, Astribot S1.

    Only about a year in the making, the S1 can be seen in a demo video stacking cups, organising things on a desk, and carrying out kitchen tasks such as slicing a cucumber, flipping bread in a pan, and even opening a bottle of wine. It also presses and folds clothes, waters a plant, writes calligraphy, and much more.

    Perhaps the most impressive part comes towards the end of the video, where the S1 is seen imitating a human’s dance moves, suggesting the robot can learn by watching and imitating people directly.

    A dancing Astribot S1, learning by imitation

    The S1’s arm can move at a top speed of 10 metres per second, compared with about 7 metres per second for an adult male. Each arm can carry about 10 kg and has seven degrees of freedom.

    The company claims that its humanoid robot can learn, think, and work like people, use tools and equipment to help people complete boring, difficult, or dangerous tasks, and even adapt to changes in its environment. The S1 is “closest to human operating performance,” Stardust says.

    The name “Astribot” comes from the ancient Latin proverb “Ad astra per aspera,” meaning “a journey through hardship to reach stardust.” Founded in Shenzhen in December 2022, the Astribot maker aims to put its intelligent robots in thousands of households, with a vision to someday empower billions of people with AI robot assistants.

    Clearly, China is serious about its AI ambitions. According to a Reuters report in January this year, the country had green-lit more than 40 AI models for public use since the approval process began in August 2023. China reportedly already accounts for some 40 per cent of the world’s LLMs, second only to the US with a 50 per cent share.

    China certainly seems keen on getting more AI innovation out there on the market, while also retaining a degree of oversight and control over the development and popularisation of the critical emerging technology.



    Karan Kamble writes on science and technology. He occasionally wears the hat of a video anchor for Swarajya's online video programmes.
