[ad_1]
The Vidar embodied AI mannequin from ShengShu makes use of simulated worlds as an alternative of bodily coaching information. Source: Adobe Stock, Vectorhub by ice
ShengShu Technology Co. yesterday launched its multi-view bodily AI coaching mannequin, Vidar — which stands for for “video diffusion for action reasoning.” Using Vidu’s capabilities in semantic and video understanding, Vidar makes use of a restricted set of bodily information to simulate a robotic’s decision-making in real-world environments, stated the corporate.
“Vidar offers a radically different approach to training embodied AI models,” acknowledged ShengShu Technology. “Just as Tesla focuses on vision-based training and Waymo leans into lidar, the industry is exploring divergent paths to physical AI.”
Founded in March 2023, ShengShu Technology specializes within the growth of multimodal giant language fashions (LLMs). The Beijing-based firm stated it delivers mobility-as-a-service (MaaS) and software-as-a-service (SaaS) merchandise for smarter, quicker, and extra scalable content material creation.
With its flagship video-generation platform Vidu, ShengShu stated it has reached customers in additional than 200 international locations and areas all over the world, spanning fields together with interactive leisure, promoting, movie, animation, cultural tourism, and extra.
Vidar simulated coaching to speed up robotic growth
“While some companies train physical AI by embedding models into real-world robots and collecting data through the physical interactions that their robots encounter, it’s a method that’s costly, hardware-dependent, and difficult to scale,” stated ShengShu Technology. “Others rely on purely simulated training, but this often lacks the variability and edge-case data needed for real-world deployment.”
Vidar takes a unique method, the corporate claimed. It combines restricted bodily coaching information with generative video to make predictions and generate new hypothetical situations, making a multi-view simulation that includes lifelike coaching environments, all inside a digital house. This permits for extra sturdy, scalable coaching with out the time, price, or limitations of physical-world information assortment, defined ShengShu.
Built on prime of the Vidu generative video mannequin, Vidar can carry out dual-arm manipulation duties with multi-view video prediction and even reply to natural-language voice instructions after fine-tuning. The mannequin successfully serves as a digital mind for real-world motion, stated the corporate.
Using Vidu’s generative video engine, Vidar generates large-scale simulations to cut back dependency on bodily information, whereas sustaining the complexity and richness wanted to coach real-world-capable AI brokers. ShengShu stated Vidar can extrapolate a generalized sequence of robotic actions and duties from solely 20 minutes of coaching information. The firm asserted that’s between 1/80 and 1/1,200 of the info wanted to coach industry-leading fashions together with RDT and π0.5.
ShengShu stated Vidar’s core innovation lies in its modular two-stage studying structure. Unlike conventional strategies that merge notion and management, Vidar decouples them into two distinct phases for higher flexibility and scalability.
In the upstream stage, large-scale normal video information and moderate-scale embodied video information are used to coach Vidu’s mannequin for perceptual understanding.
In the second downstream stage, a task-agnostic mannequin known as AnyPos turns that visible understanding into actionable motor instructions for robots. This separation makes it considerably simpler and quicker to coach and deploy AI throughout several types of robots, whereas decreasing prices and rising scalability.
Vidar is designed to cut back the quantity of coaching information wanted to coach AI fashions. Source: ShengShu Technology.
Vidar a framework for scalable embodied intelligence
Vidar follows a scalable coaching framework impressed by language and picture basis fashions of the previous decade of AI breakthroughs. ShengShu stated its three-tiered information pyramid, spanning large-scale generic video, embodied video information, and robot-specific examples, makes for a extra versatile system, decreasing conventional information bottleneck.
Built on the U-ViT structure, which explores the fusion of diffusion fashions and transformer architectures for a large assortment of multimodal technology duties, Vidar harnesses long-term temporal modeling and multi-angle video consistency to energy bodily grounded decision-making.
This design helps speedy switch from simulation to real-world deployment, which ShengShu stated is essential for robotics in dynamic environments. It additionally minimizes engineering complexity, in response to the corporate,
ShengShu stated Vidar can facilitate robotics adoption throughout a number of sectors. From house assistants and eldercare to sensible manufacturing and medical robotics, the mannequin allows quick adaptation to new environments and multi-task situations, all with minimal information, it added.
Vidar creates an AI-native path for robotics growth that’s environment friendly, scalable, and cost-effective, ShengShu claimed. By remodeling normal video into actionable robotic intelligence, the corporate stated its mannequin can bridge the hole between visible understanding and embodied company.
Vidar has a modular studying structure. Source: ShengShu Technology
ShengShu marks milestones in multimodal AI
Vidar builds on the speedy momentum of the Vidu video basis mannequin, stated ShengShu. The firm listed statistics since its debut:
- Vidu reached 1 million customers inside one month
- Surpassed 10 million customers in simply three months
- Generated over 100 million movies by Month 4
- Reference-to-video technology exceeded 100 million by Month 8
- Total generated movies now prime 300 million
ShengShu continues to develop the frontiers of multimodal AI, Vidar represents the following frontier—bringing generalization, generativity, and embodiment into one unified system.
Editor’s be aware: RoboBusiness 2025, which will probably be on Oct. 15 and 16 in Santa Clara, Calif., will embody tracks on bodily AI and humanoid robots. Registration is now open.

