There’s no new GPU, and no new AI accelerator card.

Yet one domestic GPU vendor spent an entire launch event doing something remarkably physical—

It unveiled the first fully homegrown embodied intelligence simulation platform.

Let’s start with the demo.

The robot dog named Xiaofei slowly walked onto the stage.

After reaching center stage, Xiaofei in the simulated world performed a side flip; moments later, Xiaofei in the physical world made the exact same move.

Turn around and do it again—the motion is still like copy and paste.

Xiaofei’s motion policy was this:

It was 100% trained in the simulated world and transferred losslessly to the real physical world.

So which domestic GPU company is behind this? And what is the name of this embodied-intelligence simulation platform?

No suspense.

It is MT Lambda, freshly launched by Moore Threads.

What Xiaofei just did can be understood as:

This is the first time a motion-control policy trained on a fully domestic hardware stack has been fully deployed to a fully domestic edge chip, achieving the first real-world Sim-to-Real validation.

With that, Moore Threads has become the only GPU company in China to connect the full chain of “large model training — simulation — edge deployment.”

If the rise of large models was driven by being “fed” massive amounts of internet data, then the rise of embodied intelligence urgently needs an extremely realistic virtual world.

And now, domestic GPUs are beginning to build that world themselves.

If we break down MT Lambda, it is actually more like a production line centered on robot training.

At the top layer are two platforms: MT Lambda-Lab and MT Lambda-Sim.

MT Lambda-Lab focuses more on embodied policy development and training, targeting tasks such as reinforcement learning, imitation learning, and VLA models.

For developers, this layer is about solving the question of “how to teach an agent to do things” — how to train action policies, iterate behaviors, and make models gradually more stable across complex tasks.

MT Lambda-Sim focuses more on high-fidelity physical simulation and rendering, handling scene construction, sensor simulation, data generation, and simulation validation.

It addresses another question: can the world the robot sees, the objects it touches, and the feedback it receives after acting be as close to the real world as possible?

Together, these form the main development chain for embodied intelligence: data synthesis — policy training — simulation validation — edge deployment.

Why does this chain matter? Because the real world is too expensive.

At the launch event, Zhang Jianzhong pointed out three major pain points in training a good agent:

First, there is a lack of large amounts of high-quality data, and both human collection and teleoperation are very costly;
Second, training on real machines is risky and expensive; you can’t have robots or robot dogs falling over and getting damaged every day;
Third, real-world scenarios are often uncontrollable and difficult to generalize—something that works in the lab may fail as soon as the environment changes.

These points actually capture the most practical contradiction in embodied intelligence today: models evolve quickly, while physical scenarios accumulate slowly.

Large models can consume internet data, but robots consume data from the real world. A cup sliding off the edge of a table, a piece of cloth being picked up by a gripper, a car encountering a sudden obstacle on a rainy night—these tasks are impossible to fully describe in simple text. They involve lighting, materials, friction, collisions, motion trajectories, and sensor feedback. To truly teach robots to act, we must produce these complex scenarios at low cost, at scale, and reproducibly.

MT Lambda’s underlying capabilities revolve around three types of engines: physics, rendering, and AI.

First, the physics engine.

MT Lambda integrates open-source backends such as MuJoCo-Warp-MUSA and Newton-MUSA, as well as Moore Threads’ in-house AlphaCore physics engine.

Built on the MUSA architecture for parallel solving, they support high-precision, differentiable physics computation. Under typical simulation workloads, overall simulation throughput can improve by about 30x.

What does that mean?

For robots, the value of a physics engine goes far beyond making things move on screen. When a robotic arm picks up a deformable object, there is force feedback at fingertip contact; when a quadruped lands, different ground materials change the load and posture; in autonomous driving simulation, the motion relationships between vehicles, pedestrians, and obstacles must obey real physical laws. If the simulation is inaccurate, the policies trained in it are likely to fail in the real world.

Next, the rendering engine.

MT Lambda is equipped with the MT Photon engine, combining ray tracing and hybrid rendering, while also introducing 3DGS and Moore Threads’ own AI generative rendering capabilities to improve the realism, detail, and rendering efficiency of simulated visuals.

This part is especially critical. Embodied intelligence must not only compute actions, but also perceive the world. Multimodal inputs such as cameras, depth cameras, LiDAR, and tactile sensors all affect how a robot judges its environment. The more realistic the rendering, the closer synthetic data gets to real data, and the smaller the Sim-to-Real gap can become.

At the event, while discussing a partnership with Lightwheel Intelligence, Zhang Jianzhong mentioned that the MTT S5000 features an RT Core ray-tracing engine, which can bring nearly a 3x improvement in graphics rendering capability; in related tests, hardware ray-tracing acceleration using the MTT S5000 RT Core achieved a 2.7x performance boost.

Finally, the AI engine.

MT Lambda integrates the Torch-MUSA framework, deeply adapted to PyTorch, along with acceleration libraries such as muSolver and muFFT, supporting VLA model development and deployment, while also incorporating reinforcement learning and imitation learning training paradigms.

In embodied intelligence, the AI engine corresponds to training the robot’s brain: it connects vision, language, and action, and turns environmental feedback into the next decision.

Why can Moore Threads pack “compute, simulation, and rendering” into one Lambda?

This is precisely where the value of a full-featured GPU is amplified. After all, full-featured GPUs are already scarce in China.

Embodied intelligence places demands on chips that go far beyond AI matrix computation.

Robot training needs VLA models, reinforcement learning, and imitation learning — that is AI compute; it needs to simulate collisions, friction, dynamics, and complex contacts — that is scientific computing and physical AI; it needs to generate realistic training visuals and sensor data — that is 3D rendering; and in the future, it will involve massive amounts of video acquisition, transmission, generation, and playback — which in turn depends on ultra-HD video encoding and decoding.

TPUs, NPUs, and some GPGPU approaches often focus more narrowly on AI compute or one specific type of general-purpose compute. They can be highly efficient in certain scenarios, but embodied intelligence is more complex: it needs to train the digital brain, build the physical world, and bring real visuals and sensor feedback into the training loop.

The reason Moore Threads could turn MT Lambda into a platform that unifies physics, rendering, and AI engines lies in the full-featured GPU strategy it has adhered to since its founding.

According to Moore Threads’ definition, a full-featured GPU relies on its in-house MUSA architecture to support AI compute, graphics rendering, physical simulation, scientific computing, and ultra-HD video encoding and decoding on a single chip.

In other words, MT Lambda is not a kit cobbled together from disconnected tools; it is platform capability built directly on a full-featured GPU and a unified MUSA architecture.

For embodied intelligence, this kind of integration of compute, simulation, and rendering maps perfectly to the real needs of robot training: running AI models, calculating physical collisions, and rendering realistic visuals all at the same time.

In the past, developers may have had to switch between different hardware and software stacks: one platform for AI training, another for graphics rendering, and yet another tool for physics simulation. Data had to be moved back and forth between systems, which was inefficient, hard to debug, and prone to accumulating errors.

What MT Lambda aims to do is bring these previously fragmented steps back onto one foundation as much as possible. For developers, the ideal state is spending less time wrestling with low-level adaptation and more time on the algorithms, tasks, and scenarios themselves.

The cloud, the edge, and the ecosystem are also starting to close the loop

If MT Lambda solves how to train and simulate, another line of effort from Moore Threads is to complete the cloud, edge, and ecosystem pieces as well.

On the cloud side, there is the KUAE intelligent computing cluster.

In the era of large models, clusters are first understood as training infrastructure; but in the era of embodied intelligence, they are also like a giant robot training ground. Once simulation data is scaled up, demand grows rapidly:

A single robotic arm trajectory may need to generate visuals from multiple camera angles, under different lighting, with various materials and disturbances; an autonomous driving world model may generate massive weekly testing mileage; humanoid robot training also requires large numbers of parallel environments for repeated trial and error…

When data reaches the scale of millions or tens of millions of frames, the role of underlying compute shifts from accelerator to production line.

The core acceleration unit of Moore Threads’ KUAE cluster includes the MTT S5000. The MTT S5000 is based on the fourth-generation MUSA architecture, Pinghu, and delivers up to 1000 TFLOPS of dense AI compute per card, with 80GB of VRAM and 1.6TB/s memory bandwidth. It supports full-precision computation from FP8 to FP64, and is also one of the very few domestic GPUs in China that support both hardware ray tracing and AI training/inference.

In the context of embodied intelligence, these specs become much clearer: FP8, BF16, and FP16 support AI training and inference; ray tracing supports high-fidelity rendering; physical simulation and scientific computing support complex dynamics solving. In other words, embodied intelligence requires multiple capabilities to work together on a single architecture.

On the edge side, there are the Changjiang SoC and the E300 AI module.

The cloud handles large-scale training, and the simulation platform handles trial and verification, but in the end, the policy still needs to run on the robot itself. When robots act in the real world, they often cannot rely entirely on cloud responses. They need local perception, decision-making, and control, especially for tasks requiring low latency and high reliability. Edge compute is an essential piece that must be filled in.

The MTT E300 AI module, based on the Changjiang SoC, provides 50 TOPS-class local compute and can be deployed directly on robot terminals, supporting low-latency, high-reliability real-time response. In other words, the experience learned in the cloud must be turned into the robot’s immediate reaction through the edge module.

This creates a more complete closed loop: the cloud performs large-scale training and parallel simulation, MT Lambda handles policy development, data synthesis, and simulation validation, and the E300 AI module brings the training results to the robot terminal for execution.

More importantly, Moore Threads’ overall layout has already begun entering real-world ecosystem validation.

For example, in collaboration with BAAI, RoboBrain 2.5 completed end-to-end training on an MTT S5000 thousand-card cluster. The validation results showed that its training loss curve closely matched that of the H100 cluster, with a difference of only 0.62%, and in some tasks it performed even better; the cluster scaled from 64 cards to 1024 cards, achieving more than 90% linear scaling efficiency.

The significance of such results is that they validate the usability of domestic compute clusters as the training foundation for embodied models.

Another example is the partnership with Lightwheel Intelligence, which is more focused on mass-producing simulation data. Based on Moore Threads’ full-featured GPU and the KUAE cluster, combined with Lightwheel Intelligence’s three-in-one simulation platform of “solving — measurement — generation,” the two sides have jointly built a high-confidence simulation data synthesis solution. Lightwheel Intelligence’s high-precision GPU physics solver has been adapted to the MUSA architecture, supporting high-precision real-time simulation of complex physical processes including rigid bodies, soft bodies, fluids, and particles. In related cases, the simulation accuracy of core physical parameters has reached over 99%.

The collaboration with Pony.ai extends the scenario to autonomous driving. Based on the MTT S5000 and the KUAE cluster, the two sides are advancing adaptation and validation for world models and in-vehicle model training. Pony.ai’s world model can generate more than 10 billion kilometers of test data per week, producing a large number of extreme scenarios. For autonomous driving, long-tail cases, extreme hazards, and safety validation are exactly where simulation is most valuable.

In addition, Moore Threads is also working with partners such as 51WORLD and RayVision Cloud to advance physical AI simulation systems and embodied simulation platform development. Whether it is 4DGS model training and inference, synthetic data generation, or task libraries, simulation computing, and the closed loop of virtual-real validation, they are all fundamentally answering the same question: embodied intelligence cannot be built by a single company working in isolation; it requires compute, simulation, algorithms, and scenario providers to come together and make the ecosystem work.

That is also what makes Moore Threads’ launch worth paying attention to.

It has moved the story from “I have a chip” to “I can build an infrastructure stack.”

Building upward from the underlying MUSA architecture and full-featured GPU to platforms, downward to edge devices, and laterally to the ecosystem. This strategy may not reshape the industry overnight, but it has already pushed the domestic GPU battlefield one step beyond large-model training and inference, and toward physical AI infrastructure.

What it needs to build is domestic embodied-intelligence infrastructure

One major contradiction in embodied intelligence today is that models are moving fast, but scenarios are moving slowly.

In the digital world, large models can keep evolving on massive amounts of text, image, and video data; but in the physical world, robots must learn to open doors, carry boxes, grasp deformable objects, and navigate complex intersections—every action comes with a real cost behind it.

Real-world data collection is expensive, teleoperation is slow, hardware damage is risky, dangerous scenarios cannot be casually tested, and long-tail cases are nearly impossible to exhaust. As a result, simulation-generated data and the Sim-to-Real loop have become the key infrastructure for embodied intelligence to move from the lab to industry.

That is why “building the world” has become the central question in embodied intelligence competition.

Here, the core value of the world is not how pretty it looks in a gaming sense, but whether it can train robots, validate robots, and correct robot behavior. It must be realistic enough to reflect lighting, materials, collisions, friction, and sensor noise; it must also be efficient enough to generate data at scale in parallel; and it must be open enough to allow different models, robots, and scenarios to connect.

From this perspective, Moore Threads’ advantage is difficult to summarize with any single metric. Its “full-featured GPU + MUSA ecosystem” technology path is naturally closer to the composite demands of embodied intelligence.

The full-featured GPU provides AI compute, graphics rendering, physical simulation, scientific computing, and video encoding/decoding; MUSA provides a unified software ecosystem; MT Lambda integrates the three major engines of physics, rendering, and AI; the KUAE cluster handles large-scale training and simulation; the Changjiang SoC and E300 AI module bring capabilities to the edge; and external ecosystem partners fill in the gaps in data, scenarios, simulation platforms, and industry applications.

The value of this chain is that embodied intelligence is, at its core, a systems engineering effort.

Large-model companies can first build the digital brain, but robot companies ultimately have to deal with how the brain controls the body, how the body understands the environment, and how the environment can be reproduced at low cost. Whoever can create a training world for robots that is more realistic, more controllable, and large-scale at lower cost and higher efficiency will have a better chance of bringing embodied intelligence from demos into real production lines, roads, homes, and urban spaces.

Of course, building domestic embodied-intelligence infrastructure will not happen overnight.

Whether it is simulation realism, Sim-to-Real transfer performance, the maturity of the developer ecosystem, or large-scale adoption by industrial customers, all of these will require continuous validation. How far Moore Threads’ solution can go will also depend on feedback from more real projects, more developers, and more robot embodiments.

But at least from this launch event, domestic GPUs are entering a new stage.

They are starting to move beyond the passive narrative of whether they can replace some specific card, and instead actively define new compute scenarios: the upgraded “Xiaomai” at the launch is a digital agent; the flipping robot dog “Xiaofei” is a physical agent. As AI moves from the screen into reality, and agents evolve from speaking to acting, the underlying compute layer must understand models, graphics, and physics at the same time.

At the event, Zhang Jianzhong mentioned that he hopes Moore Threads’ products, from KUAE to Changjiang, can empower all agents.

Translated into the embodied-intelligence context, that means more concretely: the cloud has a large training ground, the simulation world has a virtual environment, the edge has a small brain for execution, and the ecosystem has real-world scenarios.

Competition in large models is about who can train the stronger digital brain; competition in embodied intelligence must also compete on another front: who can build a training world that is realistic enough first.

This time, domestic GPUs have already stepped in to build that world.

Domestic GPUs Start Building the World! China’s First Full-Stack Embodied AI Simulation Platform Is Here

Why can Moore Threads pack “compute, simulation, and rendering” into one Lambda?

The cloud, the edge, and the ecosystem are also starting to close the loop

What it needs to build is domestic embodied-intelligence infrastructure