Thursday, 24 April 2025

Google DeepMind’s New AI Enables Robots to Perform Complex Tasks Without Training


Google DeepMind has drawn widespread attention across the technology industry with the launch of two new AI robotics models built on Gemini 2.0: Gemini Robotics and Gemini Robotics-ER. The core goal of both models is to raise the intelligence of robots so they can complete a variety of real-world tasks in complex environments. The release marks a further deepening of deep learning and large language models in the field of robotics, and opens new possibilities for the future of intelligent robots.

Gemini Robotics Model

The first model, Gemini Robotics, is a vision-language-action model that allows robots to understand and respond to new situations without task-specific training. It is built on DeepMind's latest flagship AI, Gemini 2.0. According to Carolina Parada, senior director of DeepMind's robotics department, Gemini Robotics extends Gemini's multimodal understanding of the world into the physical realm by adding actions as a new output modality.

Gemini Robotics makes significant progress in the three core areas required to build useful robots: generality, interactivity, and dexterity. Generality lets the model adapt to different scenarios; interactivity means the robot can quickly understand and respond to human instructions or changes in its environment; and dexterity ensures the robot can carry out delicate operations. The model not only responds flexibly to novel situations but also performs better when interacting with people and its surroundings. It can even execute fine-grained tasks, such as folding paper or unscrewing bottle caps, with greater accuracy, significantly widening the range of applications for robots.

Gemini Robotics also significantly improves the understanding of natural-language instructions. Compared with its predecessors, it can not only control a wider range of actions but also adjust its behavior based on real-time input. This feedback mechanism lets the robot interact with users more effectively while continuously monitoring its surroundings and promptly detecting changes in instructions or the environment.

Gemini Robotics-ER Model

The second new model, Gemini Robotics-ER (Embodied Reasoning), is described by DeepMind as an advanced vision-language model that can "understand a complex and dynamic world." When performing a task such as packing a lunch box, a robot must reason about where items sit on the table and in what order to act. Gemini Robotics-ER is designed for exactly this kind of reasoning: it can understand the relative positions of food and containers, how to open a container, and how to grasp the food and place it in the appropriate spot. This capability gives Gemini Robotics-ER broad application prospects in dynamic environments. Roboticists can also connect the model to their existing low-level control systems to unlock new capabilities driven by Gemini Robotics-ER.

On safety, DeepMind researcher Vikas Sindhwani revealed that the company is developing a "layered safety strategy" and has trained the Gemini Robotics-ER model to evaluate whether an action is safe in a given situation. DeepMind has also released new benchmarks and frameworks to advance safety research in the field of AI. Last year, the company introduced a "Robot Constitution," inspired by Isaac Asimov, as a code of conduct to ensure robots follow controllable safety procedures when making autonomous decisions.


DeepMind Opens Model to Multiple Testers

On the partnership front, DeepMind and Apptronik are committed to "building the next generation of humanoid robots." DeepMind has also opened the Gemini Robotics-ER model to multiple "trusted testers," including Agile Robots, Agility Robotics, Boston Dynamics, and Enchanted Tools, meaning these companies will take part in the early stages of putting the new model into practical use. Parada said: "We are focused on building intelligence that can understand and act in the physical world, and we are very much looking forward to applying this AI technology to a wider range of fields and forms of expression."

From an industry perspective, the two new models not only demonstrate AI's potential in product development but also signal the intelligent transformation of robotics. As these technologies are adopted, robots will take on more complex tasks in domains such as the home, the workplace, and medicine: home robots helping with housework, professional robots assisting with dangerous work, and medical robots supporting precise surgical operations.
