Apple recently unveiled ARMOR, a robot perception system that combines hardware and software to enhance a humanoid robot's spatial awareness and enable dynamic collision avoidance. On the hardware side, ARMOR gives the robot a nearly complete view of its surroundings by mounting small depth sensors along its arms, addressing the blind spots and occlusions that plague conventional robot perception. On the software side, Apple developed ARMOR-Policy, a Transformer-based policy that learns from human motion and helps the robot plan its movements dynamically.
The team deployed ARMOR on the Fourier GR-1 humanoid robot for experiments. The results show that ARMOR reduces collisions by 63.7% compared with a setup of four head-mounted and externally mounted depth cameras (exocentric perception). Compared with cuRobo, a sampling-based motion-planning expert, ARMOR-Policy is also 26 times more computationally efficient, allowing the robot to react quickly.
What Does the ARMOR System Look Like?
At present, most humanoid robots rely on centralized cameras and lidars mounted on the head or torso for environmental perception. Although this approach is easy to integrate and provides a wide field of view, it suffers from severe occlusion around the arms and hands. Some studies have tried adding tactile sensing to the robot's end effectors, but that solution is expensive and hard to scale across entire robot arms, and how to effectively use tactile input in policy learning remains an open question.
ARMOR takes an integrated hardware-and-software approach. It was developed by Daehwa Kim, a researcher from Carnegie Mellon University, together with the Apple team during his internship at Apple.
On the hardware side, instead of a centralized RGB-D camera that captures all the detail in dense frames at once, the team chose small ToF lidar units as the basic sensing element, spreading sparse perception across many sensors to form "egocentric perception." The researchers mounted 20 such sensors on each of the robot's arms, 40 in total, forming a distributed perception network.
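The distributed layout means each miniature sensor contributes only a sparse depth patch, and the patches must be fused into a common frame as the arms move. Below is a minimal, hypothetical sketch of that fusion step; the 8x8 zone resolution, 45-degree field of view, and kinematics inputs are illustrative assumptions, not details confirmed by Apple.

```python
# Hypothetical sketch: fusing distributed ToF lidar readings into one
# egocentric point cloud in the robot's base frame. Resolution, FOV,
# and the pose inputs are assumptions for illustration only.
import numpy as np

NUM_SENSORS = 40  # 20 per arm, as described above

def depth_to_points(depth: np.ndarray, fov_deg: float = 45.0) -> np.ndarray:
    """Back-project an 8x8 depth grid into 3D points in the sensor frame."""
    n = depth.shape[0]
    half = np.tan(np.radians(fov_deg / 2))
    u = np.linspace(-half, half, n)
    xx, yy = np.meshgrid(u, u)
    rays = np.stack([xx, yy, np.ones_like(xx)], axis=-1)
    rays /= np.linalg.norm(rays, axis=-1, keepdims=True)
    return rays.reshape(-1, 3) * depth.reshape(-1, 1)

def fuse_egocentric_cloud(depths, sensor_poses):
    """Transform each sensor's points into the base frame and concatenate.

    depths:       list of (8, 8) arrays, one per sensor
    sensor_poses: list of 4x4 base-from-sensor transforms (from forward
                  kinematics, since the sensors ride on the moving arms)
    """
    clouds = []
    for depth, T in zip(depths, sensor_poses):
        pts = depth_to_points(depth)
        pts_h = np.hstack([pts, np.ones((len(pts), 1))])  # homogeneous coords
        clouds.append((pts_h @ T.T)[:, :3])
    return np.concatenate(clouds, axis=0)  # (40 * 64, 3) fused cloud
```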
On the software side, the research team built ARMOR-Policy on a Transformer encoder-decoder architecture similar to Action Chunking with Transformers (ACT). The policy learns from collision-free human motion demonstrations through imitation learning. To train it, the researchers used 311,922 human motion sequences (about 86.6 hours) from the AMASS dataset, which covers a wide range of relevant behaviors such as manipulation, dance, and social interaction. They retargeted these human motion trajectories to the robot's joint configuration and generated compact obstacles around each trajectory while ensuring the trajectory itself remains collision-free.
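To make that data-generation step concrete, here is an illustrative sketch of the idea: scatter compact obstacles near a retargeted trajectory while rejecting any placement that would make the demonstration itself collide. The toy forward-kinematics helper, obstacle count, and clearance threshold are invented for illustration.

```python
# Illustrative sketch of obstacle generation around a retargeted
# trajectory. The kinematics and thresholds are placeholders, not the
# paper's actual pipeline.
import numpy as np

rng = np.random.default_rng(0)

def link_points(q: np.ndarray) -> np.ndarray:
    """Placeholder forward kinematics: 3D points along the arm for joints q.
    A real implementation would use the GR-1 kinematic model."""
    t = np.linspace(0, 1, 10)[:, None]
    return t * np.tanh(q[:3])[None, :]  # toy 10-point "arm"

def sample_obstacles(trajectory, n_obstacles=5, clearance=0.05):
    """Place small obstacles near the swept arm volume, keeping at least
    `clearance` meters from every point the demonstration passes through."""
    swept = np.concatenate([link_points(q) for q in trajectory])
    obstacles = []
    while len(obstacles) < n_obstacles:
        # propose a center near a random point on the swept volume
        center = swept[rng.integers(len(swept))] + rng.normal(0, 0.15, 3)
        if np.min(np.linalg.norm(swept - center, axis=1)) > clearance:
            obstacles.append(center)
    return np.array(obstacles)  # (n_obstacles, 3) obstacle centers

trajectory = rng.normal(size=(50, 28))     # 50 steps of 28-dim joints
print(sample_obstacles(trajectory).shape)  # -> (5, 3)
```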
Advantages of the ARMOR System
Training data is generated using three strategies: obstacle-avoidance motion, emergency stops, and collision-free motion. The network architecture of ARMOR-Policy accounts for the fact that a motion-planning problem may have multiple valid solutions: an additional encoder layer infers a latent variable z, and the policy can generate different candidate trajectories by varying z. At inference time, the system decodes N candidate trajectories in parallel and selects the safest one by maximizing the robot's clearance from the observed point cloud. The network's inputs are the latent variable z, the current and target joint positions (28-dimensional vectors), and the depth images from the 40 ToF lidar sensors.
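A rough sketch of that inference scheme might look like the following, assuming a trained policy object with the CVAE-style interface implied by the description. The method names (decode, arm_points), tensor shapes, and the clearance-based cost are assumptions for illustration.

```python
# Minimal sketch of candidate selection at inference time: sample N
# latents, decode N trajectories in parallel, keep the one with the
# largest clearance from the fused point cloud. The policy interface
# here is hypothetical.
import torch

@torch.no_grad()
def select_trajectory(policy, q_now, q_goal, depth_imgs, point_cloud, n=8):
    zs = torch.randn(n, policy.latent_dim)  # N latent samples
    # decode all candidates in one batched forward pass: (n, T, 28)
    candidates = policy.decode(
        zs,
        q_now.expand(n, -1),               # (n, 28) current joints
        q_goal.expand(n, -1),              # (n, 28) target joints
        depth_imgs.expand(n, -1, -1, -1),  # (n, 40, H, W) ToF depth images
    )
    best, best_cost = None, float("inf")
    for traj in candidates:
        arm_pts = policy.arm_points(traj)  # (T*K, 3) arm points via FK
        # cost = negative clearance, so smaller means safer
        cost = -torch.cdist(arm_pts, point_cloud).min()
        if cost < best_cost:
            best, best_cost = traj, cost
    return best  # the candidate farthest from observed obstacles
```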
Compared with the conventional setup of four head-mounted and externally mounted depth cameras (exocentric perception), ARMOR achieves significant gains in obstacle avoidance: 63.7% fewer collisions and a 78.7% higher success rate. Against cuRobo, the sampling-based motion-planning expert, ARMOR-Policy also performs better, with 31.6% fewer collisions, a 16.9% higher success rate, and a 26-fold gain in computational efficiency.
The research team also verified the system's feasibility in the real world by deploying 28 ToF lidars on the Fourier GR-1 humanoid robot, where it updates obstacle-avoidance trajectories in real time at 15Hz.
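For context, a 15Hz update rate leaves roughly 67ms per cycle for sensing, inference, and streaming the new plan. A hypothetical fixed-rate loop, reusing the select_trajectory sketch above, could look like this; the robot interface methods are placeholders, not ARMOR APIs.

```python
# Hypothetical 15 Hz replanning loop. `read_sensors`, `q_goal`, and
# `send_to_controller` are invented placeholders for illustration.
import time

RATE_HZ = 15
PERIOD = 1.0 / RATE_HZ  # ~66.7 ms budget per cycle

def replanning_loop(policy, robot):
    while True:
        t0 = time.monotonic()
        depth_imgs, point_cloud, q_now = robot.read_sensors()
        traj = select_trajectory(policy, q_now, robot.q_goal,
                                 depth_imgs, point_cloud)
        robot.send_to_controller(traj)  # stream the updated plan
        # sleep off whatever remains of the cycle budget, if any
        time.sleep(max(0.0, PERIOD - (time.monotonic() - t0)))
```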
Expanding the Integration of Technology and Daily Life
Beyond ARMOR, Apple is also actively exploring robotics applications in the smart home. As Internet of Things technology matures, the smart home has become a hot market. Combining robotics with smart home systems would let users control household devices more conveniently and live a more intelligent home life: for example, asking a robot by voice to turn on the lights, adjust the temperature, or play music. Such a lifestyle not only adds convenience but also offers users a new kind of living experience.
Of course, Apple's robotics work has not been smooth sailing. Although its research on a robotic display is relatively advanced, the project has drifted in and out of the company's product roadmap for years, reflecting the uncertainty and challenges Apple faces in this field. That has not stopped the company from pushing ahead; if anything, it seems more determined to bring personal robots to market. As the technology matures and innovative ideas are put into practice, there is reason to believe personal robots will become an indispensable part of family life.