Meta Unveils Enhanced MTIA Chip to Boost AI Performance Amidst Global Chip Shortage

Apr 15, 2024

Amid a scarcity of AI chips, more and more tech giants are opting for in-house development.

On April 10th, local time, social media behemoth Meta unveiled the latest iteration of its proprietary chip, MTIA. MTIA is a series of custom chips designed by Meta specifically for AI training and inference work. Compared with MTIA v1, the first-generation AI inference accelerator Meta announced last year, the latest version shows a significant performance improvement and is tailored for the ranking and recommendation systems behind Meta’s social software. Analysis suggests that Meta’s aim is to reduce its dependence on chip manufacturers like Nvidia.

On the day of the announcement, Meta’s stock (Nasdaq: META) closed at $519.83 per share, up 0.57%, with a total market value of $1.33 trillion. Wind data indicates that Meta’s stock price has risen over 47% since the beginning of the year.

MTIA stands for “Meta Training and Inference Accelerator.” Despite the inclusion of “training” in the name, this chip is not actually optimized for AI training; it focuses on inference, that is, running trained AI models in production.

Meta stated in a blog post that MTIA is a “key part of the company’s long-term plans,” aimed at using AI to build infrastructure for Meta’s services: “To realize our ambitions for custom chips, this means investing not only in compute chips but also in memory bandwidth, networking and capacity, and other next-generation hardware systems.”

According to the latest reports, the new MTIA chip “fundamentally focuses on providing a proper balance of compute, memory bandwidth, and memory capacity.” The original MTIA v1 chip was based on TSMC’s 7nm process technology, while the new MTIA chip uses TSMC’s 5nm process and features more processing cores. The new chip has 256MB of on-chip memory and runs at 1.35GHz, whereas MTIA v1 had 128MB of on-chip memory and ran at 800MHz. Early testing results from Meta indicate that the new chip delivers three times the performance of the previous generation across “four key models.”

In terms of hardware, to support the next generation of chips, Meta has developed a large rack-mounted system capable of accommodating up to 72 accelerators. It consists of three chassis, each containing 12 boards, with each board housing two accelerators. In this system, the chip’s clock frequency rises from the original 800MHz to 1.35GHz, and the chip operates at 90 watts, compared with 25 watts for the original design.

*The Large Rack-Mounted System Developed by Meta for the MTIA*
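The rack figures above can be checked with a few lines of arithmetic. The sketch below uses only the numbers quoted in this article; the variable and function names are illustrative, and the accelerator-only power total is a derived figure that ignores every other component in the rack, not a number published by Meta.

```python
# Figures as quoted in the article; all names here are illustrative.
CHASSIS_PER_RACK = 3
BOARDS_PER_CHASSIS = 12
ACCELERATORS_PER_BOARD = 2

V1_CLOCK_MHZ, NEW_CLOCK_MHZ = 800, 1350   # 800MHz vs 1.35GHz
V1_POWER_W, NEW_POWER_W = 25, 90          # per-chip power as reported


def accelerators_per_rack(chassis=CHASSIS_PER_RACK,
                          boards=BOARDS_PER_CHASSIS,
                          per_board=ACCELERATORS_PER_BOARD):
    """Total accelerators in one rack-mounted system."""
    return chassis * boards * per_board


total = accelerators_per_rack()
clock_ratio = NEW_CLOCK_MHZ / V1_CLOCK_MHZ
power_ratio = NEW_POWER_W / V1_POWER_W

# Assumption: accelerators only; excludes hosts, networking, and cooling.
accelerator_power_w = total * NEW_POWER_W

print(f"accelerators per rack: {total}")
print(f"clock ratio: {clock_ratio:.2f}x, power ratio: {power_ratio:.1f}x")
print(f"accelerator-only power per rack: {accelerator_power_w} W")
```

On these numbers, the clock speed rises by roughly 1.7 times while per-chip power rises by 3.6 times, which fits with the new chip also carrying more cores and more on-chip memory.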

On the software side, Meta emphasizes that the software stack running on the new chip system is very similar to that of MTIA v1, which speeds up deployment for its teams. The new MTIA is also compatible with code developed for MTIA v1, so developers were able to get Meta’s traffic running on the new chip within days, and Meta was able to deploy the chip in 16 regions and run production models on it within nine months.

According to Meta’s summary, testing results so far indicate that this MTIA chip can handle low-complexity (LC) and high-complexity (HC) ranking and recommendation models as components of Meta products: “Because we control the entire stack, we can achieve higher efficiency compared to commercial GPUs.”

Currently, the new MTIA chip has been deployed in Meta’s data centers and has shown positive results: “The company can allocate more computing power for denser AI workloads and invest more in computing power. It turns out that in providing the best combination of performance and efficiency for workloads specific to Meta, this chip is highly complementary to commercial GPUs.”

In February of this year, foreign media revealed information about the second-generation MTIA chip, stating that Meta plans to produce an AI chip internally referred to as “Artemis” this year, further accelerating the company’s expansion in the AI field. At that time, a Meta spokesperson confirmed the plan, stating that the chip would work in conjunction with hundreds of thousands of GPUs Meta had acquired.

As the AI race intensifies, high-performance AI chips are becoming increasingly scarce. On January 18th, Meta CEO Mark Zuckerberg announced that the company plans to build its own AGI (Artificial General Intelligence) and aims to obtain about 350,000 H100 GPUs from Nvidia by the end of this year. Even at the H100’s lowest retail price of $25,000 per unit, 350,000 units would cost Meta about $8.75 billion.
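The spending estimate is a simple multiplication, sketched below for readers who want to try other price assumptions. The unit count and price are the article’s figures; the function name is hypothetical, and the result is a rough lower bound rather than a reported purchase price.

```python
# Figures as quoted in the article; names are illustrative.
H100_UNITS = 350_000
H100_UNIT_PRICE_USD = 25_000  # lowest retail price cited in the article


def fleet_cost_usd(units, unit_price_usd):
    """Rough hardware cost: number of units times price per unit."""
    return units * unit_price_usd


cost = fleet_cost_usd(H100_UNITS, H100_UNIT_PRICE_USD)
print(f"${cost / 1e9:.2f} billion")  # prints "$8.75 billion"
```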

Of course, Meta is not the only tech giant turning its attention to in-house chips. Just days ago, Google announced that it is building a custom CPU based on the ARM architecture, named “Axion.” The chip is intended to support services like YouTube ads on Google Cloud and is expected to be released later in 2024. Microsoft and Amazon had already begun developing custom chips capable of handling AI tasks.

Analysts from market research firm CFRA stated that these large tech companies are facing cost pressures and need to rely on in-house chips to alleviate them. Although these chips are “necessary” for the companies, they may not match the performance of Nvidia’s latest Blackwell platform products.