ChatGPT Rival Emerges: Long-Text Understanding Accuracy Exceeds 99%

Mar 06, 2024

On March 4th, local time, the American artificial intelligence startup Anthropic released its latest large model series, Claude 3. The series includes three versions: Claude 3 Opus, Claude 3 Sonnet, and Claude 3 Haiku. Among them, Claude 3 Opus is Anthropic’s most powerful new model, outperforming OpenAI’s GPT-4 and Google’s Gemini Ultra in industry benchmark tests. The startup, founded by former senior members of OpenAI and backed by Google and Amazon, has closed five funding rounds in the past year, totaling approximately $7.3 billion.

Summarizes 150,000 words, with a long-text contextual understanding accuracy exceeding 99%

Anthropic stated that the Claude 3 series is currently the fastest and most powerful artificial intelligence model on the market, setting new industry benchmarks in reasoning, mathematics, programming, multilingual understanding, and vision.

In particular, Claude 3 Opus outperforms GPT-4 and Gemini Ultra on industry benchmarks including Massive Multitask Language Understanding (MMLU), the Graduate-Level Google-Proof Q&A benchmark (GPQA), the Grade School Math 8K benchmark (GSM8K), and the HumanEval code generation benchmark.

*Performance of Claude 3 model compared to peer large models in industry benchmark tests*

Anthropic published comparison data on its official website showing how the Claude 3 models perform against other models on multiple benchmarks. The data shows that on MMLU, Claude 3 Opus scored 86.8%, while GPT-4 scored 86.4%. Some differences are larger: on HumanEval, Claude 3 Opus scored 84.9% compared to GPT-4’s 67%, suggesting a substantial lead in code generation.

Furthermore, Claude 3 can summarize up to 150,000 words, whereas ChatGPT can only summarize approximately 3,000 words. Users can input large datasets and ask Claude 3 to summarize them in the form of memos, letters, or stories, which lets Claude 3 outperform ChatGPT on long texts. In particular, Claude 3 Opus achieves a long-text contextual understanding accuracy of over 99%. According to Anthropic’s official website, it does so “in some cases, even identifying which phrases are artificially inserted into the original text.”

*Accuracy of Claude 3 Opus in understanding long texts*

Claude 3 Haiku can read a data-intensive research paper on arXiv (a website collecting preprints of physics, mathematics, computer science, and biology papers), complete with charts and graphs, in under 3 seconds.

*Claude 3’s powerful visual capabilities*

It is worth mentioning that this is the first time Anthropic has provided multimodal support: users can upload images, documents, charts, and other types of unstructured data for analysis and answers. However, Claude 3 cannot generate images.

Anthropic also stated on its official website that although Claude 3 has advanced on metrics such as biological knowledge, cyber-related knowledge, and autonomy compared to previously released large models, it remains at AI Safety Level 2 (ASL-2). Red-team evaluations concluded that the likelihood of these models posing catastrophic risks is very low, but the company will continue to monitor future models.

Anthropic declined to disclose how long it took to train Claude 3 or how much money was spent. Currently, Claude 3 Opus and Claude 3 Sonnet are available in 159 countries worldwide, and users can access them on Claude.ai. Claude 3 Haiku will also be available to the public soon.

Daniela Amodei, President of Anthropic, stated that if customers need to handle the most complex cognitive tasks, such as accurately analyzing complex financial data, they would choose Claude 3 Opus, despite its higher price. According to Reuters, Claude 3 Opus charges $15 per million tokens of input. In comparison, OpenAI charges $10 per million tokens of input for its GPT-4 Turbo model. Sonnet and Haiku are cheaper than Claude 3 Opus.
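To make the price gap concrete, the input-side cost of a single prompt can be worked out from the per-million-token rates quoted above. The following is a minimal sketch, not an official pricing calculator: the function name and the 200,000-token prompt size are assumptions chosen only for illustration, and output-token charges, which the figures above do not cover, are ignored.

```python
# Per-million-token input prices in USD, as quoted in the article (via Reuters).
INPUT_PRICE_PER_MILLION_USD = {
    "Claude 3 Opus": 15.00,
    "GPT-4 Turbo": 10.00,
}


def input_cost_usd(model: str, input_tokens: int) -> float:
    """Return the input-side cost in USD for a prompt of `input_tokens` tokens."""
    return INPUT_PRICE_PER_MILLION_USD[model] * input_tokens / 1_000_000


# Hypothetical prompt size, chosen only for illustration.
prompt_tokens = 200_000

for model in INPUT_PRICE_PER_MILLION_USD:
    print(f"{model}: ${input_cost_usd(model, prompt_tokens):.2f}")
```

Under these assumptions, the same prompt costs half as much again on Claude 3 Opus as on GPT-4 Turbo, since the ratio of the two input rates is 15 to 10 regardless of prompt size.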

Founded by former senior members of OpenAI, raising $7.3 billion in the past year

Anthropic is an American artificial intelligence startup founded by Daniela Amodei and her brother Dario Amodei, former senior members of OpenAI, in 2021. Dario Amodei served as Vice President of Research at OpenAI.

According to recent reports, a group of researchers led by Dario Amodei, one of Anthropic’s founders, left OpenAI over disagreements about the company’s direction. They were concerned that Microsoft’s investment in OpenAI would lead it down a more commercial path, deviating from its initial focus on advanced artificial intelligence safety.

Dario Amodei’s LinkedIn profile lists his roles as former Vice President of Research at OpenAI and Senior Research Scientist at Google. He worked at OpenAI from 2016 to 2020, overseeing the development of the company’s GPT-2 and GPT-3 language models.

Daniela Amodei, prior to founding Anthropic, served as a Risk Manager at Stripe, overseeing operations, user policies, and underwriting. Later, she became the Vice President of Safety and Policy at OpenAI, playing a crucial role in ensuring the safe and ethical use of artificial intelligence technology.

Anthropic positions its products as safer alternatives to ChatGPT. Over the past year, Anthropic has completed five rounds of financing totaling approximately $7.3 billion. According to the Financial Times, Google invested around $300 million in Anthropic in February 2023. In September 2023, Amazon announced an investment of up to $4 billion in Anthropic as part of a strategic partnership.

In a press release, Amazon stated its plans to incorporate Anthropic’s AI technology into its products and services, with Anthropic relying on Amazon Web Services as its primary cloud service and assisting Amazon in developing its custom AI chips. Amazon mentioned that it would receive “minority equity” in the AI startup as part of the deal, but did not provide further details.