Generative AI hallucinations are a major pain point that makes people hesitant to trust AI. In response, Amazon Web Services (AWS) is making the automated reasoning technology long used in its core cloud services available to customers as a feature within Amazon Bedrock Guardrails. The Automated Reasoning checks feature for Amazon Bedrock Guardrails is now generally available, along with five new capabilities. The feature uses a formal verification mechanism grounded in mathematics and logic, helping customers detect and verify model outputs against their own domain knowledge with greater accuracy. It achieves up to 99% accuracy in AI response verification, effectively reducing the risks associated with AI hallucinations.
This method is fundamentally different from probabilistic approaches. Probabilistic methods handle uncertainty by assigning probabilities to outcomes, whereas the Automated Reasoning checks feature transforms AI outputs into logically verifiable propositions, helping enterprises embed mathematical rigor into their AI application safeguards. When a model output admits multiple interpretations, the feature can also help detect the ambiguity. Building on the preview version, the generally available release adds several capabilities: support for documents up to 80K tokens long, saving and reusing verification tests, automatic test scenario generation, natural-language suggestions for policy optimization, and customizable confidence thresholds. These enhancements move Automated Reasoning checks from conceptual exploration to scalable implementation, adding a layer of AI trustworthiness at the business level.
A Weapon Forged Over a Decade, Now Available for Customer Applications

Hallucinations are a key cause of unreliable output from large language models (LLMs). Enterprise customers want more deterministic results: first, factual and expressive correctness, meaning no hallucinations; second, fidelity to the business, meaning the LLM should truly understand the business context and produce content that aligns with business logic. A common way to reduce hallucinations is to pass the entire original document to the LLM as prompt and context. This consumes a large number of tokens, driving up costs, and also tests the model's ability to extract the right information from a large volume of content, so it carries risks on both cost and accuracy.
Augmenting model capabilities with rules is a viable path to productizing LLMs. The Automated Reasoning checks feature acts like a "logic supervisor" for LLMs, adding a layer of logical validation before model outputs reach users and further strengthening the trustworthiness of AI-generated content. The underlying technique is symbolic AI: abstract natural-language descriptions of the world into logical expressions, then use strict logical checks to ensure that the AI's output, or automatically generated content, conforms to those expressions. AWS has used automated reasoning for over a decade in core services such as Amazon S3 and Amazon IAM to verify code correctness, optimize performance, and shorten iteration cycles. For example, automated reasoning runs behind the scenes in Amazon S3, where customers need strict access control over buckets, and in Amazon VPC, which involves extensive network connectivity and permission controls.
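To make the idea concrete, here is a minimal, hypothetical sketch (not the actual AWS policy format) of how a natural-language business rule can be abstracted into a checkable logical proposition; the rule, variable names, and thresholds are invented for illustration.

```python
# Hypothetical rule: "A conventional loan with a down payment below 20%
# requires private mortgage insurance (PMI)."
# Abstracted as a logical implication over typed variables:
#   (loan_type = conventional AND down_payment_pct < 20)  =>  has_pmi
def claim_is_consistent(loan_type: str, down_payment_pct: float, has_pmi: bool) -> bool:
    """Return True if a claim about a loan is consistent with the rule."""
    premise = (loan_type == "conventional") and (down_payment_pct < 20.0)
    return (not premise) or has_pmi  # logical implication: premise -> has_pmi

# A claim like "a conventional loan with 10% down needs no PMI" violates the rule.
print(claim_is_consistent("conventional", 10.0, False))  # False -> contradiction
```

Automated reasoning generalizes this idea: rather than evaluating one concrete case, it reasons over all possible variable assignments to determine whether a statement must hold, cannot hold, or may hold under some assignments.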
Through Amazon Bedrock Guardrails, AWS is opening this tool up to customers for the first time, providing out-of-the-box AI safety and compliance capabilities and further mitigating LLM hallucination issues in real business scenarios. The newly launched Automated Reasoning checks feature in Amazon Bedrock Guardrails adds five key capabilities:
(1) Handles Large Documents: Supports processing documents of up to 80K tokens in a single policy build, roughly the equivalent of 100 pages of material.
(2) Simplifies Policy Validation: Allows verification tests to be saved and run repeatedly, making it easier to maintain and validate policies over time and giving policy validation the character of regression testing.
(3) Automatic Scenario Generation: Automatically creates test scenarios based on customer definitions, lowering the barrier to entry, saving time and effort, and aiding in more comprehensive scenario coverage.
(4) Enhanced Policy Feedback: Provides suggestions for policy changes in natural language, simplifying the policy optimization process and enabling developers and compliance personnel who are not logic experts to get started quickly.
(5) Customizable Verification Settings: Allows adjustment of confidence score thresholds based on specific needs, giving enterprises more flexible control over the strictness of verification.
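As a rough sketch of how capabilities (1) and (5) might come together in practice, the snippet below attaches an automated reasoning policy and a confidence threshold when creating a guardrail with the boto3 SDK. create_guardrail is a real boto3 operation, but the automatedReasoningPolicyConfig field name, its policies list, and the confidenceThreshold key are assumptions here; consult the current API reference for the exact request shape.

```python
import boto3

bedrock = boto3.client("bedrock")  # Bedrock control-plane client

# Minimal sketch, not a definitive request shape. The policy ARN and guardrail
# name are placeholders; "automatedReasoningPolicyConfig" and its keys are
# assumptions about the GA API -- verify against the current SDK documentation.
response = bedrock.create_guardrail(
    name="mortgage-approval-guardrail",
    description="Verifies assistant answers against the mortgage approval policy",
    automatedReasoningPolicyConfig={
        "policies": ["arn:aws:bedrock:us-east-1:111122223333:automated-reasoning-policy/EXAMPLE"],
        "confidenceThreshold": 1.0,  # raise or lower to tune verification strictness
    },
    blockedInputMessaging="Sorry, I can't help with that request.",
    blockedOutputsMessaging="The generated answer could not be verified against policy.",
)
print(response["guardrailId"], response["version"])
```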
The introduction of these new features signifies that the Automated Reasoning checks capability has evolved from a “proof-of-concept tool” for specialized domains into an engineered capability applicable at scale and in a standardized manner, moving from the lab to production environments.
Judging Whether AI Assistant Responses Comply with Rules, Precisely Identifying Contradictions

After a policy document written in natural language is uploaded to the Amazon Bedrock Guardrails module, the system uses automated reasoning to translate the natural-language statements into symbolic logical expressions: it automatically extracts variables and constraints, symbolizes them, and combines them into rules. The console walks through this chain end to end: users create a policy under "Automated Reasoning," enter a name and description, and upload the rule document; the system then automatically generates a logical structure composed of Rules, Variables, and Custom Types, which is applied in the final safeguarding step of Amazon Bedrock Guardrails.
These rules are used to validate the LLM's output. The LLM itself knows nothing about internal enterprise rules and generates content probabilistically; adding this layer of logical validation captures the model's output and determines whether it is Valid, Invalid, or Satisfiable. Rules define the logical relationships between variables and carry unique IDs for traceability. Variables capture key concepts from the text, such as down payment ratio or credit score. Custom Types define value ranges, for example distinguishing "insured loans" from "conventional loans."
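Conceptually, the generated structure might look something like the sketch below. This is purely illustrative and does not reflect the service's actual schema; the type, variable, and rule names are invented.

```python
# Conceptual illustration only -- not the real Automated Reasoning policy schema.
# It mirrors the console's structure: Custom Types restrict value ranges,
# Variables capture key concepts, and Rules (with traceable IDs) relate them.
policy_structure = {
    "customTypes": {
        "LoanCategory": ["insured", "conventional"],
    },
    "variables": {
        "loan_category": {"type": "LoanCategory", "description": "Category of the loan"},
        "credit_score": {"type": "int", "description": "Applicant's credit score"},
        "down_payment_pct": {"type": "float", "description": "Down payment as a percentage of the price"},
        "pmi_required": {"type": "bool", "description": "Whether private mortgage insurance is required"},
    },
    "rules": [
        {"id": "R-001", "statement": "loan_category = conventional => credit_score >= 620"},
        {"id": "R-002", "statement": "loan_category = conventional AND down_payment_pct < 20 => pmi_required"},
    ],
}
```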
In the "Tests" section, users can first use "Auto-generate scenarios" to quickly obtain broad test coverage, then supplement it with manual tests, set the expected finding (Valid, Invalid, or Satisfiable) for each case, and set confidence thresholds. The initial rule-extraction process is automated and requires no manual intervention, which makes it highly efficient. Because this automated extraction can itself contain hallucinations, the system provides a customization interface that lets domain experts revise the rules to ensure logical correctness. Once revised, the rules can be deployed.
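A hand-written supplement to the auto-generated scenarios might be organized along the lines below; this is a conceptual sketch, and the field names and threshold values are illustrative rather than the console's actual format.

```python
# Illustrative manual test cases pairing sample assistant answers with the
# expected verification finding (field names are hypothetical).
manual_tests = [
    {"content": "A conventional loan with a 10% down payment requires PMI.",
     "expected": "VALID", "confidence_threshold": 0.9},
    {"content": "Conventional loans never require a down payment or PMI.",
     "expected": "INVALID", "confidence_threshold": 0.9},
    {"content": "The applicant may qualify, depending on their credit score.",
     "expected": "SATISFIABLE", "confidence_threshold": 0.9},
]
```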
When enterprise employees actually interact with the LLM, these rules act as a filter to check the model’s output. This allows for better control over the model output, avoiding errors in business logic. After running the verification, the system can not only judge whether the AI assistant’s response complies with the approval rules but also precisely identify the rule causing the contradiction in case of failure, helping users optimize policies or correct tests.
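In code, this post-generation check can be run through the ApplyGuardrail API in the bedrock-runtime SDK, which validates a piece of text against an existing guardrail. The call below is a minimal sketch: the guardrail ID and version are placeholders, and the exact structure of the automated reasoning findings inside assessments should be confirmed in the API reference.

```python
import boto3

runtime = boto3.client("bedrock-runtime")

# Validate a candidate answer (from any model) against the guardrail.
result = runtime.apply_guardrail(
    guardrailIdentifier="gr-EXAMPLEID",   # placeholder guardrail ID
    guardrailVersion="1",
    source="OUTPUT",                      # check the model's answer, not the user prompt
    content=[{"text": {"text": "The applicant qualifies with a 5% down payment and no PMI."}}],
)

print(result["action"])       # "GUARDRAIL_INTERVENED" or "NONE"
print(result["assessments"])  # per-policy findings, including automated reasoning results
```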
After validation is complete, a single guardrail can have up to two automated reasoning policies attached, and they can work alongside other safeguards such as content filtering and contextual grounding checks, forming multi-layered protection covering logic, content, and context. In practice, the Automated Reasoning checks feature slots into existing business processes: it can be used together with misuse-prevention models or applied on its own, sending any model's output to Guardrails for checking. These safeguards apply not only to models on Amazon Bedrock but can also be extended to third-party models via APIs, and they work with Strands Agents and agents built on Amazon Bedrock AgentCore, including multi-agent collaboration scenarios.
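For models hosted on Amazon Bedrock, the same guardrail can also be enforced inline through the Converse API rather than as a separate call. The snippet below is a sketch; the model ID and guardrail identifiers are placeholders.

```python
import boto3

runtime = boto3.client("bedrock-runtime")

# Enforce the guardrail inline on a Bedrock-hosted model via the Converse API.
response = runtime.converse(
    modelId="anthropic.claude-3-5-sonnet-20240620-v1:0",  # placeholder model ID
    messages=[{"role": "user",
               "content": [{"text": "Does a conventional loan with 10% down require PMI?"}]}],
    guardrailConfig={
        "guardrailIdentifier": "gr-EXAMPLEID",  # placeholder
        "guardrailVersion": "1",
        "trace": "enabled",  # include guardrail assessment details in the response
    },
)
print(response["output"]["message"]["content"][0]["text"])
```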
Optimizing Utility Outage Management Systems, Ensuring Compliant and Reliable Enterprise AI Deployment
In an official blog post, AWS demonstrated a practical application of the Amazon Bedrock Guardrails Automated Reasoning checks feature using a mortgage approval example. In the example, the user simply uploads the mortgage approval rule document, and the system converts it into logical definitions and automatically generates test scenarios. The user can then add manual tests, set the expected result for each case, and run the verification. When an output does not align with the rules, the system pinpoints the contradiction, helping the user adjust the policy.
After validation, these policies can be directly applied within Guardrails to constrain the AI assistant’s responses. This example shows how the Automated Reasoning checks feature can transform everyday business rules into verifiable logic and, through automated testing and continuous verification mechanisms, ensure that AI output consistently meets compliance and business requirements. AWS has also co-developed a solution with PwC. Leveraging Automated Reasoning checks, utility companies can optimize operations in the following ways:
(1) Automated Protocol Generation: Create standardized processes that comply with regulatory requirements.
(2) Real-time Plan Validation: Ensure emergency response plans adhere to established policies.
(3) Structured Workflow Construction: Develop severity-based graded workflows with clear response objectives.
The core of this solution is combining intelligent policy management with optimized response protocols, using Automated Reasoning checks to evaluate AI-generated responses. If a response is found to be invalid or merely satisfiable, the results of the automated reasoning check are used to refine the answer or regenerate it outright. The solution illustrates how AI can transform traditional utility operations, combining mathematical precision with practical needs to make them more efficient, reliable, and responsive to customer demands. The Amazon Bedrock Guardrails Automated Reasoning checks feature is now generally available in the US East (Ohio, N. Virginia), US West (Oregon), and Europe (Frankfurt, Ireland, Paris) regions, with billing based on the volume of text processed.
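A minimal sketch of that validate-then-refine loop is shown below, assuming the ApplyGuardrail API used earlier; generate_answer stands in for any model call, the guardrail identifiers are placeholders, and the way findings are fed back into the prompt is purely illustrative.

```python
def answer_with_verification(runtime, prompt, generate_answer, max_attempts=3):
    """Generate an answer, verify it against a guardrail, and retry on failure."""
    feedback = ""
    draft = ""
    for _ in range(max_attempts):
        draft = generate_answer(prompt + feedback)
        check = runtime.apply_guardrail(
            guardrailIdentifier="gr-EXAMPLEID",  # placeholder
            guardrailVersion="1",
            source="OUTPUT",
            content=[{"text": {"text": draft}}],
        )
        if check["action"] == "NONE":  # verification passed, no intervention
            return draft
        # Otherwise feed the findings back into the prompt and regenerate.
        feedback = f"\nRevise the answer. Verification findings: {check['assessments']}"
    return draft  # last attempt; in production, escalate to a human instead
```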
Conclusion: Adding a “Mathematical Lock” to the “AI Guardrails”
For over a decade, AWS has pioneered the use of automated reasoning in core cloud services such as Amazon S3, Amazon IAM, and its encryption engines, using mathematical and logical methods to verify system correctness. That experience has been a crucial factor in keeping complex, large-scale cloud services secure and reliable. Most AI safety measures on the market rely on filtering or probability thresholds and struggle to provide deterministic guarantees. The Amazon Bedrock Guardrails Automated Reasoning checks feature is the first to offer a logically provable auditing capability. It means AI safety no longer relies solely on probability and empirical judgment but gains mathematical, logic-based verifiability, moving further from "trustworthy" toward "provable." This is equivalent to adding a "mathematical lock" to AI, further enhancing its reliability. It allows enterprises to logically verify whether AI output complies with policies and rules, helping to avoid factual errors caused by hallucinations.