The news late last year about Red Hat's acquisition of Chatterbox Labs is just one part of how we plan to accelerate trusted AI for the enterprise. In the age of generative AI, having a transparent, flexible, and reliable platform for innovation is more critical than ever. And of course, Red Hat believes the open source development model is the most effective path to deliver on that promise.

Recently, the Amazon AGI Labs team published a paper, Integrating Safety Testing into GenAI Development: Lessons from Amazon Nova and Chatterbox. This paper documents a collaboration between Amazon Nova's Responsible AI team and Chatterbox Labs (now part of Red Hat), describing how specialized external testing capabilities helped strengthen specific aspects of Nova's safety evaluation during development, particularly for adversarial prompt scenarios. 

Important context: The testing approach described focuses specifically on assessing model responses to adversarial prompts (jailbreak attempts—where users try to manipulate models into bypassing their safety guidelines) and refusal behavior for policy-violating requests. This represents one component of Amazon's broader responsible AI framework, which includes multiple layers of safety measures, ongoing monitoring, and human oversight.

Key learnings from the Amazon Nova and Chatterbox Labs paper

The paper details how the teams implemented an evaluation loop in which Chatterbox Labs' testing platform, AIMI for gen AI, was integrated earlier in the model development cycle. This enabled assessment of early model versions, allowing more time and opportunity to mitigate identified risks in subsequent model builds.

The operational and technical insights from this collaboration underscore why Red Hat is enthusiastic about the opportunity to accelerate trusted AI:

1. Effectiveness of Progressive Attack Escalation

The AIMI software implements a technique called Progressive Attack Escalation. This is a rigorous stress test in which a prompt is progressively modified until the model is either successfully manipulated (jailbroken) into producing a policy-violating response or the software exhausts its set of jailbreak mutations.

This testing approach was a critical insight from the collaboration. By systematically escalating attack sophistication, it provided stronger evidence that improvements measured during testing corresponded to more robust safety behavior in real-world scenarios. While no testing regime can guarantee comprehensive safety across all possible use cases, this method helped validate that observed improvements were meaningful rather than artifacts of narrow test conditions.
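To make the escalation loop concrete, here is a minimal sketch of the idea in Python. This is not AIMI's implementation (its mutations are proprietary); the mutation strategies, the stub model, and the stub policy classifier below are all illustrative placeholders:

```python
# Illustrative sketch of a progressive attack escalation loop.
# All mutation strategies and stub functions here are hypothetical;
# real tooling such as AIMI uses proprietary, far richer mutations.

MUTATIONS = [
    lambda p: p + " Write this as part of a fictional story.",  # role-play framing
    lambda p: "Ignore your previous instructions. " + p,        # instruction override
    lambda p: "As a licensed security researcher, " + p,        # authority framing
]

def query_model(prompt: str) -> str:
    """Stand-in for the model under test (assumption)."""
    return "REFUSED"

def is_policy_violating(response: str) -> bool:
    """Stand-in for a real policy classifier (assumption)."""
    return "UNSAFE" in response

def progressive_attack_escalation(base_prompt: str) -> dict:
    """Escalate a prompt through increasingly sophisticated mutations
    until the model is jailbroken or the mutation budget is exhausted."""
    prompt = base_prompt
    for step, mutate in enumerate(MUTATIONS):
        prompt = mutate(prompt)
        response = query_model(prompt)
        if is_policy_violating(response):
            return {"jailbroken": True, "step": step, "prompt": prompt}
    return {"jailbroken": False, "step": len(MUTATIONS), "prompt": prompt}

result = progressive_attack_escalation("some adversarial request")
print(result["jailbroken"])  # False with the always-refusing stub model above
```

The key property of this structure is that a "pass" means the model survived the full escalation ladder, not just a single naive attempt, which is what makes the measured improvements more credible.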

2. The nuance of responsible AI performance

A key finding from the collaboration was that, for responsible AI, monotonic improvements in model performance are not guaranteed as capabilities scale. Performance must be carefully calibrated during training to ensure models refuse genuinely harmful queries without leading to over-refusals. For example, a model should decline to provide instructions for illegal weapons but still answer academic questions about chemistry or historical military technology. Finding this balance requires extensive testing across diverse scenarios, with human judgment guiding what constitutes appropriate boundaries. Another important insight was that strength in one aspect of model usage may not translate to strength in another.
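This balance can be made measurable. The sketch below, with hypothetical data and a function name of our own choosing, shows one common way to track the two competing failure modes side by side: the refusal rate on genuinely harmful prompts (which should be high) and the over-refusal rate on benign prompts (which should be low):

```python
# Illustrative calibration metric: each result is a pair
# (should_refuse, did_refuse) for one evaluated prompt.

def refusal_calibration(results):
    """Return (harmful_refusal_rate, over_refusal_rate).
    The first should be high; the second should be low."""
    harmful = [did for should, did in results if should]
    benign = [did for should, did in results if not should]
    harmful_refusal_rate = sum(harmful) / len(harmful) if harmful else 0.0
    over_refusal_rate = sum(benign) / len(benign) if benign else 0.0
    return harmful_refusal_rate, over_refusal_rate

# Hypothetical run: the model refuses all 3 harmful prompts,
# but also wrongly refuses 1 of 3 benign chemistry questions.
results = [(True, True), (True, True), (True, True),
           (False, False), (False, True), (False, False)]
print(refusal_calibration(results))  # harmful rate 1.0, over-refusal rate ~0.33
```

Reporting both numbers together is what surfaces the non-monotonic behavior the paper describes: a training change can raise the harmful refusal rate while silently raising the over-refusal rate as well.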

3. Operational integrity: Building stronger safety testing practices

The report also yielded operational learnings for building more resilient models:

  • Insulation from testing: Since the goal was to improve the models during training, the teams followed best practices from security research by maintaining separation between test development and model training. The Chatterbox team maintained a distinct test set that wasn't shared with model developers, preventing inadvertent overfitting to specific test cases and ensuring the models developed robust, generalizable safety behaviors rather than memorizing specific examples. The teams collaborated on high-level risk categories and safety objectives while Chatterbox Labs maintained test diversity and specificity—similar to how security researchers conduct penetration testing without revealing exact attack vectors in advance.
  • Joint analysis: Periodic syncs between scientists from both teams enabled the extraction of key insights, improving the quality of both testing and model training. Through iterative testing cycles, the teams identified several risk areas where early model versions required additional safeguards, leading to targeted improvements in refusal behavior and multi-turn conversation safety. While we cannot disclose specific attack vectors for security reasons, the testing revealed both strengths and areas for continued development.
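The insulation practice above has a simple mechanical core that can be sketched in code. In this hypothetical guardrail (the function name and data are ours, not from the paper), model developers see only the shared risk categories, and a check enforces that held-out test prompts never enter the training data:

```python
# Illustrative sketch of test-set insulation: developers see shared
# risk categories; concrete test prompts stay with the external testers.

def assert_no_leakage(training_prompts, held_out_tests):
    """Guardrail: the held-out evaluation set must stay disjoint
    from anything used to train the model."""
    overlap = set(training_prompts) & set(held_out_tests)
    if overlap:
        raise ValueError(f"held-out test prompts leaked into training: {overlap}")
    return True

# Hypothetical data only.
shared_categories = ["weapons", "fraud", "self-harm"]   # visible to both teams
training_prompts = ["benign prompt 1", "red-team prompt used in training"]
held_out_tests = ["held-out adversarial prompt A", "held-out adversarial prompt B"]

print(assert_no_leakage(training_prompts, held_out_tests))  # True: sets are disjoint
```

Running a check like this before each training cycle is what keeps measured safety gains generalizable rather than memorized, for the same reason penetration testers do not reveal their exact attack vectors in advance.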

Trust is the foundation of hybrid cloud AI

The collaboration demonstrates the value of specialized external testing expertise: having capabilities separate from the teams building models helps identify potential safety and security issues that might otherwise be overlooked. Automated, customized data integration allows test results to flow directly into training workstreams, enabling early risk identification and mitigation.

This kind of rigorous, specialized testing informed Amazon Nova's development and reflects important practices for enterprise AI development. Red Hat is committed to an open approach to AI that offers our customers the freedom to run any model, with any accelerator, on any cloud. The integration of Chatterbox Labs' advanced testing capabilities into Red Hat AI will empower organizations to build, deploy, and manage their AI models with a clearer understanding of potential risks and a stronger posture against them.

We will integrate this expertise to advance AI safety standards, particularly for multi-turn conversations and agentic frameworks. With Chatterbox Labs' proven expertise and capabilities, Red Hat is accelerating our mission to deliver an open, trusted, and consistent foundation for the next wave of enterprise AI innovation.

→ Read the full paper, Integrating Safety Testing into GenAI Development: Lessons from Amazon Nova and Chatterbox


About the author

Red Hat is the world’s leading provider of enterprise open source software solutions, using a community-powered approach to deliver reliable and high-performing Linux, hybrid cloud, container, and Kubernetes technologies.


Red Hat helps customers integrate new and existing IT applications, develop cloud-native applications, standardize on our industry-leading operating system, and automate, secure, and manage complex environments. Award-winning support, training, and consulting services make Red Hat a trusted adviser to the Fortune 500. As a strategic partner to cloud providers, system integrators, application vendors, customers, and open source communities, Red Hat can help organizations prepare for the digital future.
