The news late last year about Red Hat's acquisition of Chatterbox Labs is just one part of how we plan to accelerate trusted AI for the enterprise. In the age of generative AI, having a transparent, flexible, and reliable platform for innovation is more critical than ever. And of course, Red Hat believes the open source development model is the most effective path to deliver on that promise.
Recently, the Amazon AGI Labs team published a paper, Integrating Safety Testing into GenAI Development: Lessons from Amazon Nova and Chatterbox. This paper documents a collaboration between Amazon Nova's Responsible AI team and Chatterbox Labs (now part of Red Hat), describing how specialized external testing capabilities helped strengthen specific aspects of Nova's safety evaluation during development, particularly for adversarial prompt scenarios.
Important context: The testing approach described focuses specifically on assessing model responses to adversarial prompts (jailbreak attempts—where users try to manipulate models into bypassing their safety guidelines) and refusal behavior for policy-violating requests. This represents one component of Amazon's broader responsible AI framework, which includes multiple layers of safety measures, ongoing monitoring, and human oversight.
Key learnings from the Amazon Nova and Chatterbox Labs paper
The paper details how the teams implemented an evaluation loop where Chatterbox Labs’ testing capabilities, AIMI for gen AI, were integrated earlier in the model development cycle. This enabled assessment of early model versions, allowing more time and opportunities to mitigate identified risks through subsequent model builds.
The operational and technical insights from this collaboration underscore why Red Hat is enthusiastic about the opportunity to accelerate trusted AI:
1. Effectiveness of Progressive Attack Escalation
The AIMI software implements a technique called Progressive Attack Escalation. This is a rigorous stress test in which a prompt is progressively modified until the model is either successfully manipulated (jailbroken) to produce a policy-violating response or the software exhausts the jailbreak mutations.
This testing approach was a critical insight from the collaboration. By systematically escalating attack sophistication, it provided stronger evidence that improvements measured during testing corresponded to more robust safety behavior in real-world scenarios. While no testing regime can guarantee comprehensive safety across all possible use cases, this method helped validate that observed improvements were meaningful rather than artifacts of narrow test conditions.
2. The Nuance of responsible AI performance
A key finding from the collaboration was that, for responsible AI, monotonic improvements in model performance are not guaranteed as capabilities scale. Performance must be carefully calibrated during training to ensure models refuse genuinely harmful queries without leading to over-refusals. For example, a model should decline to provide instructions for illegal weapons but still answer academic questions about chemistry or historical military technology. Finding this balance requires extensive testing across diverse scenarios, with human judgment guiding what constitutes appropriate boundaries. Another important insight was that strength in one aspect of model usage may not translate to strength in another.
3. Operational Integrity: Building stronger safety testing practices
The report also yielded operational learnings for building more resilient models:
- Insulation from testing: Since the goal was to improve the models during training, the teams followed best practices from security research by maintaining separation between test development and model training. The Chatterbox team maintained a distinct test set that wasn't shared with model developers, preventing inadvertent overfitting to specific test cases and ensuring the models developed robust, generalizable safety behaviors rather than memorizing specific examples. The teams collaborated on high-level risk categories and safety objectives while Chatterbox Labs maintained test diversity and specificity—similar to how security researchers conduct penetration testing without revealing exact attack vectors in advance.
- Joint analysis: Periodic syncs between scientists from both teams enabled the extraction of key insights, improving the quality of both testing and model training. Through iterative testing cycles, the teams identified several risk areas where early model versions required additional safeguards, leading to targeted improvements in refusal behavior and multi-turn conversation safety. While we cannot disclose specific attack vectors for security reasons, the testing revealed both strengths and areas for continued development.
Trust is the foundation of hybrid cloud AI
The collaboration demonstrates the value of specialized external testing expertise, having capabilities separate from the teams building models helps identify potential safety and security issues that might otherwise be overlooked. Automated, customized data integration allows for direct flow into training workstreams, enabling early risk identification and mitigation.
This kind of rigorous, specialized testing informed Amazon Nova's development and reflects important practices for enterprise AI development. Red Hat is committed to an open approach to AI that offers our customers the freedom to run any model, with any accelerator, on any cloud. The integration of Chatterbox Labs' advanced testing capabilities into Red Hat AI will empower organizations to build, deploy, and manage their AI models with a clearer understanding of potential risks and a stronger posture against them.
We will integrate this expertise to advance AI safety standards, particularly for multi-turn conversations and agentic frameworks. With Chatterbox Labs' proven expertise and capabilities, Red Hat is accelerating our mission to deliver an open, trusted, and consistent foundation for the next wave of enterprise AI innovation.
→ Read the full paper, Integrating Safety Testing into GenAI Development: Lessons from Amazon Nova and Chatterbox
저자 소개
Red Hat is the world’s leading provider of enterprise open source software solutions, using a community-powered approach to deliver reliable and high-performing Linux, hybrid cloud, container, and Kubernetes technologies.
Red Hat helps customers integrate new and existing IT applications, develop cloud-native applications, standardize on our industry-leading operating system, and automate, secure, and manage complex environments. Award-winning support, training, and consulting services make Red Hat a trusted adviser to the Fortune 500. As a strategic partner to cloud providers, system integrators, application vendors, customers, and open source communities, Red Hat can help organizations prepare for the digital future.
채널별 검색
오토메이션
기술, 팀, 인프라를 위한 IT 자동화 최신 동향
인공지능
고객이 어디서나 AI 워크로드를 실행할 수 있도록 지원하는 플랫폼 업데이트
오픈 하이브리드 클라우드
하이브리드 클라우드로 더욱 유연한 미래를 구축하는 방법을 알아보세요
보안
환경과 기술 전반에 걸쳐 리스크를 감소하는 방법에 대한 최신 정보
엣지 컴퓨팅
엣지에서의 운영을 단순화하는 플랫폼 업데이트
인프라
세계적으로 인정받은 기업용 Linux 플랫폼에 대한 최신 정보
애플리케이션
복잡한 애플리케이션에 대한 솔루션 더 보기
가상화
온프레미스와 클라우드 환경에서 워크로드를 유연하게 운영하기 위한 엔터프라이즈 가상화의 미래