Customer story

Software as a heritage for humanity: How Ceph will help store and preserve code for future generations

Software Heritage

Software Heritage is a non-profit organization, headquartered in Paris, with the mission of collecting, preserving and sharing all software source code for present and future generations as a heritage for humanity.

In 2021, Software Heritage partnered with the Red Hat Social Innovation Program, which is committed to the use of open source technologies for the greater good, and began collaborating on a massively scalable software-defined storage architecture built on Ceph storage.

By building an object data store, Software Heritage can take advantage of a cost-effective, highly scalable, distributed system. With enriched metadata, it can also ensure that software repositories remain easy to search and retrieve for use by future generations.

Listen to this informative interview on Software Engineering Radio.


Collecting and preserving all source code is no easy task

Storing all the code of the world in a reliable, scalable, and affordable way requires building one of the largest archives ever created.

Software Heritage aims to not only collect and preserve the world’s code, but also to share it for practical use, and offer it as a reference guide for future generations to inherit and learn from. The open sharing of all code will also help humanity reuse valuable code, providing solid, common foundations to serve the different needs of heritage preservation, science, and industry.

But how can all this data be stored resiliently and organized in a reliable, scalable, and affordable way? To answer that question and build its next-generation object storage system, Software Heritage chose Ceph, an open source solution that is scalable, highly reliable, and free of vendor lock-in.


Using Ceph to build a universal software archive

An endeavor of this magnitude needs a reliable, affordable, and scalable solution. For Software Heritage, storing and sharing all of the world’s code is the core mission. Ceph will support this goal by providing industry-leading performance, reliability, and flexibility with rapid scalability, while minimizing storage costs and enabling durable, resilient clusters.

Ceph is a strong option for any organization working with data at scale because it runs on any commodity hardware platform, allowing full control over the capabilities and capacity of the cluster. Basic offerings include enhanced search capabilities, as well as native support for file and block storage protocols. Ceph is also a distributed software-defined storage system, meaning that there are no dedicated controllers that could become a single point of failure. If Software Heritage ever loses a node, the cluster will rebuild itself with no downtime. Storage capacity can be increased by adding new nodes to the cluster while the data is live, and Ceph redistributes the load accordingly.
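The behavior described above (no central controller, self-healing after a node loss, and rebalancing when nodes are added) can be illustrated with a small sketch. Ceph's actual placement algorithm is CRUSH; the example below is not CRUSH and does not use any Ceph API. It substitutes a much simpler technique, rendezvous (highest-random-weight) hashing, to show the general idea that any client can compute where an object lives from the object name and the node list alone. The node names, object names, and replica count are all hypothetical.

```python
import hashlib
from typing import List, Sequence


def score(node: str, key: str) -> int:
    """Deterministic pseudo-random weight for a (node, object) pair."""
    digest = hashlib.sha256(f"{node}:{key}".encode()).digest()
    return int.from_bytes(digest[:8], "big")


def place(key: str, nodes: Sequence[str], replicas: int = 3) -> List[str]:
    """Pick the `replicas` nodes with the highest score for this object.

    No lookup table or controller is consulted: the placement is a pure
    function of the object name and the current node list.
    """
    ranked = sorted(nodes, key=lambda n: score(n, key), reverse=True)
    return ranked[:replicas]


def changed_fraction(keys, before, after, replicas: int = 3) -> float:
    """Fraction of objects whose replica set differs between two node lists."""
    changed = sum(
        1 for k in keys
        if set(place(k, before, replicas)) != set(place(k, after, replicas))
    )
    return changed / len(keys)


# Hypothetical cluster of 10 nodes holding 10,000 objects.
keys = [f"object-{i}" for i in range(10_000)]
nodes = [f"node-{i}" for i in range(10)]

# Losing a node: surviving replicas stay where they are, and only the
# copies that lived on the lost node need to be rebuilt elsewhere.
shrunk = [n for n in nodes if n != "node-0"]
print("affected by node loss:", changed_fraction(keys, nodes, shrunk))

# Adding a node: only the objects that now rank the new node in their
# top choices move, so most data stays in place.
grown = nodes + ["node-10"]
print("affected by node add: ", changed_fraction(keys, nodes, grown))
```

In this sketch, only a minority of objects change their replica set when one node out of ten is removed or added, which is the property that lets a cluster heal or grow while the data stays live. A production system such as Ceph layers failure domains, weights, and placement groups on top of this basic idea.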

Best of all, Ceph is cost-effective. With native support for file, block, and object storage, Software Heritage can create and tune storage pools, delivering the performance required for latency-sensitive workloads in addition to an archive object data store. This flexibility lets data engineers manage costs while concentrating on the service levels the organization requires, all in one solution with one management plane, which provides incremental operational cost savings.

Business outcome

Open source sharing guarantees preservation

Red Hat’s Social Innovation Program provides consulting services and product support to nonprofit open source projects that are solving social and environmental issues. Through a joint commitment to using open source technologies for the greater good, Red Hat is partnering with Software Heritage and drawing on Red Hat Consulting services to help design a solid foundation for their shared vision: an archival database that tracks the origin and development of software.

When it comes to implementing such a large-scale project, a solid architecture is key to building a strong, scalable base, not only to preserve but also to share all the code of the world. This is why Software Heritage engineers worked with Red Hat Ceph experts, including the team that created and supports Ceph, to ensure that the design can support the massive scale of Software Heritage’s deployment.

As this project unfolds, Red Hat will continue to contribute resources, helping to validate performance, implement desired features, provide deployment guidance, and consult on the development and operation of Ceph storage. Our continued partnership helps to fulfill Red Hat’s goal of fostering collaboration to build better open source solutions for the benefit of all.

“Software source code embodies a growing part of our scientific, technical and organizational knowledge. The mission of Software Heritage is to ensure that this precious body of knowledge will be preserved over time and made available to all, improving software development and reuse, fostering better science, and contributing to secure the open source software supply chain.”

Roberto Di Cosmo

Founder and Chief Executive Officer

“The free software community has built, working together collaboratively over many decades, an immense technical legacy: billions of freely licensed lines of code that everyone can use, study, modify, and share. Software Heritage preserves this software commons for future generations of hackers to build upon and for historians to document the technical side of the open source revolution.”

Stefano Zacchiroli

Founder and Chief Technology Officer
