How open source can help with AI transparency

Technically Speaking with Chris Wright

00:00 — Chris Wright
AI has trust issues, and the lack of transparency threatens to undermine its potential benefits. Linux and containers initially faced skepticism too, but they eventually won trust through open source principles, community engagement, standardization, and demonstrated reliability. So could a similar path exist for AI?

00:21 — Title Animation

00:29 — Chris Wright
The open source software that we all know and love is pretty well established. We get full access to the source code written by programmers and a license that allows users to modify, enhance, and even redistribute the software according to their needs. This foundation of openness is what builds trust and drives innovation.

00:49 — Richard Fontana 
Open source software definitionally requires that the source code for your software be available. What is the analog to that for AI? That is not clear. Some people believe very strongly that it means your training data has to be open, that you have to provide the training data. That would actually be highly impractical for pretty much any LLM. So if that's the answer, it raises some difficult problems for open source and AI, because it suggests that at this stage of the game, open source AI may not be practical or possible. It may be a sort of utopian thing that we have to aim towards. And I think that's the view that some people have.

01:29 — Chris Wright
When I think about open source AI, it starts with the model and extends to the software stack. The core components include open data sets, open weights, and open source licensing of the resulting models so users can modify and share them. As for the software stack, the majority of it is already produced in open source, and we can certainly imagine a future with an entirely open stack. Now, that may be a bit of a utopian view, but we are currently taking steps to prioritize our efforts in reproducibility and transparency.

02:04 — Richard Fontana 
What has been going on over the past few years is that machine learning practitioners and companies have commendably been releasing models to the public. We see the term open source used indiscriminately for any public release of a model, no matter how restrictive the license is. Many of the licenses that are being applied to public models that are being described as open source do discriminate. They discriminate against persons and groups. They discriminate against fields of endeavor. And yet people are calling them open source. So that's part of the situation we're in today.

02:44 — Chris Wright
But open source also conveys the sense that there's a community of contributors behind it, creating a system of checks and balances.

02:53 — JJ Ashgar 
So when you look at InstructLab, it's important that it's open source, because the core value and the core draw is that it's a workflow that works on your laptop and can work in your data center. It's a bunch of Python code that builds that workflow for you to get your downstream model and do the fine-tuning it needs. Okay, it's Apache 2.0 licensed. Sure, you can take it and build a proprietary system off of it, but there's no real value proposition in doing that, because we are building this in the open for the greater good of society.

03:32 — Chris Wright
Community involvement in AI model development ensures multiple validations, enhancing trust. InstructLab exemplifies this by enabling open source collaborations that refine AI models.

03:45 — JJ Ashgar 
What is the old saying? The four-eyes rule of development, where you need at least four eyes before you hit that merge button? That means for any knowledge we're putting into the InstructLab ecosystem, there are four people saying, "Yes, this is okay."

04:02 — Chris Wright
As we work toward creating more trustworthy models through community involvement, it's equally important to know what data goes into a model and to understand its decision-making process. Which algorithm was used? Which data points were input? And how were those data points processed to get the final result? This is important because these AI decisions will have very real impacts on people's lives.

04:28 — Rob Geada 
One useful question to ask yourself is: what would happen if my model was wrong every time? Let's say you're deploying a model that predicts something like loan acceptance for applicants. It takes in a bunch of information about the people, like where they live, their demographic information, what they do for a job, et cetera. And from that, it predicts whether or not you should give them a loan, based on how likely they are to pay it back or something like that. Now, what you might want to do when you've deployed that model is monitor how biased it is against different values of, say, that demographic information it's receiving.

05:02 — Rob Geada 
You might notice a certain skew in how likely your model predicts that applicants of a certain race will pay back their loan, and odds are that's going to be unacceptable for you. So you need to be able to see this information and understand how your model operates over, say, these different demographic groupings, and how likely it is to give positive outcomes to each of them, to make sure that your model is operating fairly and treating all of your customers with equal opportunity.
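
To make that concrete, here is a minimal sketch, in plain Python, of the kind of group-fairness check being described: compute the rate of positive (loan-approval) outcomes per demographic group, then compare the groups with common metrics such as statistical parity difference and disparate impact. This is an illustration, not the TrustyAI API; the group names, sample data, and the 0.8 rule of thumb are assumptions made for the example.

```python
# Illustrative only: a from-scratch group-fairness check, not the TrustyAI API.
# Group names, sample predictions, and the 0.8 rule of thumb are assumptions.
from collections import defaultdict

# Hypothetical model outputs: (demographic_group, loan_approved)
predictions = [
    ("group_a", True), ("group_a", True), ("group_a", False), ("group_a", True),
    ("group_b", True), ("group_b", False), ("group_b", False), ("group_b", False),
]

def approval_rates(records):
    """Fraction of positive (approved) outcomes per demographic group."""
    totals, approved = defaultdict(int), defaultdict(int)
    for group, ok in records:
        totals[group] += 1
        approved[group] += int(ok)
    return {g: approved[g] / totals[g] for g in totals}

rates = approval_rates(predictions)
privileged, unprivileged = "group_a", "group_b"

# Statistical parity difference: 0.0 means both groups are approved at the same rate.
spd = rates[unprivileged] - rates[privileged]

# Disparate impact ratio: values below ~0.8 are a common rule-of-thumb warning sign.
di_ratio = rates[unprivileged] / rates[privileged]

print(f"approval rates: {rates}")
print(f"statistical parity difference: {spd:+.2f}")
print(f"disparate impact ratio: {di_ratio:.2f}")
```

With this sample data, the unprivileged group is approved far less often (0.25 versus 0.75), which is exactly the kind of skew you would want surfaced before it reaches customers.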

05:37 — Chris Wright
We need to understand how the system arrived at a particular decision or prediction and whether the decision it's making is fair and aligned with our values.

05:47 — Rob Geada 
TrustyAI is an open source set of responsible AI tools: things like bias monitoring, explainability, model guardrails, and language model evaluation, a whole bunch of responsible AI tools to integrate into AI workflows. The idea is that if we can align it as closely as possible with AI deployments, we can make it as easy as possible to be safe, responsible, and ethical with your AI. The analogy I like is that of a seatbelt. It's great that you have a seatbelt in your car (you have to), but it means nothing if you don't wear it. And that's the idea with responsible AI: you need to make use of those tools and know what you will do with the information they provide you.
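
The "wear the seatbelt" point is worth making concrete: a metric is only useful if something happens when it crosses a threshold. Here is a minimal sketch, again in plain Python rather than the TrustyAI API, of turning a monitored disparate impact ratio into an action; the 0.8 threshold and the alerting behavior are assumptions for the example.

```python
# Illustrative only: acting on a monitored fairness metric, not the TrustyAI API.
def check_disparate_impact(ratio: float, threshold: float = 0.8) -> None:
    """Fail loudly when the monitored ratio drops below the chosen threshold."""
    if ratio < threshold:
        # In a real deployment this might page a team, block a rollout, or open a ticket.
        raise RuntimeError(
            f"disparate impact {ratio:.2f} fell below threshold {threshold}"
        )

check_disparate_impact(0.92)    # within tolerance, nothing happens
# check_disparate_impact(0.33)  # would raise and force someone to look at the model
```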

06:36 — Chris Wright
Today's efforts in making AI more open and transparent are just the beginning. Just like Linux and containers revolutionized their fields through open standards and community engagement, AI, too, can follow a similar path. The journey to trustworthy AI is complex and filled with challenges, but it's a journey worth taking. By learning from the past and embracing open source principles, we can pave the way for a future where AI is not only powerful, but also trusted and transparent. Thanks for watching. We'll see you next time.

Keywords: AI, ML
JJ Ashgar

IBM Developer Advocate, InstructLab contributor

Richard Fontana

Red Hat Senior Commercial Counsel, founder of Red Hat's Technology and Open Source legal team

Rob Geada

Red Hat OpenShift AI Principal Engineer and TrustyAI Tech Lead

Keep exploring

TrustyAI - an open source project looking to solve AI’s bias

The rush to put AI in place without always knowing what it can be used for, or how to use it, can lead to problems. As organizations look at the opportunities for AI ahead, they must weigh both the opportunities and the risks. This is a challenge that open source project TrustyAI looks to address.

Open source and AI’s future: The importance of democratization, sustainability and trust

Today, many people are looking at AI/ML as a future technology, but how can we solve the technology puzzles that haven’t materialized yet? Luckily, we’ve got a time-tested way to help us plan for and create the future: Open source communities and projects.

More like this

Technically Speaking with Chris Wright

Building Trust in Enterprise AI

Delve into the critical importance of trustworthy, open AI systems and learn how InstructLab evolves LLMs for enterprise operations.

Technically Speaking with Chris Wright

Building a Foundation for AI Models

To realize the power of AI/ML in enterprise environments, users need an inference engine to run on their hardware. Two open toolkits from Intel do precisely that.

Code Comments

Bringing Deep Learning to Enterprise Applications

Like houseplants, machine learning models require some attention to thrive. That's where MLOps and ML pipelines come in.
