So I come from healthcare. I worked as a data scientist for a health services research organization for about ten years prior to coming to SAS. That's Allie DeLonay. Like she said, she works at SAS, a software company based in North Carolina. She's telling me about a project she worked on in the healthcare field. And one of the things that we would do is create models to predict whether patients who were discharged from the hospital would return to the hospital in, say, three to six months. We looked at the difference between the predicted value and the actual value. But a person has to evaluate the predictions, right? Maybe they find something odd in the results. Perhaps the model is predicting things differently for one group versus another. If you're able to see that the model is performing differently for those groups, then there's a process you have to go through to really understand whether these differences are acceptable or unacceptable. Those unacceptable differences could refer to the B-word: bias. Model training is where bias can be introduced. But how does that happen? And what are technologists doing to address bias while training a model? This is Compiler, an original podcast from Red Hat. I'm Kim Huang. I'm Johan Philippine. And I'm Angela Andrews. On this show, we go beyond the buzzwords and jargon and simplify tech topics. We're figuring out how people are working artificial intelligence into their lives. Today we're talking about training A.I. models. Let's get started. A.I. training is a huge topic, so we're clearly not going to cover everything, right? Right. Really not all of it. No, no. This is a very big topic. It is very big. We can put a dent in it. No, this is really a big area. We're just going to take a little slice of it and talk about it today. Absolutely. For this story, I spoke to two people: Allie DeLonay, who you heard from at the top of the episode, and Sophia Rowland, a senior product manager at SAS. Before working as a product manager, I worked as a data scientist helping organizations on their analytical journeys. Working alongside various groups, I started to notice a trend where they would have their models or their analytical assets and then get to a point where they ask, "Well, what's next? What's now?" That's what really got me on the path to MLOps, to help folks take their analytical assets and get them into a place and form where they can be used to help make better decisions. Okay, so wait, we have DevOps, right? We get that. And DevSecOps. And DevSecOps, yes. But what is MLOps? Beats me. MLOps, also known as Machine Learning Operations, is the process by which we get our models into a place and form where they can be used for decision-making. Models are analytical assets; they need proper management and care to ensure they remain productive and performant and can continue to be used to make effective decisions. We get all that? I think so. Yeah, they're important. They're very important, yes. Machine learning models, as assets, need proper care and maintenance. They are built on patterns in the world around us, and the world is ever-changing. When the patterns that our models are based on start to change, the model becomes less effective over time, meaning that it is not making the right decisions as often, leading organizations to do the wrong thing. It's important that your models are monitored over time to ensure they remain effective.
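To make Sophia's point about monitoring concrete, here's a minimal sketch of the kind of scheduled check an MLOps pipeline might run, using the hospital readmission example from the top of the episode. The baseline number, threshold, and column names are hypothetical, and it assumes pandas and scikit-learn are available; it's an illustration of the idea, not how SAS or any particular platform does it.

```python
# Minimal sketch of a scheduled model-performance check (hypothetical data and columns).
import pandas as pd
from sklearn.metrics import roc_auc_score

BASELINE_AUC = 0.82      # AUC measured when the model was first validated (assumed)
ALERT_THRESHOLD = 0.05   # how much degradation we tolerate before alerting (assumed)

def check_model_health(scored: pd.DataFrame) -> None:
    """Compare recent performance against the validation baseline.

    `scored` is assumed to hold one row per prediction, with the model's
    predicted probability and the outcome observed later (for example, whether
    the patient actually returned to the hospital within six months).
    """
    current_auc = roc_auc_score(scored["actual_readmission"],
                                scored["predicted_probability"])
    drop = BASELINE_AUC - current_auc
    if drop > ALERT_THRESHOLD:
        # In a real pipeline this would page someone or open a ticket.
        print(f"ALERT: AUC fell from {BASELINE_AUC:.2f} to {current_auc:.2f}; "
              "the patterns the model learned may have drifted.")
    else:
        print(f"OK: current AUC {current_auc:.2f} is within tolerance.")
```

The specifics would differ per organization, but the shape is the same: compare what the model predicted against what actually happened, on a schedule, and raise a flag when the gap grows.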
So they're trained, and then that training data is housed separately, only brought back if there is some kind of remedial need for it, right? You have to monitor the model to make sure it's still performing as it should. If over time things change and the model stays stagnant, then it's not as effective. Is that clear? I think so, yeah. Crystal. I want to ask more questions about training, but according to Sophia, most people are not training these huge models themselves. It's a big endeavor; it's expensive and time-consuming. A lot of folks aren't going into building these models hoping to harm others. Often, they don't understand the A.I. system as a whole. They may not understand how their data is coming in or the issues that might be in that data if they aren't explicitly looking for it. It's also about how the A.I. system is being used to make decisions. Does it have the potential to have a disparate impact? Does it have the potential to deny resources to one group or another? When your A.I. system has this potential to cause harm or have a disparate impact, it requires more thought about how all those pieces are connecting. More analysis is needed to ensure that you are doing your due diligence to have a system that can act more responsibly. And in the end, you need to understand that sometimes things do go awry. But how do we mitigate and make things right should something happen? That was a mouthful. As I listened to her, I thought about the responsibility of the users of said model. It feels to me that there are a lot of ethical concerns, security concerns, and potential issues when using A.I. in any application. Things can go wrong. Yes. But how do we remediate? How do we fix it? How do we know that it's jumping off the tracks? What are some of those signals? This is an amazing conversation we're going to have. What she said really had me asking a lot of those questions and coming up with those bullet points. Just to add to that, what struck me about what she just said is that a lot of people using these models might not know what the data is. It's pulling from a whole bunch of different sources. They might not have the training or knowledge to know if the data being used is biased in some way. In that case, they won't even know if there is a problem at all. Yeah. A lot of these are large language models. Because it's so expensive and time-consuming to train these models, entire organizations and teams are involved. Many people are consolidating around a handful of different models. They're all using them, but that introduces the issue you mentioned, Johan, where they are adopting a model without knowing what went into its training. Then they're using it for whatever solution or product they're trying to build. And it doesn't always turn out well. One thing I learned in this episode is that it's not necessarily malicious. When we see A.I. in the news, a lot of the bias and issues introduced into algorithms seem to stem from malicious intent. However, what's important here is that, as Sophia said, people aren't going into these models and training them with the intention of harming others. They're just trying to do their job or perform a task as requested by their bosses or companies. Sometimes, harm does happen. I have one comment about that. You're talking about large language models that are out in the wild. People are utilizing them without knowing their data sources and lacking that inherent knowledge.
If it were something you were building in-house, depending on your use case, would it still be problematic? I think hallucinations are always going to be a factor, but what about bias, if it's biased toward one response versus another? If we're talking about a code LLM, I'm just trying to be the devil's advocate here, does it happen across all LLMs? Something to think about. That's something I think about. We're going to discuss domain expertise and domain-specific models later in the episode. But I want to point out something important: the training data itself. There are ways to handle more sensitive training data, right? Just take it out. Mm hmm? Yeah. Just take it out. You don't need it. It's like a recipe; you can remove it. You're going to house that data somewhere anyway, and you'll only refer to it when you need to remediate or adjust the model. However, sometimes datasets we don't think of as dangerous or harmful can present a problem too. This has less to do with technology and more to do with the world we live in. Allie chimes in about proxies. What is a proxy? A proxy is basically data or information that isn't inherently malicious, but it can still push A.I. models into troubling predictions and outcomes. It's a variable that is highly correlated with another, often more sensitive, one. I'll let Allie present an example. One commonly known proxy for race, at least in the United States, is zip code. Due to the history of redlining in our country, there are very high correlations between zip code and race. Even if we say that we don't want to include race in our models, if you include zip code, it ends up being a very strong predictor because it is so highly correlated with race. And so, of course, again, oftentimes with these models that cause harm, nobody is going into model building with, you know, cruel intentions. They're not trying to create malicious models. It's just that these things end up happening. I'm going to let that sit, because that's one of those moments where I was like, thanks, I hate it. When she said that, it was probably one of those moments during this season on A.I. that I was kind of stunned into silence. I didn't even think about that. But what she's saying is totally believable and makes sense. It's a huge issue in data science more generally and in academia. Anyone who works with large datasets ends up looking at these correlations, thinking of all the ways to control for something like race and making sure that it's excluded from the calculations. You need to be really careful about the other ways it can come out, like through zip codes because of the history of redlining. There's a whole list of commonly known proxies to look out for and to control for. But with these models and the sheer amount of data they use to make their calculations, it's so much more difficult to keep track of them all, number one, and to identify them when they do come up. That's why I was a little bit hesitant when you mentioned earlier, before the clip, that you could just remove that sensitive data. But you need to keep that around at least as a way to check the results, to ensure that those proxies aren't having an impact that you don't want them to. Yeah, and we're going to talk about that later, actually. It's going to come up later on in the conversation.
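Since proxies come up again later, it may help to see what screening for one could look like in practice. What follows is a minimal sketch, assuming pandas and SciPy, of measuring how strongly a candidate feature such as zip code is associated with a sensitive attribute such as race; the column names and the 0.3 cutoff are hypothetical, and a real review would involve far more than a single statistic.

```python
# Sketch: screen candidate features for proxy relationships with a sensitive
# attribute (hypothetical column names; the 0.3 cutoff is only illustrative).
import numpy as np
import pandas as pd
from scipy.stats import chi2_contingency

def cramers_v(a: pd.Series, b: pd.Series) -> float:
    """Strength of association between two categorical variables, from 0 to 1."""
    table = pd.crosstab(a, b)
    chi2 = chi2_contingency(table)[0]
    n = table.to_numpy().sum()
    r, k = table.shape
    return float(np.sqrt((chi2 / n) / min(r - 1, k - 1)))

def flag_possible_proxies(df: pd.DataFrame, sensitive: str,
                          candidates: list[str], cutoff: float = 0.3) -> list[str]:
    """Return candidate features strongly associated with the sensitive attribute."""
    return [col for col in candidates
            if cramers_v(df[col], df[sensitive]) >= cutoff]

# Even if `race` is never fed to the model, a flagged feature like `zip_code`
# may carry much of the same signal and deserves a closer look. Hypothetical usage:
# flagged = flag_possible_proxies(patients, sensitive="race",
#                                 candidates=["zip_code", "age_band", "insurance_type"])
```

A flag like this doesn't decide anything on its own; it just surfaces the kind of correlation Allie describes so a human can ask whether the feature belongs in the model at all.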
Allie said something else that was really interesting to me, something I had never considered: the idea of fairness in a model not being static but dynamic. The thing that becomes really interesting with monitoring models over time is that there are many different metrics that you can use to measure the fairness of a model, and they sometimes conflict with one another, which can be confusing for someone who has never created a fairness metric monitoring plan. It honestly just depends on the use case in terms of which fairness metric you choose. But, of course, again, it's incredibly important because it's possible that when you first developed your model, it was acting in a quote-unquote fair way based on the metric and the threshold that you set. But then over time, the world around us changes, and there can be ways that the model is ultimately implemented that we didn't even think about, which could actually impact how the model is used in the field and what that then means for the fairness of that model. Yeah, the idea of fairness in a model can change over time because the world is not static; it's also changing. Interesting. Which I didn't think about. Did she give us an example, or can you think of a way to show how that could possibly happen? I'm having trouble remembering. Yeah, I don't really have an example offhand, but I'm thinking about what Angela said about use cases in different domains. Maybe regulatory changes could be one, because there's definitely a point in history where a law didn't exist, and then all of a sudden it does. Okay. Think about, you know, the UK leaving the EU. Think about these massive changes that affect multiple datasets in multiple places, across maybe even multiple industries, that all kind of have to connect with each other in some way. It's like a point in time where there's a big change, and that's going to always affect everything. So, like census data, you know, before and after a law is passed. I think those are really high-level but good examples of a spot in our real world where we make a decision (maybe the royal "we" as in society), and then from then on there's a whole different dataset, or maybe even the dataset that existed before changes into something else. Especially if, sometime after the model's early beginnings, that model gets retrained based on the passage of time and the things that are happening. That dynamic input of the world changing, lives changing, social mores, ethics: everything is changing over time. That's... thanks for explaining that, because things that were fair at one time, depending on what you're checking for, can be very unfair or slightly less fair with the passage of time. But that's interesting. I never thought that a model's fairness could change. It's breathing; it's exhaling. It's like a living thing, to the point where you have to realize that. I never... it's the first time I'm hearing this, so... Yeah, and having to monitor it. Yes. The conversation I had with Sophia and Allie reminded me of something we spoke with another guest about, Emily Fox. She said something to me that I have never forgotten: people will more often than not choose efficiency over everything else, even if that efficiency could harm others. It's not that a person is evil; they're just human. Alright, Angela, Johan, I'm going to make a joke to lighten the mood. Do either of you drive? Right? Everybody's a driver here. We all drive cars. Okay. Tell the truth: do you speed when you drive? Uh oh. I plead the fifth. I have a speedometer. Do you look at it when you're driving? I ride motorcycles. Oh, yeah, you do. Well, alright, well... okay, I'll... I will not confirm or deny.
I believe in radical honesty. I speed. Say it with your whole chest. I speed. I inherited it from my mother. I speed. Okay. Okay, lead foot. Yes. But when you're speeding, right, you're driving. You understand that you're speeding. You understand there are signs posted. In the context of driving that car, you know that you're going over the speed limit that other people have decided is the safe speed to drive at. But you're not thinking about the sign; you're thinking about, I need to get to where I'm going as fast as possible. Right? Agreed. I have to go to the bathroom. Right. So I'm going to tell you a story. When I see people driving like weirdos and cutting in and out of traffic and just speeding through, you know what I say? They need to go to the bathroom? They probably have to go to the bathroom. That is me being very empathetic to why one person would do this with all these other people around. I know it sounds really weird. It sounds really weird. It's a generous read of the situation. It is. That helps keep your blood pressure down. That's a good way to look at it. Yeah. But I feel like this analogy is very appropriate for this episode and for a lot of the interviews that we've done around this season. People aren't malicious or evil; they're just trying to do something as efficiently and as quickly as possible. But sometimes, when you're choosing efficiency over regulation and ethics, that can harm others. Acting efficiently could lead to situations where bias is introduced or there is some kind of harmful outcome or prediction. By the time an LLM gets in front of our listeners, though, to Sophia's earlier point, it may have already gone through training, or maybe they're thinking about training a smaller model of their own. What can be done to surmount challenges when things go wrong? We'll discuss it after the break. Where we left off, we were wondering what can be done to address bias introduced in model training. First, I'm going to bring Sophia back to talk about the data you have and the data you don't. We can see if our model is making a different prediction for one group over another. But to be able to do that, we actually still need the data about these different protected classes and groups. We need to run these tests against these different groups; even if we don't necessarily want those attributes included in the model, it's still useful information to have. But then that becomes another set of data that we're going to have to be protective of. Some folks are very cautious about providing that information because they are a little worried about having that information used to discriminate against them, even if that is one of the best ways for us to understand if our model is discriminating. Because if we don't know who is male and who is female, we cannot tell if our model is more accurate for one or the other until after it's used in production, until after it's used to make a decision and someone starts to notice that something feels off and reports it when it's already being used. In order to correct an issue, you need to have the information behind it. Makes sense, right? Yes. Yeah, but people are hesitant to come forward with the data needed to correct models when they make mistakes.
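To picture the kind of check Sophia is describing, here's a minimal sketch of a per-group comparison on a scored dataset. It assumes pandas, binary predictions, and hypothetical column names, and it assumes the protected attribute is kept around for evaluation even though the model itself never used it.

```python
# Sketch: compare model behavior across groups of a protected attribute.
# Column names are hypothetical; the protected attribute is used only for evaluation.
import pandas as pd

def group_report(scored: pd.DataFrame, group_col: str) -> pd.DataFrame:
    """Per-group accuracy and selection rate for a scored dataset.

    `scored` is assumed to have binary `prediction` and `actual` columns plus
    the protected attribute in `group_col`, which the model never saw.
    """
    def summarize(g: pd.DataFrame) -> pd.Series:
        return pd.Series({
            "n": float(len(g)),
            "accuracy": (g["prediction"] == g["actual"]).mean(),
            "selection_rate": g["prediction"].mean(),
        })
    return scored.groupby(group_col)[["prediction", "actual"]].apply(summarize)

# Hypothetical usage:
# print(group_report(scored_patients, group_col="sex"))
```

A large gap in accuracy or selection rate between groups is the cue to dig deeper, and, as Allie noted earlier, different fairness metrics (parity of selection rates, parity of error rates, and so on) can point in different directions, which is why the choice of metric depends on the use case.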
Yeah, and it's interesting. I was just reading a book about this: if you're in an environment, I'll say a work environment, where you don't have that type of safety, where you can't be open and honest, where mistakes aren't treated as learning exercises and it becomes about pointing fingers, it sounds like that. It's like, oh, I'm not going to... I'm not getting involved with that. I don't want someone to call me racist. Like, I don't know, we're not causing the problem; we don't want to put salt on it. So I am understanding this. But like you said, how do you know that it's doing what you don't want it to do unless you have something to compare it to? Correct. Yeah. That just introduces another challenge. Not a solution, but... Another challenge. Yeah. As if it weren't challenging enough. I know. Oh, gosh, I know. But here's a possible answer, and it has to do with domain knowledge, something very important. But how do we balance that with A.I.? Well, people in certain disciplines are turning to SLMs, small language models, as well as human oversight from people in the know: experts in the domain the model is serving. Focus instead on having these models that are more like point solutions. I'm asking about a specific area, like healthcare, so I'm going to need to use the data that I have available. With these smaller language models, though, you still need to be aware of privacy concerns. They can perform better on these more domain-specific activities because that's just what's in their purview. And they also can be a lot smaller in size. So if you're looking at the infrastructure cost of hosting the language model, someone has to pay for it at the end of the day. You can save costs by using a smaller language model if it has the ability to answer the questions that you need it to. So back to one of our earlier episodes this season: if you don't need a model that can tell you 12 jokes about whether you can cross the road or give you eight recipes for chili, if you have something very specific that you want it to do, you can build a smaller model. It'll be a lot cheaper to build and maintain. You can save costs, and it can perform better on those very domain-specific activities, kind of like what you were saying earlier, Angela. Cool. Nothing to add? No, I just... I mean, you said it. The size doesn't really matter in the long run; you still have to manage it. You still have to maintain it. You still have to care for it. The rules don't change, other than you are paring down the amount of information, and you're not feeding it chili recipes when you only care about healthcare issues. And Allie drives home the importance of that human touch. Having multi-stakeholder engagement becomes so important, because if you don't have, in this case, a physician or some other professional who is on the ground and knows what's happening in the real world, if they were not able to explain this to me as a data scientist, since I don't have the clinical expertise that they do, I wouldn't even know to think about these things. So domain expertise makes A.I. models more specialized and less susceptible to answering unhelpfully. These are some really creative ways to mitigate these problems. Finally, oh, sorry... Oh, I was just going to say, it sounds like also having a domain expert, a group who actually knows what they're doing and knows what to look for, might be able to help you identify those issues that might pop up in the data, even if it is a small language model. Right?
But then you can take action to mitigate any bias that is in that model. They know what to look for. Exactly. Yes. So there are three prongs to this solution. Right? So we have small language models, we have that human in the loop or that domain expertise, that expert. Finally, there's something new being introduced as the marketplace for A.I. expands: model cards. So I'll let Allie explain. We have something in our platform called a model card. Essentially, this model card is a nutrition label for an A.I. model. The goal is to try to give you the information that you need to know about a model; however, we try to do it in a way that's as easy to understand as possible, because we don't assume that you are a data scientist looking at this nutrition label, right? We want it to be as ubiquitous as a real nutrition label. And so I bring this up to say we have a section on that model card that will explicitly pop up those variables to let you know, "Hey, just so you know, you accidentally included a potentially sensitive or proxy variable in your model." So, you know, if that's the technology that a data scientist is using, then certainly that will help. Yeah. Nutrition facts for an A.I. model. Yes. You don't want to get your 100% daily allowance of bias in any model. That was good. That was really good. I wish I came up with that, but I didn't. Well, both Allie and Sophia emphasized the importance of questioning the predictions and results we see coming from A.I. models after they've been trained. We need to remove the blind assumption that A.I. systems are always right. As users, I think we should be a lot more comfortable reporting problems when they occur so that they can be addressed and fixed in a future iteration. Whenever you see something that looks off, that doesn't sound right, do investigate and report that back to the individuals who have built and maintain that A.I. system so that they can have an understanding of what they need to fix. I think what we need to make sure we're doing to build more responsible, more trustworthy A.I. systems is ensuring that we're adding fail-safes and places for individuals and humans to ask questions along the way. I think that's a great idea. What is it we've been saying? Take the pledge. Yes. Don't just blindly trust the A.I. models... Trust but verify. And also the ability to report problems. That's that feedback loop that the data scientists and the people who are building and training these models need. But if they want it from their end users, they have to make it easy. They have to make it simple. Again, no one who's using a model, especially if you use it all the time, should have to always trust and verify. If you start to see those things creep in, you may not be as diligent if you see it all the time, but at the very least, make it easy for that feedback to get back to whoever needs to see it. Yeah. And make it easy to understand. Right? Don't make anything opaque. Make things as transparent as possible.
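Allie's nutrition-label comparison lends itself to a concrete picture. Here's a hypothetical sketch of the kind of fields such a card might carry, written as a small Python dictionary; the field names and values are purely illustrative and are not SAS's actual model card format.

```python
# Hypothetical sketch of a "nutrition label" for a model. Every field and value
# here is illustrative, not any vendor's real model card schema.
import json

model_card = {
    "model_name": "readmission_risk_v3",  # hypothetical model
    "intended_use": "Flag patients at elevated risk of hospital readmission "
                    "within six months so care teams can follow up.",
    "training_data": "Discharge records, 2019-2023 (illustrative).",
    "performance": {"auc": 0.82, "evaluated_on": "2024 hold-out set"},
    "sensitive_or_proxy_variables": [
        # The section Allie describes: surface anything that may stand in for a
        # protected attribute so a reviewer cannot miss it.
        {"variable": "zip_code", "concern": "strongly correlated with race"},
    ],
    "known_limitations": "Not validated for pediatric patients (illustrative).",
    "report_issues_to": "model-governance@example.org",
}

print(json.dumps(model_card, indent=2))
```

Whatever the exact format, the point stands: a card like this only works if someone who isn't a data scientist can read it at a glance and spot the flagged variables.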
Actually, I'll let Allie come back and say it better than I can. It's our responsibility as technologists, as developers, to make sure that these systems are as transparent as possible. And when I say transparent, I don't mean, you know, you get your medications, you open up the pamphlet, and there are five million words on that thing that you can't make any sense of. Right? It's very, very easy to give a lot of detail. The hard thing is giving a small amount of information that is very impactful and easy to understand. I think that is going to be a very critical piece of this entire A.I. evolution that we're seeing: really encouraging transparency no matter what. Well, that's huge. It's huge for the people who are building and training, having transparency into the data that they're using and dealing with. And then there are the end users, the people who are using this model, and the domain experts who are also translating this model. To use the healthcare analogy: if we're trying to cure something and this model is returning information we use to make health, life, and death decisions for people, we need that level of transparency as to why it is making this decision. Who do we go to? There's a lot involved; everyone in this chain, in this value stream, has a part to play. Like you said, making it easy, making that small amount of information actually be impactful so that people don't have to scratch their heads and start Googling what it means, is a lot, especially if you're asking it of people and you want those people to be responsible. Well said. That's a great point. Johan, what do you think? Yeah, I mean, I agree. And again, there's this conflict of wanting to provide as much information as possible so that the person is as informed as possible about where the model is getting its information from. There's also the readability issue, right? It's got to be something that you can understand. It's got to be something that the user can actually digest and make a decision on without having to spend however much time reading through all that material. Because, as we were talking about earlier, people want to be efficient. They want to go quickly. They want to make the best decision they can in the smallest amount of time that they can. Making that information available is fantastic. Making it easy to read, I think, is the real challenge. Absolutely. I agree. Gathering from what I've discussed with Sophia and Allie, training A.I. models is a large undertaking. But knowing what the model will do, knowing its limitations and the limitations of A.I. in general, and knowing that things not intended as harmful can still produce not-so-great results, I think those practices are vital to keeping models on a more responsible track. Mm hmm. Indeed. Well, we've done it again. During this A.I. series, we have uncovered so many gems and things that people aren't really considering or talking about. This is one thing that, again, I'd never heard of: predicted versus actual values, and how a model can be built with nothing harmful going into it, but inherently, over time, things change dynamically, as does the world around us. And I'm hoping you're enjoying listening to this series. I hope you enjoyed this episode as much as I did. You have to tell us what you thought, share your stories, hit us up on our socials at Red Hat. Don't forget to use the hashtag #compilerpodcast. What did you think about this episode? What more do you want to hear from us, too? You can use the hashtag for that as well. We would love to hear it. And that does it for this episode of Compiler. This episode was written by Kim Huang. Victoria Lawton's nutritional label reads: High in positivity, 100% brilliance. It sure does. Thank you to our guests Sophia Rowland and Allie DeLonay. Compiler is produced by the team at Red Hat with technical support from Dialect.
Our theme song was composed by Mary Ancheta. If you like today's episode, please follow the show, rate the show, and leave a review. Share it with someone you know. It really helps us out. All right, everybody. All right. Take care now. Bye.