Data Security 101

Compiler Team | Security

About the episode

They say "data is king." From secret recipes to performance metrics and beyond, organizations use mountains of data every day. It's important to keep that data safe from scammers, the competition, or anyone else who could misuse it. Securing that data isn't easy.

Clarence Clayton, Director of Global Privacy + AI Risk and Compliance at Red Hat, lays out the foundations of data security. He covers what needs to be protected and explains some of the basic principles you should follow to keep data thieves out of your database.

Compiler team Red Hat original show

Transcript

Yeah, it kind of sounds like a heist movie. It's like you take it from this vault, it's in the armored car, and then it's in a different vault. And like, they all have kind of their own methods of defense and their own patterns of attack. Correct. Ocean's... I don't know, what number we're on now? One of those. Ocean's 28? Something like that. This is Compiler, an original podcast from Red Hat. I'm your host, Emily Bock, a Senior Product Manager at Red Hat. And I'm Vincent Danen, Red Hat's Vice President of Product Security. On this show we go beyond the buzzwords and jargon and simplify tech topics. And in this episode, we delve into the depths of data security. We've been talking about securing products and keeping malicious actors out of your stack. But what is it you're trying to protect? A big part of security is keeping data secret and safe. We're going to cover what that data could be, why it's worth protecting, and some of the basic policies that will build the foundations for securing an important resource. We spoke with Clarence Clayton, who leads a team of AI risk and compliance professionals here at Red Hat, and he explained what kinds of data organizations keep and why it's such a big target. Oh, and it's like the crown jewel to, you know, how a company operates. When you think of competitive intelligence, when you think of, you know, the secrets of like, you know how Coca-Cola's recipe is made or KFC's, you know, herbs and spices, like, the data is what is, you know, powering all of that. I mean, those are very, very broad examples. But companies every day... the data, it's sort of the lifeblood of their organization. And it's what allows them to be able to, you know, make money, service their customers and other stakeholders that they have, you know, obligations to. And so there's an obligation also and a duty to keep that data safe and secure. So it's really an organization's most precious asset in many ways. 
Secret recipes are definitely a category. What other kinds of information are we talking about here? I mean, it depends on the... the space that you're talking about, right? So for a health care provider, probably the most important information for them is their patients' medical records. Mhm. When you're looking at financial information, right. From a bank or any retail store, their list of their customers' credit cards, maybe not so much a brick and mortar type store, but an online store where they have recurring payments, think subscriptions, like they have to retain your credit information. Absolutely. And I think you hit on two of like the most in need of security types of data. But not all data is really created equal when it comes to that. Like there's a world of data outside of, you know, personally identifying things or financial or health types of information. Does this include, like, everything else, too? I mean, there's some data that probably doesn't matter. I mean, I'll give a perfect example. For Red Hat, everything that we produce is open source. Mhm. So exfiltrating our source code. Yeah. Would be a very expensive thing to do when you could just go to GitHub and get it for free. Exactly. Like, you don't have to work too hard to get that. Now if you're a proprietary software vendor, exfiltrating, you know, like I know looking at my PlayStation, the source code for my PlayStation. Right. That would probably be a big deal. That's closer to your secret recipe kind of a thing there. If it's something that you need to protect or someone else can copy. Right. I think that makes sense. And I imagine there's going to be different levels of security needed depending on the value of said data. Totally. Absolutely. I mean, like all things, not all data is created equal. Exactly. Doesn't all need to be in Fort Knox. But, you know, there might be a few different levels there. Correct. And it's just really understanding what those levels are. 
So you understand the criticality of your data. What's important? What's not? Exactly. And I think a big part of that is what happens if it gets leaked. So like what is that worst case scenario? What happens when data leaks? That's when you go into janitor mode. You're going to clean it up. Right. Because at that point you have an incident. Something happened that shouldn't have happened. And now how do you deal with it? There might be ways where you can reclaim it. And there might be ways where you can't. Right. I mean once it kind of hits, you know, a stereotypical dark web, like there's no getting it back. Everyone go, go get a new credit card. Right. Like it's done. Not that I'm advocating for paying ransoms. Right. But like in the case of ransomware. Right. You could get your data back if you paid a ridiculous amount in Bitcoin, perhaps. Exactly. So I think that ties into then the level of security is tied to the consequences of it getting out. Yes. And also, you know, the level of out it gets. Yes. So if I can put my old school system administrator hat on, this is where backups are so important. Yes. Absolutely. 100%. Back up your data right now. Well, you can finish the episode and then go back up your data. There you go. So that's some good context about what needs to be protected. How do companies keep their data safe? So Clarence shared some of the basic policies companies can take to protect their data. So there are many different ways that they go. One of the biggest ones that comes to mind is, is when you think about, you know, encryption, encryption at every layer, encryption at rest, encryption in transit. Encryption at the web layer. Encryption at the database layer. That is going to make it more difficult, certainly, to exploit that data or do anything with an environment if it should get compromised. If you don't have the keys, even if you got to the house, you can't get in the door, so to speak. When you think about it from a... 
another perspective is around what we call data minimization. So it's making sure that you're only processing the data that you need to do the job for as long as you need it. If the data is not there, then it can't be exploited. So you need to, you know, process and have the data you need to be successful and to, you know, meet your commitments. But once that data is, you know, no longer useful, just having it sit there allows it to, you know, collect dust, so to speak. And it can, you know, create some additional risk to the company as well. All right. There's a lot of good stuff there. So I'm going to break it out into a couple, a couple sections. First of all let's talk a little bit about encryption. How does it work? It is a secret decoder ring. For those who are old enough to know what they are. Jokes aside, it's basically taking your data and using a lot of very fancy math that is way beyond my ability to explain. It uses cryptography, like, this math, to change the data to a format where you require a key to decrypt it. Right. So basically, I can't think of a good analogy at the moment, but it's like it's something that's hiding in plain sight in some way, right? When you're looking at data moving between your browser and a website, you can see that there's a stream of data, but it's just a bunch of mangled characters that are completely illogical on their own unless you have the key to decrypt it on either end. Yeah. So the Matrix code, that's Red Hat, that's a blonde. The Matrix. Yes. Exactly. No. You're speaking my language because I had a whole big cryptography part of my life, too. But the way I see encryption is along that scale. You know, there's the very basic one where A is actually a B, so on and so forth. There's all the, you know. ROT13. Exactly. You can do all the puzzles that way. 
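The simple letter-shifting cipher mentioned here (A becomes B, or every letter rotated 13 places, commonly called ROT13) can be sketched in a few lines of Python. This is only a toy to show what "you need the key to read it" means: the function names and sample message are made up for illustration, the "key" is just the shift amount, and a cipher like this offers no real protection. Production systems rely on vetted algorithms and libraries instead.

```python
def caesar_shift(text: str, shift: int) -> str:
    """Shift each ASCII letter by `shift` places, wrapping around the alphabet."""
    shifted = []
    for ch in text:
        if ch.isascii() and ch.isalpha():
            base = ord("A") if ch.isupper() else ord("a")
            shifted.append(chr((ord(ch) - base + shift) % 26 + base))
        else:
            shifted.append(ch)  # spaces, digits, and punctuation pass through unchanged
    return "".join(shifted)


def rot13(text: str) -> str:
    """ROT13 is a shift of 13, so applying it twice returns the original text."""
    return caesar_shift(text, 13)


message = "Back up your data"       # hypothetical sample message
scrambled = rot13(message)
print(scrambled)                     # Onpx hc lbhe qngn
print(rot13(scrambled))              # Back up your data
```

Because there are only 25 useful shifts, anyone can try them all in moments, which is why this belongs on the "puzzle" end of the scale described in the conversation.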
And then it gets a little further where like you shift it even more and then you go all the way through to like, you have to know what specific book it's referencing in order to get the keys. And I think that that's kind of the level of encryption, except rather than being like a book, it's a whole bunch of math in a secret place. Yes. Exactly. Yes. It's just it's black magic somewhere. You know, it's... I don't understand it. I don't pretend to. I use it, but I don't understand it. It's a "trust me, bro". Exactly. Totally. Yeah. So he also talked a little bit about the different states of data. So there's data in the web layer versus at rest versus in transit. How does that affect how you deal with it? How you can keep it safe? Yeah. I mean, when you're looking at data in transit, for example, which I think Clarence was referring to as the web layer as well, like from a browser to a, to a web server, on either end it's unencrypted and it's that transmission that encrypts it so that somebody who's lurking on the network, listening to traffic, you know, sniffing traffic, can't actually decode or see what it is that's being transferred. So you think about... you're going to your banking website, you're putting in your password, your, you know, debit card number or whatever that you log in with. Those transactions can't be seen by somebody who's resident on the network. Only you can see it. Right. So that's the encryption in transit. It protects the data as you're moving it. Yeah. The encryption at rest is typically where you'd have like your, your laptop or folder on your, on your server where it's encrypted on the disk itself. Mhm. And that was actually really interesting because we have a lot of requests for, you know, we need these things with data at rest. I always think like what are the... what's the thing that you're trying to protect when you're using data at rest? Because in order to use that data it has to be decrypted. 
So as soon as you boot that laptop, you boot that computer, all that data is unencrypted so that you can use it. Right. In your day to day work. And then when you, when you shut it off then it's, it's all encrypted. It's cold storage in a way. Right. So your threat there is somebody stealing your laptop, turning it on and getting access to your data, or stealing a machine from your data center to do that. Now one's going to be maybe easy. Like, I think my laptop gets stolen pretty simply at an airport or a Starbucks or whatever. Somebody's backing up a truck to a data center and sort of holding up racks of machines is probably a little bit different. A bigger endeavor, for sure. Correct, correct. And so and that's where you're looking at the data use, you know, do I encrypt it at rest for a certain type of application or a certain type of system? For sure. Yeah. It kind of sounds like a heist movie. It's like you take it from this vault, it's in the armored car, and then it's in a different vault. And like, they all have kind of their own methods of defense and their own patterns of attack. Correct. Ocean's... I don't know what number we're on now. One of those. Ocean's 28? Something like that. So he also talked a little bit about the lifetime of data and how long you keep it. So depending on what kind of data it is, how long do we keep it? I mean, I think that basically boils down to regulations. Right. And, I think Clarence noted it as well, the usefulness of the data. If I have a database full of credit cards that have expired, that's useless information. It doesn't serve me. I mean... even an attacker really can't do anything with it other than maybe get some PII. Yeah. Kind of. Right. Like, neat. This once existed. Yeah. Right. But maybe I know that person's name. And maybe there's an email address associated with it. 
So the credit card information itself no longer matters, but some of the other information might, or might be a useful stepping stone to some other sort of data that I might be interested in. Yeah. Gap in the armor. Right. But you're not keeping it for those email addresses. You'd be keeping it for those credit cards that you wouldn't be using anyways, so why keep them? Right. So, I mean, there should be a shelf life on data in terms of its utility. There also might be times specified by regulations, like you can't keep personally identifiable information for a person who hasn't visited your website for two years or whatever. Right. At that point it's like, yeah, you should totally be getting rid of that information. Right. Yeah. So there's a difference between allowed to still have it and care to still have it. Yeah. And it's like we don't have to hoard this stuff. I mean I know that data storage is cheap, right? Like hard drives don't cost as much as they did when I got my first two and a half gig hard drive for 750 bucks. I still remember this. Mind you, that's Canadian. So probably cheaper in the US. But I mean, even then, right? Like, data storage is so cheap. But that doesn't mean we should be keeping the stuff because we can. Yeah. This data does not spark joy. I will let it... end its adventure with me. Right. If it's not useful, why do you have it? Exactly. I should just tell myself that about all the stuff in my apartment. Well, that's for you to figure out. Yeah. Fair. Fair. All right, so we need to be aware of where the data is, which affects how it needs to be protected. And encryption is a valuable tool. So thinking about the life of the data is important as a last bit. Another aspect to keep in mind is the life cycle of the data policies themselves. So it's one thing to know the basic policies for data security. But those policies change depending on where the data lives, and Clarence is here to explain how that's changed over the years. 
And the cloud is such a big part of, you know, how we operate. I'm going on a quick tangent, but I was telling some students one day about life before Google Workspace or life before Office 365 and how we used to have floppy disks that we would, you know, carry around and even before USB, you know, flash drives and things like that and they just couldn't even fathom, fathom what, you know, what a world like that, you know, was. But believe me, it actually did exist and I existed in it. So I say that to say, you know, the cloud is awesome because, you know, it does reduce the, I think the, the infrastructure footprint that companies have to, you know, sort of self-manage. But look, there are situations where it makes sense to run something on prem as opposed to in the cloud. So thinking about from a data residency perspective, where does it make the most sense to, you know, run the data and where do you get the, you know, greatest security, right. And maybe that could be in the cloud or maybe, you know, your environment is so secure that it makes sense to really harden it, you know, on prem to, you know, keep, you know, people out. So that's just a consideration. You have to weigh the pros and cons and the, you know, risks and reward of both and cost and make the best decision there. There's a world of difference between the so-called sneaker net and the cloud for sure. How does that change security policy? Well, I think that, in a couple ways, right? I mean, the first is accessibility of data. If it's in the cloud, by its very nature, it's all probably accessible, in theory, to anybody on the planet. If it's in your... your data center or, like me, in my house, this is accessible to me. And nobody else. Right. So I mean, that's one of the things that would change, from a, from a location perspective. And then the other thing that I would think of as well is the skills that you have to secure it in the first place. 
I mean, if you're, if you're a company that doesn't have a lot of good systems administrators with the security mindset, you might prefer to have your stuff in the cloud because there's somebody else managing that for you. Yeah. Right. I mean not that you can get away. We know about all the leaky S3 buckets and stuff. There's still some configuration that needs to be done, but by and large, like that cloud provider is managing the security of the thing, presuming you configure it correctly versus me having to set up all the things myself for my data set. Yeah, like you can on some level outsource some of the expertise needed to keep it safe if you don't already have it. And that's exactly what you're doing. Yeah, exactly. And there's a real literal physical access question when it comes to on prem and the cloud as well. Like, yes, if it were a heist, there's a difference between backing a van up to a building and, you know, just hacking something on the internet. Well, and that's the thing, right? When you're looking at a lot of these, these cloud providers, I mean, these aren't small mom and pop operations. They employ a lot of people, a lot of expertise. Because it's not just my data, it's not just your data. It's like hundreds of thousands of people's data. Exactly. That they have to protect. So they're invested in making sure that it's as robust and reliable as possible. Exactly, exactly. It's like a bank with everyone's money versus, you know, whatever money you have in your house. Yeah. My piggy bank. Different levels of expertise. Correct. Yeah. For me. You just need a hammer for a bank. You need to be a little more sophisticated. Exactly. Exactly. The same is kind of true of data, if you think about it. But that raises the perennial question, you know, where do you serve- Where do you store your data? In the cloud or on prem. They have differences and pros and cons and security level. So what do you put where? 
I mean I think it comes down to risk tolerance, preference and expertise. But you may have data that is so sensitive that you wouldn't trust it to the cloud. Yeah. But you better have the expertise to make sure that this so sensitive data is protected on premise. Right. So it's not just a simple like is on prem more secure than the cloud or vice versa. There's a couple different factors that have to come into play there. And so you really have to kind of think through again like what was the worst that can happen and can I support this. Yeah. So like pros and cons for each like cloud, it seems the pros are pretty easy to set up, you can outsource some of the expertise needed and it's pretty easily accessible if it's something you need kind of on a regular basis. But it might be a little bit more accessible than something like firmly on prem. You might not have like total control over the servers themselves so you might be subject to, you know, leaks that aren't necessarily your fault. Does that kind of set up the cloud world? Yeah. I mean, you're trusting, your data and somebody else's data center and the people who physically operate that. Right. So, I mean, there's, an implicit trust in the organization that you're giving this to beyond just the cloud access. And we've seen this with, security camera devices in the past. Right. Where people at lunchtime were watching other people's camera feeds. Exactly. I mean, that's a little scary, right? I mean, if you want to talk about personal data, I mean, a camera pointing into my living room. Pretty personal. Not what you want. Not what you want. But then on the flip side, you're also looking at cost. Yeah. Right. Because this comes with cost. Now, depending on the size or the amount of data that you have, which kind of goes back to the early discussion of like how much data do you keep. Right. That cost might be a determining factor. It might be cheaper for you to have it on prem because you have a mountain of data. 
Yeah. Or it might be fairly inexpensive to have it in the cloud because you have appropriate policies and you trim what you don't need, and it's a manageable amount. I think there's also an element of, like, redundancy when it comes to the cloud, too, like it's harder for a server to go out and you lose all your data, I suppose. Yeah. Your backups aren't as urgent on the cloud because they're doing the... that's part of the service. They're backing that stuff up for you. Presumably. Again. Like, we trust, we trust that they do it. I haven't really heard of too many instances where data got lost in a cloud provider, but I mean, again, I'll put my tinfoil hat on and I'll still have a local backup on prem of that stuff that's out there in the cloud. Your chances are low, but never zero. Yeah. So then on the other side, for like the on prem aspect, the pros, they've got control. It's harder to access in general, which means typically more secure, security expertise notwithstanding. What else am I forgetting? They're kind of the... From the pro perspective? I mean you have full control over it. Right. Like you can do whatever. If you want to turn it off, you want to move it, like you can do that very very easily. Yeah. Gotcha. And then the cons are also probably a little bit more vulnerable to losing data. Power goes out, a flood or something. If you haven't backed it up specifically yourself somewhere you might be out of luck. And depending on how robust your backups are, how often you need to do it. I mean, if you're looking at, I don't know, we'll pick on the stock exchange for a second. I mean, if you're not backing that thing up like every second. Yeah, there's very expensive data that you're missing if your backup was an hour. Exactly. You imagine the amount of trades that happen in an hour. Like, I can't even fathom the number. I think that's just a pro and con of the control aspect in and of itself. 
Like you've got total control, but you've got total control. And total responsibility. Exactly. Right. You're not outsourcing any of that. You have to have the expertise and the capability to protect the data yourself. There you go. So when we're talking about how data is transmitted and where it rests, that's changed a lot with that introduction of the cloud. And those changes affect the security policies needed to keep that data safe. How much data do you actually need? Earlier we talked about considering how long to keep data around and Clarence also pointed out that organizations should be very thoughtful about what data they're collecting in the first place. And so what again, we talk about often is that data minimization, that deleting data when you no longer need it and even beyond that. Only collecting the data that you need to achieve the desired business outcome. So if I'm just doing a transaction with you and send you a marketing newsletter, I don't need to ask your gender. I don't need to ask your blood type. I don't need to ask your social security number. Those if those pieces of information are, you know, not necessarily, you know, relevant. Processing them unnecessarily, you know, bumps up against, you know, privacy laws and best practices. And not only that, it really raises the risk to companies because now you're processing more sensitive information than you would maybe otherwise need to in your course of business. I think this is a really important point, because if I'm going to go buy a donut, I'm not going to bring my filing cabinet full of all of my financial information over all time. I'm going to bring a credit card like... There's something to be said for collecting only the data that you need in a certain case. So do you have any advice or thoughts or strategies around identifying how much data you need for something? I mean, it's hard. Again, it's very context dependent. 
But I think the thing that I like seeing nowadays is we are less cavalier with data than we used to be. I mean, there used to be websites that I would, you know, attempt to sign up for and they started asking things and I'm like, you really don't need to know that. And so I wandered away. Right. Nowadays we tend to ask for less information or only ask for it when you, when you need it. For example, I like sites where it's like I'm going there for a trial basis and I don't have to provide a credit card until I've decided I actually want to sign up. The ones that want my credit card before, you know, they're giving me a 30-day trial but at the outset they want my credit card, I'm less interested. Maybe I'll pass and go find something else. Yeah. Right. So, I mean, when it comes to how much data do you need? It's very dependent on your business. The only thing I would say is like only collect what you really need, like, have those data retention policies and those data collection policies that state we need this data and articulate internally for what reason. Absolutely. I think that's critical. And you mentioned something in there too, around raising the business risk. Like does having too much data actually cause risk to your business? I think so. I think if you have more data than you need, and say there's potentially sensitive information and you suffer an incident, a breach of some sort, that data gets exfiltrated or exposed. I mean, your liability goes higher. Right. Because this is information that you were entrusted with that you mishandled, for lack of a better term, and now it's out there and somebody has to go sign up for some credit reporting that they didn't intend to or... I mean again, depending on what the data is. The outcomes could be severe. Absolutely. 
I think it goes kind of all the way from might be spending more on storage than you need to all the way through to like real critical legal risks or damage to your reputation or the trustworthiness of your relationship with your customers. And you know, there there's consequences to actions. And I think that's worth taking into account before you ever start collecting data. Well, that and trust is the biggest currency that any company has. And I mean breaches happen. Okay. I mean and companies survive it. Right. But if given a number of alternatives, no one's going to sign up for the least trustworthy. If there's no alternatives they might. Right. Because they may have to. But if there's options I'm looking for the one that's the most responsible. Exactly, like you can. You can survive your house burning down. That doesn't mean you ever want it to happen. I think that's, kind of the consensus of the approach on, data leaks too. Don't want them to happen. Even in the best case scenario. Yep. So talked about identifying what kind of data you need, how it can raise your business risk to collect more than that. What about the volume of data like? Does the volume of data matter from a hardening standpoint? I don't know if it's the volume itself that matters because again, that's just disk space. I mean, at the end of the day, it's all disk space, right? And if it's encrypted at rest, it's like, okay, the compute time to decrypt it as you're, as you're about to use it or whatever. And I don't think that's like cost prohibitive in that sense. Right. But I think from a, from a hardening perspective, it's not so much the volume that matters as, I guess, the way that you're structuring your data. Yeah. Because if you have data in multiple places that might open you up to more risk, right? You may have a harder time like, oh, I've encrypted all these sources of data, but not these ones. 
Or I didn't realize that we needed these sources of data and they were being used in some fashion within your organization. Right. Like a good data management policy as well, I don't think we've talked about that, but just managing your data well. Knowing where it is. How it's being used. All of those things. I think those are things that will reduce risk and frankly make things more operationally efficient. Yeah, I like what you said there about, you know, where all the data is because I think it's not just collect the least amount of data you can manage, it's also have it in the least number of places that you can manage as well. 100%. Like you... If you're looking at like I have to go fix something or remove something, where are all the places that that thing is stored? Like, I better know. Or else the thing that's going to get exfiltrated is the thing that I meant to delete all the copies of ten years ago, but I missed that one. Somebody's snooping around and they find it, and then, oops, now I have a problem. Yeah. In a lower stakes kind of version of that. Like, that's something I come across in documentation a lot of the time. If you have something documented, like even just like a simple process in your product documentation, and it's in more than one place, like that's more places you need to update it. It can get out of sync faster. It's harder to find where things are. It's harder to use. So it's minimization. Maybe is the name of the game there. Well, that and mistakes happen. Like, say this is a policy or a process document and all of them but one have been updated. And the one guy who, in an emergency or the regular course of work, is referencing that outdated one, and then they start doing things wrong. Exactly. Well that's I think analogous a little bit to the security aspect to the... In that situation it would be they find the one vulnerability, that's the one that they're going to hammer on. Yep. So any kind of last thoughts? 
I know we didn't really address the elephant in the room either. I think that's, maybe something that you'll hear in the next episode. But last thoughts on data security in general. And I think we can allow ourselves a little talk about AI here. Yeah. I mean, the AI part is hard. Like the data security part is something that I think people don't think a lot about. Right. Like not to put too fine a point on it but again, when people are looking at vulnerabilities in their estate. Right. We talked about this a little bit in the prior episode. It's people, process and tools. We focus a lot on the software vulnerabilities. And maybe a software vulnerability could expose some data in ways that we don't want it to. But those human problems, those human challenges that have access to that data. And this is where like data privilege comes into play. Becky the secretary doesn't require access to all the company information that HR has. Yeah. Because if Becky gets phished and she's like tick, tick, tick. "I've installed some malware" and now they have access to stuff that she probably shouldn't have had access to in the first place. That's really bad. Exactly. Right. So it's I think you had mentioned it before, the principle of least privilege or the principle of least, at least amount of data that you have. Like has to be useful, has to be sensible. I have to be authorized to use it. It shouldn't be a free for all. These are, these are all part and parcel of it. And I think that that's those sorts of mechanisms would reduce the impact of some of these, you know, human driven breaches significantly. Absolutely. Every access point is a vulnerability and should be treated as such. It's up to all of us to make sure that we're handling data appropriately and keeping it safe. Right. And what mitigations can we put in place proactively to prevent, turning on in full janitor mode and doing cleanup? Yes. No one wants clean up. Let's avoid that. Clean up on aisle six. 
That's not what I want to hear. Worst day ever. I promised we would only touch on it and I know we've got a lot to say about AI in this space, and we'll save it for the next episode. So join us again. We'll talk all about AI and its implications around data security all the way through. What data can be used for in AI all the way through to self-policing. This episode was written by Johan Philippine and a big thank you to our guest, Clarence Clayton. Compiler is produced by the team at Red Hat with technical support from Dialect. And if you like today's episode, don't keep it to yourself. Follow the show, rate the show, leave a review or share it with someone you know that really needs to have this information. And we'll see you next time.
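The retention and minimization ideas from the episode (drop records for people who have not visited in a long time, and stop holding expired credit cards) can be illustrated with a short Python sketch. Everything specific here is an assumption made for the example: the `CustomerRecord` fields, the two-year inactivity window, the `apply_retention` name, and the sample data. Real retention periods depend on your regulations and your own documented policies.

```python
from dataclasses import dataclass, replace
from datetime import date, timedelta
from typing import List, Optional

# Hypothetical retention window; the episode only mentions "two years or whatever".
INACTIVITY_LIMIT = timedelta(days=2 * 365)


@dataclass(frozen=True)
class CustomerRecord:
    email: str
    last_visit: date
    card_number: Optional[str]
    card_expiry: Optional[date]


def apply_retention(records: List[CustomerRecord], today: date) -> List[CustomerRecord]:
    """Return only the data still worth keeping under the assumed policy."""
    kept = []
    for record in records:
        if today - record.last_visit > INACTIVITY_LIMIT:
            continue  # inactive too long: drop the whole record
        if record.card_expiry is not None and record.card_expiry < today:
            # An expired card is useless to the business, so stop storing it.
            record = replace(record, card_number=None, card_expiry=None)
        kept.append(record)
    return kept


today = date(2025, 1, 1)
records = [
    CustomerRecord("active@example.com", date(2024, 12, 1), "4111-XXXX", date(2026, 6, 30)),
    CustomerRecord("expired-card@example.com", date(2024, 11, 15), "4111-YYYY", date(2023, 3, 31)),
    CustomerRecord("gone@example.com", date(2021, 5, 1), "4111-ZZZZ", date(2022, 1, 31)),
]

for record in apply_retention(records, today):
    print(record.email, "card on file" if record.card_number else "no card stored")
```

A job like this only helps if it runs against every copy of the data, which is the point made in the conversation about knowing all the places a record is stored.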

About the show

Compiler

Do you want to stay on top of tech, but find you’re short on time? Compiler presents perspectives, topics, and insights from the industry—free from jargon and judgment. We want to discover where technology is headed beyond the headlines, and create a place for new IT professionals to learn, grow, and thrive. If you are enjoying the show, let us know, and use #CompilerPodcast to share our episodes.