Data-baeses | Compiler: Stack/Unstuck

Stack/Unstuck: Data-baeses

September 29, 2022 | Compiler Team | Data analytics

About the episode

Writing data is easy. You take in the information and put it away for future use. It’s remembering exactly what you wrote and where you put it that’s the challenge. Just like having to look for your keys as you try to rush out the door, getting that data quickly makes all the difference. And when your database is your bestie, it can serve that information faster than you could imagine.

Getting a database into shape takes specialized skills. From planning and development to maintenance and rebuilding, it’s a layer of the stack that needs constant attention and evaluation. It can be a performance booster—or an efficiency bottleneck. What does it take to keep your database and the information it stores available to the stack?

Compiler team Red Hat original show

Transcript

I know, but that's interesting because that's one of the reasons I sort of left front end behind because I believe I got frustrated at how quickly things would change and every other week there's a new framework or a new system or a new something. And it just felt like I was in this front end rat race, and it was quite exhausting.

That's Jo Josephs, like you just heard, she used to work on the front end of the stack. It was her way into the tech industry and eventually to another layer of the stack. But it took her a while to get to the work that she truly enjoys.

I started basically in customer support, technical support, so I've always been sort of in the tech world, but also very customer facing. And I was able to build up both skills of being good with people, being good with the tech as well. I got my first sort of coding job a few years later, where I was responsible for the company's suite of websites. And that's sort of where I did a lot of front end work.

She would help clients build our websites so that they could then reach, inform, and serve their own clients. And as we've heard working in the front end can be quite exhausting. Jo was burning out. Now some people thrive with that kind of work, but Jo was looking for something different. A couple of years ago while doing a few different contract jobs, she started working for Blue Cross.

I started doing a lot of data work for them. And I remember using, I forget what internal program they had, but part of my job was just to input information, so part of it was just data entry, and then make sure the data wasn't corrupted or duplicated or anything like that. And I remember the commands they gave me in terms of the keyboard actions would be like, oh, if you want to look up a certain client's name, for example, but you don't know their first or last name or whatever, you can put a percentage sign and then whatever else of their name that you do know in between quotes, click the Enter button, and it will find all the resulting names that are similar.

And I thought to myself, huh, I'm almost sure I remember this in SQL maybe some years ago in Codecademy or something. So I actually looked it up and I was like, oh, it's actually a SQL command, but it wasn't, you wouldn't know you were in a database. It was very human friendly to input it. But then I was like, oh, this is actual SQL commands that you can use to pull up information. And I think at that point I was like, oh, I should go back to SQL and see if this is something I'd be interested in.
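The percent-sign lookup Jo describes matches the `%` wildcard of SQL's `LIKE` operator. What follows is a minimal sketch in Python using the standard-library `sqlite3` module. The `clients` table, its columns, and the `find_by_last_name` helper are hypothetical; the episode does not name the program or the schema she actually worked with.

```python
import sqlite3

# Hypothetical in-memory table standing in for the client records Jo describes.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE clients (first_name TEXT, last_name TEXT)")
conn.executemany(
    "INSERT INTO clients VALUES (?, ?)",
    [("Joanna", "Smith"), ("John", "Smythe"), ("Maria", "Lopez")],
)

def find_by_last_name(fragment):
    """Return clients whose last name starts with the given fragment."""
    # '%' matches any run of characters, so 'Sm%' matches Smith and Smythe.
    rows = conn.execute(
        "SELECT first_name, last_name FROM clients "
        "WHERE last_name LIKE ? ORDER BY last_name",
        (fragment + "%",),
    )
    return rows.fetchall()

matches = find_by_last_name("Sm")
print(matches)
```

Passing the fragment as a bound parameter rather than pasting it into the SQL string keeps the lookup safe when the input comes from a user.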

That was a major turning point for Jo and the start of her journey into the database. And these days she's known online as Databae Jo.

I love that name.

That's such a good name. Right? So perfect. Yeah.

So fitting.

This is Compiler, an original podcast from Red Hat. I'm Brent Simoneaux.

And I'm Angela Andrews.

We are taking you on a journey through the software stack. We call the series Stack Unstuck.

If you want to listen from the beginning, you can start from our episode, The Great Stack Debate. Today's episode? Databases.

Producer, Johan Philippine is here to sort us out.

He said sort.

All right, so what is the database? Well, it's a store of information. For example, if you have a shopping app, the database is where you keep track of your products, your inventory, the associated prices, customer data, and many, many other bits of information. Now databases have two primary functions. You have writing the data and then retrieving that data. The sources of the data come from all other parts of the stack. They can come from the front end when it's provided by a user. It can come from the application layer, if it's processed in some way and then sent back to the database. And it can even come from the operating system. Now, writing data is easy. Fast retrieval is where the challenge is for database design, development, and administration.

Huh.

He's right.

I've done my research.

Why is that?

Why is what? Why is retrieval the difficult part?

Yeah.

Well, think of it as if you were filling out your own spreadsheet. Adding in a new piece of information is easy, you just write it to the bottom of wherever you are. But then if you want to recall that piece of information later on, but you don't remember exactly where it is, you have to look through the entire database to figure out where it is. Now, there are some tricks that help speed up this process, and that's really where all the challenge is in building databases.

Oh, okay.
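Johan's spreadsheet picture can be sketched in a few lines of Python. This is only an illustration of the idea, not how any real database stores data. The `write_row` and `find_row` helpers and the contact data are made up for the example.

```python
# A toy "spreadsheet": each new row is simply appended at the bottom.
rows = []

def write_row(name, phone):
    """Writing is cheap: one append, no matter how big the sheet is."""
    rows.append({"name": name, "phone": phone})

def find_row(name):
    """Reading without any shortcut means checking rows one by one.

    Returns the matching row and how many rows were inspected.
    """
    inspected = 0
    for row in rows:
        inspected += 1
        if row["name"] == name:
            return row, inspected
    return None, inspected

for i in range(10_000):
    write_row(f"contact-{i}", f"555-{i:04d}")

row, inspected = find_row("contact-9999")
print(row["phone"], inspected)
```

In this sketch, finding the last contact means inspecting every row, which is the cost that the tricks mentioned above are meant to avoid.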

Now there are two different types of databases that handle information in different ways. You have relational databases and nonrelational databases. Relational databases are very structured. They're what you might think of traditionally when thinking of a database.

Okay.

Information is kept, for example, in rows and in columns, just like a spreadsheet. The language that's used to set up and manage and manipulate and retrieve data from a relational database is called the Structured Query Language, which is a standard adopted years ago. It's abbreviated as S-Q-L and often pronounced "sequel."

I feel like we're going to hear that acronym a lot, right?

We are.

Yeah, absolutely. Now that's the brief intro to relational databases. We also have nonrelational databases, which don't have the same kind of rigid structures that relational databases do. They tend to be more efficient for cloud deployments and it's less of a big deal to adjust them or add a column to your data, for example.

They call them unstructured. Like you said, they're just less rigid.

Help me understand this a little bit. So can you give me an example of a relational database?

Well, of course we talked about, well, we used the acronym, but MySQL is a relational database.

Okay.

PostgreSQL is a relational database. Microsoft SQL is a relational database. There is a lot of them out there.

Give me a practical example, too. What would you use a relational database for?

Definitely for storing information in records. So if I had a customer, say I had a customer database and I wanted to keep good records of my customer, so I'll need tons of information about them. First name, last name, address, credit card number, shopping history, what their likes and their dislikes are, and all this other stuff. And you want to be able, just like in a spreadsheet, to see and query that information very readily, and you can actually then, if you had the customer database, say you wanted to join it with a purchasing database, and then you could kind of work those pieces of data together to get much more information. So I think the structured databases, it really lends to it being very structured and orderly.
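The customer-and-purchases example can be sketched with Python's built-in `sqlite3` module. Here the two "databases" Angela mentions are modeled as two tables in one database, which is an assumption made for brevity. The table names, columns, and sample rows are all hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE customers (
        id INTEGER PRIMARY KEY,
        first_name TEXT NOT NULL,
        last_name TEXT NOT NULL
    );
    CREATE TABLE purchases (
        id INTEGER PRIMARY KEY,
        customer_id INTEGER NOT NULL REFERENCES customers(id),
        item TEXT NOT NULL,
        price_cents INTEGER NOT NULL
    );
    INSERT INTO customers VALUES (1, 'Ada', 'Park'), (2, 'Ben', 'Ortiz');
    INSERT INTO purchases VALUES
        (1, 1, 'keyboard', 4500),
        (2, 1, 'mouse', 1500),
        (3, 2, 'monitor', 12000);
    """
)

# Join the two tables to get total spend per customer.
totals = conn.execute(
    """
    SELECT c.first_name, c.last_name, SUM(p.price_cents) AS total_cents
    FROM customers AS c
    JOIN purchases AS p ON p.customer_id = c.id
    GROUP BY c.id
    ORDER BY c.id
    """
).fetchall()
print(totals)
```

Each purchase row points back to a customer through `customer_id`, which is what lets the query combine the two sets of records.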

And what about nonrelational databases?

Again, they're just data. You don't have to worry about its schema, you don't have to worry about how you're organizing the data. As long as it has a way that you can kind of track said data, you can dump it in any kind of way you want to. And usually people have a nomenclature as to how they put their information in nonrelational databases, but it's less structured. You can kind of be more freewheeling with the data that you put in.

So let's go back to my example, like the customer database. Now we usually want that so orderly and nice and neat, but say that there is so much information about said customer that maybe you don't want it to be relational. You just want drips and drabs here and there and you can still query it, you can still get information out of it no worries. But the way that you're putting it in one record from the next doesn't have to be the gospel. This record could be very different from the next record could be very different from the next record. It's just more free flowing, I guess.
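As a rough illustration of records that differ from one to the next, the sketch below keeps customer documents as plain Python dictionaries. It is not the API of any particular nonrelational database. The `documents` list, its fields, and the `find` helper are invented for the example.

```python
import json

# Hypothetical customer documents: each record carries whatever fields it has.
documents = [
    {"id": 1, "name": "Ada Park", "likes": ["keyboards", "tea"]},
    {"id": 2, "name": "Ben Ortiz", "address": {"city": "Austin"}},
    {"id": 3, "name": "Cy Lee", "likes": ["tea"], "loyalty_tier": "gold"},
]

def find(docs, predicate):
    """Return the ids of documents matching a predicate, whatever their shape."""
    return [doc["id"] for doc in docs if predicate(doc)]

# Records without a "likes" field are simply skipped, not rejected.
tea_fans = find(documents, lambda doc: "tea" in doc.get("likes", []))
print(tea_fans)

# The documents survive a round trip through JSON unchanged.
restored = json.loads(json.dumps(documents))
```

No schema had to change to add `loyalty_tier` to one record, which is the flexibility described above. The trade-off is that the querying code has to tolerate missing fields.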

There's one more thing about the difference between these two and that has to do with scaling.

Okay.

So when an application scales, it's being subjected to more demands and needs more resources, and the way that they scale is more efficient for some contexts rather than others. So relational databases tend to be a better choice for what's called scaling up, where you have basically one node, very few users that are accessing the database, but they need to access it very quickly. And then you have scaling out, which is more for something on a cloud deployment where you don't necessarily need to be as quick, but you need to have it be available for many, many more users at the same time.

So we've got our foundation here. Databases, functionally, they write data and they retrieve data. And then we have two types of databases, a relational database and a nonrelational database. Right?

That's right.

All right.

One quick note, we started this section by saying that databases are for storing and retrieving information, and I was being a little bit tricky with that because information includes more than the data like storing names or payment information or things like that. It also includes things like stored procedures and bits of code and a whole host of other things that Louis Imershein told me about. He's a principal product manager here at Red Hat and a wealth of information about databases.

Well, I think that the most common thing that developers do with databases is they use them to store structured data to build applications. I create a small application that stores data and retrieves it. There's probably in coding school everyone built a Rolodex. If anyone doesn't remember what a Rolodex was, it was a way that you stored contacts. And so essentially you're building a small contact app. And that for many years was a very common application in coding classes. So you can imagine a nice little front end for that. But it's a very basic skill to be able to store and retrieve data and to figure out how to lay out that data becomes something that becomes better as you become more advanced. So you might lay things out initially some way and iteratively have to change that because you'll discover that performance will be better if you have a different layout of the data.

So I asked about this point, somewhat facetiously, how complicated can it be to store data according to rows and columns? Why would you have to change it around? I mean, aren't databases essentially giant spreadsheets?

You didn't.

How complicated could it be? Johan.

You use the spreadsheet example and how does the spreadsheet work? You mentioned it's got rows and columns. But no, it's also got tabs. And within those tabs, sometimes there are rows that are referenced from other rows that are in other tabs. That is very much like a database. And if you've ever seen a complex spreadsheet, you can imagine how complex a database can get because a database can handle many, many more data points.

So no, they're not essentially big giant spreadsheets. He has dispelled that myth.

Pretty thoroughly and in a much better way than I could, which is why I asked him that question. But yeah, it's complicated. It's really complicated. And it's even more complicated than I knew. Now, as applications get more complex, so does the data that they need to gather and retrieve, and they need to do it quickly. Even computers don't instantly know where every bit of data is. So setting up the database properly is a big task, and the requirements of the database may change as the application grows. One of the tricks databases use to speed up retrieval is called indexing.

When you have a database, you have to index that database based on certain fields. And those indexes are stored usually as a separate file, either on disk or in memory, and it's a fast way to go and locate the entire data record. So think: I'm looking up all of the barbershops in the United States named Joe's that have registered with my barbershop supply business. Think about that, that could be an awful lot of barbershops. So you don't want to have to go through the entire list of barbershops in the United States to go find Joe's barbershops.

So indexing, from what I understand, is a way to give the database shortcuts in order to figure out where the pieces of data that you're looking for are placed in the actual database itself.

Okay.

That makes sense. I mean, because like you mentioned earlier, those reads or retrievals can be very expensive and by expensive, I mean the use of processing. So indexing really helps speed that up.
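Louis's barbershop lookup can be tried out with SQLite through Python's `sqlite3` module. This is a small sketch under assumptions: the `barbershops` table, the index name `idx_barbershops_name`, and the `plan` helper are made up, and the exact wording of the query plan differs between SQLite versions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE barbershops (id INTEGER PRIMARY KEY, name TEXT, state TEXT)")
conn.executemany(
    "INSERT INTO barbershops (name, state) VALUES (?, ?)",
    [(f"Shop {i}", "NC") for i in range(1000)] + [("Joe's", "NC"), ("Joe's", "TX")],
)

QUERY = "SELECT id FROM barbershops WHERE name = ?"

def plan(sql, params):
    """Return SQLite's description of how it would run the query."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    # The last column of each row is the human-readable detail string.
    return " ".join(row[-1] for row in rows)

before = plan(QUERY, ("Joe's",))
conn.execute("CREATE INDEX idx_barbershops_name ON barbershops (name)")
after = plan(QUERY, ("Joe's",))

print(before)  # typically reports a scan of the whole table
print(after)   # typically reports a search using idx_barbershops_name
joes = conn.execute(QUERY, ("Joe's",)).fetchall()
```

The query text is the same before and after. Only the plan changes, from reading every row to following the index to the matching rows.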

It really helps to speed up performance, but there's often an inclination to index all of the things in the database.

But you have to know what to index. If you index everything, it's going to try to keep those indexes in memory and you have to make a balance between efficiency on the system and efficiency and the benefits you're getting to performance. So much better to have a lot of memory to apply to one big index than have to share that memory between several big indexes.
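The cost side of that balance can be made visible with a rough experiment. The sketch below uses on-disk file size as a stand-in for the space indexes need. That is an assumption, since the memory behavior Louis describes depends on the database engine and its configuration. The `shops` table and the `database_size` helper are hypothetical.

```python
import os
import sqlite3
import tempfile

def database_size(index_columns):
    """Build the same table on disk and return the file size in bytes.

    index_columns lists the columns to index; an empty list means no indexes.
    """
    with tempfile.TemporaryDirectory() as folder:
        path = os.path.join(folder, "shops.db")
        conn = sqlite3.connect(path)
        conn.execute("CREATE TABLE shops (name TEXT, city TEXT, owner TEXT)")
        conn.executemany(
            "INSERT INTO shops VALUES (?, ?, ?)",
            [(f"shop-{i}", f"city-{i % 50}", f"owner-{i}") for i in range(5000)],
        )
        for column in index_columns:
            conn.execute(f"CREATE INDEX idx_{column} ON shops ({column})")
        conn.commit()
        conn.close()
        return os.path.getsize(path)

no_index = database_size([])
one_index = database_size(["name"])
all_indexed = database_size(["name", "city", "owner"])
print(no_index, one_index, all_indexed)
```

Every extra index is another structure the database has to store and keep up to date on each write, so indexing every column grows the footprint without necessarily helping any query that is actually run.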

What this makes me think of is trying to find something in a book. So if you've got, for example, a really big book, and maybe it's a reference book, it's not something you read front to back, it's something that you go to reference for, say, a particular topic. In every book like that, there's going to be an index. So you flip to the back of the book and you look for the topic that you want to find in that book, and it will tell you the page numbers, every place that this shows up. And that's going to cut your time significantly, right?

Yeah. That's exactly how it works. To get back to his point though, you don't want to index everything.

Exactly.

And for the students out there, it's like taking notes or highlighting important passages in a textbook or something, or putting sticky notes to bookmark relevant pages. If you copy everything word for word, or if you're highlighting the whole page, rather than just small sections of the passage that are relevant, I mean, you might as well not be highlighting anything at all.

Exactly.

It's almost like you have to have a lot of foresight into what people are going to need. What is going to be useful for someone in the future?

You're right. So you have to think about that. You have to be really mindful when you're developing your database and creating those indexes because you're right. What do you think are the most important topics or pieces of information to pull out? Again, you can't tag everything, but there are some tried and trues and I'm sure the developers or DBAs that are creating these databases, they have an eye for what exactly folks are looking for.

And it's like this balancing act where it's like, if you index too few things, you're not going to be able to find things quickly enough, but if you index too many things, that's also a problem.

Exactly. And finding that balance has a really big effect on performance. But that's also not the only trick in the book. Louis introduced me to the concept of stored procedures, which can really improve performance as well.

When you're retrieving that data, you may actually want to process that data. So for example, you might have a specialized query that goes and requests the data based on some AI/ML routine. Or even as you're writing in the data, you may want the data to be formatted in multiple ways in the database, and that doesn't mean that you have to write the same data multiple times into the database. Actually, a stored procedure can take that data and reprocess it.
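To make the idea of reprocessing data inside the database concrete, here is a small sketch using Python's `sqlite3` module. SQLite has no stored procedures, so a trigger is used as a stand-in: it is also code that lives in the database and runs there, but it is not the same feature Louis is describing. The `contacts` table, its columns, and the trigger name are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE contacts (
        id INTEGER PRIMARY KEY,
        raw_name TEXT NOT NULL,
        search_name TEXT  -- filled in by the database, not the application
    );

    -- Stand-in for a stored procedure: runs inside the database on every insert.
    CREATE TRIGGER contacts_normalize AFTER INSERT ON contacts
    BEGIN
        UPDATE contacts
        SET search_name = lower(trim(NEW.raw_name))
        WHERE id = NEW.id;
    END;
    """
)

# The application writes the data once, in one format.
conn.execute("INSERT INTO contacts (raw_name) VALUES (?)", ("  Joe's Barbershop ",))

stored = conn.execute("SELECT raw_name, search_name FROM contacts").fetchone()
print(stored)
```

The application sends a single value, and the database derives the second, search-friendly form on its own. In a database that supports stored procedures, similar logic could be packaged as a named routine and called explicitly.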

That sounds a lot more like something that you do in the application layer than in the database.

It does.

Yeah.

The fact that the database can do it is just mind blowing to me.

What did we say in one of the previous episodes? Work smarter, not harder. I think that's what stored procedures are.

Yeah, they can really help you take some of that load off of the application layer if you need it and also minimize the amount of data that you're keeping in the database itself by processing it as it's in transit. Now a quick recap of what we've gone over so far, we use databases to store and provide information on demand. The more quickly the better.

Yes.

Some of the factors that determine efficiency are how the data's structured, how it's indexed, and to what extent stored procedures are used to process data.

Okay.

Now, a lot of that is in the initial design of the database, but there are some other tasks too. You have to maintain the data, you have to expand the database, and continue developing it as well.

That sounds like a lot for one person.

It is. Shout-out to all the DBAs out there.

Louis kind of walked me through all the different roles, well, not all of the different roles, but some of the different roles that are involved in making databases run well.

So you can really think of three roles. There's a database developer, there's the database architect, and then there's the database administrator. The database developer is someone who develops code for a database, and that can be either stored procedure code, so code that's processing data that the database is storing, or it can be just developing rows and tables laying out data in a database. The data architect is the one, though, who really understands how to do that layout and how you're going to get the best performance, squeeze the best performance out of that database with minimal resources. And then lastly, you have the database administrator and they're the one who's operating the database. They're really like a system administrator, but for the database.

That's a lot going on. But data is king, so these are very important roles.

Yeah, and very high demand at this point. Now a lot of developers only encounter the basics of databases in their classes and only briefly in their work on other parts of the stack. But the database ends up calling to a lot of people.

Well, I mean, there are a fair number of people who start as database developers, but then there also are a number of people who started as application developers and the database is a side job. And maybe then over time, they become more specialized on the database side.

And this is where we get back to Databae Jo. When we left Jo, she had just rediscovered SQL and databases. And since she was pretty much done with the front end, she set out to learn all that she could. She read a lot of things online, she took a lot of online courses, and was learning a lot about data management.

So I decided to basically take a break and just focus on the data analytics course, just because the technologies for that course was so different from front end development. The courses included spreadsheets, databases, Tableau, and the R programming language. And I had never done anything much with any of those tools, except maybe a little with spreadsheets, not anything complicated like pivot tables or VLOOKUPS or whatever.

The more she learned about the databases, the more she grew to like them. And even though a lot of the technologies were different from the front end, her previous experiences provided some common ground to start from.

I mean, if you're on your company website and you're searching something, any search you're doing is a database search basically. In order to have a full stack website, you do need a database component. And also the way you think about how you're going to design something is also the same in terms of just general principles if you want this to be logical to users. So you want your database to be logical to your developers and the people building the backend, and then you want your front end to be inviting and intuitive as well. So I think the main design principles are the same in that respect.

So despite the overlap, changing career tracks can still be scary and feel like a risky gamble. Jo has some pretty great words of advice for people considering making that kind of change.

I guess, overall, I did take a step back to sort of figure out what I wanted to do in terms of my career. So in the words of the famous philosopher, Nipsey Hussle, sometimes you have to take two steps back to go 10 steps forward. I encourage that bit of self-reflection and thought for something as important as you know, your career and what you're going to spend at least eight hours a day doing.

Wow.

That's a great quote, right?

It is. That's the first one on here. We haven't had any Nipsey quotes, so I appreciate that.

For now at least, she thinks that those two steps back to go 10 steps forward has been a really good choice for her.

I call SQL my happy place. It never feels sad when I'm working with that language.

Aw, that is so awesome. And I feel really honored that I got to kind of see this transition from social media and just watching her move away and embrace this new skill and that she has found her happy place. I want that for everybody. And I'm so glad she found it.

So Johan, Angela, databases are from what I gather this incredibly vital component to the stack. And it also interacts with the entire stack, right?

Oh, definitely. You've heard it in the episode where the application writer has to have that logic in mind when even building the application. And the database architect has to be so in tune to what that information has to look like for the level of efficiency you're going to need to query said data. And the database administrator, well, they have to do their best to make sure that the resources are available, that the database stays finely tuned, that it stays up to date. There is a lot of interaction with the front end, with the back end, with the operating system. It touches everything and everything touches it.

Johan, what about you? So after talking with Databae Jo and Louis, what are you taking away from all of this?

Well, I knew going in that databases were complex and vital and that they kind of interact with all aspects of the stack. What I didn't realize was how much more they're used for than what I initially thought. The whole idea of stored procedures, the way indexing works, just how complex all of it is just kind of blew my mind. And these databases, as these applications get really big, they require more and more specialized roles just for the databases themselves.

Yeah.

Yep.

I was thinking about that too, when we were talking about the different roles within the database layer. That seems like a very specialized set of knowledge.

It is, it is. And you'd be surprised at some organizations, one person is wearing all of those hats.

Wow.

So it is a very specialized, but a very, very important role in application development.

Because databases interact with so many different layers of the stack, it's very helpful for all the other people who are involved with the database to know a little bit about how queries work and how to retrieve information and what kind of performance that they're going to get out of it.

Totally agree.

It would be really helpful for everyone to learn at least a little bit about databases. So databases are something that everyone should know about, and Louis was telling me about how the database sits directly on top of the operating system. And that's going to be our next stop. So up next, we're going to learn about operating systems.

Which used to be my love language.

Oh yeah?

Yeah. I was a systems administrator, so I just dealt with managing different operating systems. So I can't wait. I feel a kinship to every episode that we've done and we're just going to continue it on with operating systems.

What kind of kinship, was it a relational kinship or a nonrelational kinship?

So that was the database episode. You must reach out to us and share your thoughts. Add to our database of thoughts. Would you please hit us up at @RedHat using the hashtag #CompilerPodcast? Even on Instagram. Wherever you are, we want to hear what you have to say about databases. Just tell us what you think. We'd love to hear it. And that does it for the database episode of Compiler: Stack/Unstuck.

Today's episode was produced by Johan Philippine and Caroline Craighead. Victoria Lawton always knows where to find us.

Our audio engineer is Kristie Chan. Special thanks to Shawn Cole. Our theme song was composed by Mary Ancheta.

Big thank you to our guests: Jo Josephs and Louis Imershein.

Our audio team includes Leigh Day, Laura Barnes, Stephanie [inaudible 00:28:28], Mike [inaudible 00:28:29], Nick Burns, Aaron Williamson, Karen King, Boo Boo Howse, Rachel Ertel, Mike Compton, Ocean Matthews, Alex [inaudible 00:28:40], and Laura Walters.

If you like today's episode, please follow the show. Rate us, leave us a review, and even share it with someone. It really does help us out.

Thanks so much for listening. See you soon, everybody.

All right, goodbye.

About the podcast

Compiler

Do you want to stay on top of tech, but find you’re short on time? Compiler presents perspectives, topics, and insights from the industry—free from jargon and judgment. We want to discover where technology is headed beyond the headlines, and create a place for new IT professionals to learn, grow, and thrive. If you are enjoying the show, let us know, and use #CompilerPodcast to share our episodes.
