Who’s Afraid Of Compilers?

March 17, 2022 | Compiler Team Application development and delivery

About the episode

It’s about time we asked a question about compilers. It’s been a scary proposition. Compilers have a reputation for density, complexity, and a fair bit of mysticism. But when we looked into them, we learned they’re really just like any other program. So we wondered: Who’s afraid of compilers?

In this episode, we start to break down the reputation by opening up the black box. What do compilers do? How do they work? And what can you gain by learning more about the inner workings of compilers?

Compiler team Red Hat original show

Transcript

Angela, Brent.

Johan.

We work on a tech podcast.

We do?

We do.

It's called Compiler.

It is?

Yes.

One of the many things we haven't talked about on this show yet is little-c compilers.

It was-

Oh, boy.

... only a matter of time.

Right?

This is going to get really confusing.

Well, that's what we do.

We'll try and straighten it out.

Exactly. We try to pull back the covers a little bit.

And that's something that maybe compilers need a little bit. They have a certain reputation in ...

They have a PR problem.

Yeah. A little bit.

Oh. Wait, say more about that. What do you mean Angela?

I mean, when you say the word compiler, it sounded like you were uncertain and there is a little bit of angst there. And I think a lot of people, when they think about language, compilers, programming language compilers, it sounds like a difficult and mystical and unknown thing. What's happening when compilers do their thing? Most people don't know and maybe that not knowing is a little bit weird, a little bit scary.

Yeah.

Well, I spoke with some compiler engineers and believe it or not, there are also amateur compiler builders out there who do it for a hobby.

Yeah.

Okay.

And they assured me that they're not… maybe not as scary as their reputation makes them out to be. So I started to ask, "Who's afraid of compilers?"

This is Compiler, an original podcast from Red Hat.

We're your hosts.

I'm Brent Simoneaux.

And I'm Angela Andrews.

We're here to break down questions from the tech industry: big, small, and sometimes strange.

Each episode, we go out in search of answers from Red Hatters and people they’re connected to.

Today's question: who's afraid of compilers?

Producer Johan Philippine is here to translate.

I figured to start us off, it would be a good idea to go over the basics. So I spoke with Thorsten Ball and he gave me a fantastic metaphor for what compilers and their interpreter cousins actually do.

Imagine if you are talking to a friend, you speak English, your friend doesn't speak English, speaks only Spanish. You have a third friend who speaks both languages. An interpreter would be, you say something, your friend listens to you and says it in Spanish. And the compiler would be your friend listening to you say something, sitting down, writing it down, translating it into Spanish, and then handing that document to your friend.

So he's making a distinction here between interpreter and compilers, right?

That's right. So a compiler, well, let's start with the interpreter because that's what he starts with. You have languages that are compiled, you have languages that are interpreted. The thing that they both have in common is that they take source code, which is human readable but if you were just to try to run that code on the computer, somehow the computer wouldn't know what to do with it.

It does not compute.

Exactly, it does not. It would just tell you ...

Literally-

... it does not compute. Yeah, literally it wouldn't know what to do with that. So you need a step to go from that high-level language where you're writing out these instructions for a program and you need a step to translate that into something that a computer is actually going to understand. Now an interpreter will just take the source code and then, line by line, as you're writing it and running it, it'll just directly translate it into machine-level code, which is-

Gotcha.

And then that's it. Right? It's like someone doing like in Thorsten's example where it's pretty much a simultaneous translation.

Yeah. So just so I'm tracking, the interpreter, it takes it one, I don't know, one sentence or conversation at a time.

Exactly.

And the compiler has to go through the entire thing to ...

Oh-

That's right-

... translate into what the machine could actually understand.

Got it.

Right. There's one more thing that's different about compilers as well is that just like you said, yes they go through the entire program and translate it all at once. And then in Thorsten's example, there is that last thing about handing a document to your friend. Right? So a compiler has a file output essentially, right? That then stays the same and isn't changeable. Whereas an interpreter just runs the code as you're writing it and running the ... There's no document output from an interpreter typically.
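To make the metaphor concrete, here is a minimal Python sketch that is not from the episode. It assumes a made-up toy language of additions such as "1 + 2 + 3", and the function names (`interpret`, `compile_to_instructions`, `run`) are hypothetical. The interpreter computes the answer straight away, while the compiler first produces an output artifact that can be run later without the source.

```python
# Toy illustration only: a tiny "language" of additions such as "1 + 2 + 3".
# Neither function reflects how a production interpreter or compiler is built.

def interpret(source: str) -> int:
    """Interpreter: read the source and compute the result right away."""
    total = 0
    for token in source.split("+"):
        total += int(token)
    return total


def compile_to_instructions(source: str) -> list[tuple[str, int]]:
    """Compiler: translate the whole source into an output artifact first."""
    return [("ADD", int(token)) for token in source.split("+")]


def run(instructions: list[tuple[str, int]]) -> int:
    """Stand-in for the machine executing the compiled artifact later."""
    total = 0
    for op, value in instructions:
        if op == "ADD":
            total += value
    return total


program = "1 + 2 + 3"
print(interpret(program))                    # computed directly from the source
artifact = compile_to_instructions(program)  # the "document" handed to your friend
print(artifact)
print(run(artifact))                         # executed later, without the source
```

Both paths arrive at the same answer; the difference is that the compiled path leaves behind something reusable.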

Oh.

So that's the basics of what a compiler does. There are many different types of compilers. Today, we're going to focus on the ones that translate from high level source code to low level machine code that computers can understand. Now we got the high-level explanation from Thorsten. We'll come back to Thorsten later. He's got a lot more insights for us. But to dive a little deeper into the actual steps that compilers go through, I spoke to Josh Stone and he works on the compiler for Rust. He's going to help us understand the different steps that the Rust compiler goes through.

Right. So usually you start with a parsing phase and that just reads in the textual code and turns it into an internal data structure, usually called an abstract syntax tree. And then in something like Rust, or I think many compilers have this, there's also an internal representation where you take that syntax tree, which looks pretty similar to what the code was and transform it into some kind of internal representation.

All right. So quick compiler Johan break here. I'm going to translate that as best as I can. This first step he's talking about, it takes the source code that the developer has written and breaks it up into component pieces for the next step.
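The parsing step can be tried out with Python's standard `ast` module. This is only a stand-in to show the idea of "text in, tree out" and says nothing about how the Rust compiler's parser is implemented; the source line `total = price + 1` is an invented example.

```python
import ast

source = "total = price + 1"
tree = ast.parse(source)                 # text in, tree data structure out

assignment = tree.body[0]                # the single statement in the source
print(type(assignment).__name__)         # the statement node: an assignment
print(type(assignment.value).__name__)   # its right-hand side: a binary operation
print(ast.dump(assignment.value))        # the full subtree for "price + 1"
```

The tree still looks a lot like the code that was written, which matches Josh's description of the syntax tree.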

So MIR is often used as a middle intermediate representation. And then that phase is where you can analyze the code and do things like type inference where you determine the types of all the variables. Rust has a borrow checker, which is where it does the analysis to make sure that references to values don't outlive the values themselves and also that exclusive borrows truly are exclusive. So you'd prevent concurrent modification. So those sorts of analyses happen at this middle phase. And then from there Rust turns that MIR into LLVM IR. So LLVM is the low level virtual machine. It's a library that we use for optimization and code generation.

Quick pause again here. This middle section he's talking about is when the compiler takes that broken down code from the first section and starts testing it to make sure that the program's logic works, essentially that all the Is are dotted and the Ts are crossed and makes it ready for the next step.

So the Rust compiler is just translating from its own representation into LLVM's representation and then hands off to LLVM to optimize the code so it will change things like maybe if you have X plus one plus one, it can change that to a X plus two, right? That's a very simple modification, but those things compound. So a lot of optimizations change the code into something more efficient.
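Josh's "X plus one plus one" example can be sketched in a few lines of Python. This is a hedged illustration, not LLVM's actual algorithm: expressions are represented as nested tuples (an assumption made for this sketch), the function name `fold` is hypothetical, and the rewrite assumes plain integer addition where regrouping the constants is safe.

```python
# Expressions are nested tuples ("add", left, right).
# Leaves are either variable names (str) or integer constants (int).

def fold(expr):
    """Merge integer constants in a chain of additions."""
    if not isinstance(expr, tuple):
        return expr                      # a variable name or a constant
    _op, left, right = expr
    left, right = fold(left), fold(right)
    # Both sides are constants: do the addition at compile time.
    if isinstance(left, int) and isinstance(right, int):
        return left + right
    # (something + constant) + constant: merge the two constants.
    if (isinstance(right, int) and isinstance(left, tuple)
            and isinstance(left[2], int)):
        return ("add", left[1], left[2] + right)
    return ("add", left, right)


expr = ("add", ("add", "x", 1), 1)       # x + 1 + 1
print(fold(expr))                         # ('add', 'x', 2), i.e. x + 2
```

One addition at run time instead of two is a tiny saving, but as Josh says, these things compound.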

So there are a lot of other optimizations that compilers can do and oftentimes these compilers will give the developers a choice on which to turn on or to turn off, but using them is a trade-off. It ends up with faster code, but it takes a longer time to actually compile it. Once all of that is done, we get to the final steps.

Wait, there's more!

And then it finally goes through a code generation phase where it takes its own target neutral representation and turns it into specific X86 instructions or ARM instructions, whatever the target is. And then those instructions will be written to an object file and you link that into a program and then that's the thing that you can actually run.

All right. Mission accomplished.

You did it.

Yay.

We have our executable program, which has been translated for a specific kind of chip, right? Either the X86 or the ARM like Josh mentions or whatever it is that the compiler has been designed to translate for and you have your program.
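As a rough picture of what "target-specific" means, the sketch below emits the same operation (add a constant to a register) as assembly text for two targets. The function name `emit_add_constant`, the target strings, and the choice of registers are assumptions made for illustration; a real code generator handles register allocation, instruction selection, and much more.

```python
def emit_add_constant(target: str, constant: int) -> str:
    """Emit one 'add a constant to a register' instruction as assembly text."""
    if target == "x86_64":
        return f"add eax, {constant}"       # Intel syntax, 32-bit register eax
    if target == "aarch64":
        return f"add w0, w0, #{constant}"   # 32-bit register w0
    raise ValueError(f"unsupported target: {target}")


for target in ("x86_64", "aarch64"):
    print(target, "->", emit_add_constant(target, 2))
```

The operation is the same in both cases; only the spelling the chip expects is different.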

Okay. Speaking of being human readable, I'm having a little trouble understanding. Angela, Johan, can you maybe just recap this for me real quick?

Sure. Angela, do you want to take a shot at it?

I'll take a quick shot. So again, we're starting with the source code and it breaks it down into manageable bytes, right? It looks at the variables and it decides what's the best type for what's being presented in the source code, right? And then the compiler tests to make sure that the logic is working, it's tracking. Is everything tracking here? And then it goes to the next level, which is the actual, where LLVM says, "Okay, I'm looking at your code, let's optimize it. Let's supercharge. Let's make sure this is the most efficient use as possible."

And then if you do a lot of that optimization, your compile time can be much longer because it's doing much more work in the beginning. Or if you want it to compile quickly, right? And then you're really not concerned about that optimization as much, you just wanted to get it done, that's where you… at the end it turns it into this instruction that the processor can actually understand. And he mentioned the X86, he mentioned the ARM and it gives you this file that will work for you because it has gone through all of these different levels. And this is what you can actually "run". So many moving parts here, but I think I was keeping up.

Yeah.

So we just talked about the primary function of a compiler, which is translation, right? And while the task sounds pretty straightforward, once we dig into the specifics, it's a pretty involved process. Josh assured me that once you dig into it, it's not as difficult as it sounds, but we also heard a few hints about some of the other tasks a compiler can perform in that middle section. To learn more about that, I spoke with David Malcolm. He is a principal software engineer here at Red Hat and he works on the GCC compiler's warning and error messages.

Oh boy.

The diagnostics and the warnings are a secondary goal of the compiler, but it's no point in generating highly optimal efficient code that does the wrong thing. I mean, I can write code that does the wrong thing and make it run extremely fast.

What he's saying here is that he works on the warning and error messages, right? And they play a pretty vital role, but they help with making the code run and making it run more quickly. Right. That's what the warnings are there for to help you edit and change your code a little bit so that the compiler can deal with it in a way that's more efficient.

And warnings are super helpful because who wants to read a horrible, unclear error message that doesn't tell you anything. You want it to be clear and helpful, so you can go back and make code that makes sense and actually works. So great error messaging is crucial.

Yeah.

Shout out to David.

His point though, in saying that was that the errors and the warnings that come up, they help write more efficient code and eliminate bugs. But again, that's a secondary goal of the compiler, right? The primary goal is making sure that when translating the code, nothing gets lost in translation. So it's especially important to David that the warnings and errors he develops are relevant to the code people are writing. Right. Because what they really want is for their code to run. And he wants to help them get there without getting in their way.

What I do when I implement a new warning is, well, I said this before, I try and run the compiler against lots and lots of different projects written by different people. And with the warning turned on and see what falls out. And I was going to say, hopefully we get lots of warnings, but I don't know if I hope for that or not now I think about it. That's rather malicious in a way. Hopefully all the code is perfect in every way and there are no problems to be solved. But given reality, I mean, the reason for doing it is presumably there are mistakes that people are making, and my dream is to be able to eliminate entire categories of bug by warning about them.
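To give a feel for what a warning is, here is a small Python sketch that flags variables that are assigned but never read. It uses Python's standard `ast` module on Python source, so it is not GCC's implementation or message format; the function name `unused_variable_warnings` and the message wording are invented for this example.

```python
import ast


def unused_variable_warnings(source: str) -> list[str]:
    """Report names that are assigned somewhere but never read."""
    tree = ast.parse(source)
    assigned = {}    # name -> lowest line number where it is assigned
    used = set()     # names that are read anywhere
    for node in ast.walk(tree):
        if isinstance(node, ast.Name):
            if isinstance(node.ctx, ast.Store):
                previous = assigned.get(node.id, node.lineno)
                assigned[node.id] = min(previous, node.lineno)
            elif isinstance(node.ctx, ast.Load):
                used.add(node.id)
    return [
        f"line {line}: warning: variable '{name}' is assigned but never used"
        for name, line in sorted(assigned.items(), key=lambda item: item[1])
        if name not in used
    ]


code = "width = 3\nheight = 4\nprint(width)\n"
for message in unused_variable_warnings(code):
    print(message)
```

Running a check like this over many different projects, as David describes, is how you would find out whether it catches real mistakes or mostly produces noise.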

How often is the code perfect Angela?

There is no perfect code. ‘Cause humans, I mean ...

Yeah.

Yeah. I would suspect that warnings and errors come up all the time.

And they're really important.

But I love this attitude that David has towards this. He's like, "My dream is to be able to eliminate entire categories of bugs." By warning about them and being really helpful.

Right.

Yeah. And we spoke about the different kinds of categories and also the frequency of the bugs that the compiler's going to throw at you. Right. Because depending on what options you have turned on in the compiler, you can have it try and detect a lot of things and make your program just that much more efficient and better running. Or you can have it really just detect the things that are crucial and that are not going to let your program run. And so there's a bit of a balancing act that he told me about where you have to try and think about which bugs you really want to have come front and center in front of the developer so that they actually pay attention to them, and not spam the developer with a bunch of other warnings and bugs that would be helpful for efficiency, but aren't absolutely crucial to making a program actually run. Right?

You have choices.

You have choices that affect how quickly the compiler runs and if you flood the developer with too many messages, they might stop reading them entirely and miss the ones that are helpful in terms of making the program run. So David's got this great attitude about how his role in building the compiler is maybe a secondary one, right? Where it's helpful and a lot of people use these warnings and errors to write better code, but he doesn't really mince words at all on the importance of compilers in general.

The Linux kernel is compiled using GCC. So without the compiler there's nothing. And so I guess the kernel folks would like to say, they're the most important but I like to think we're the most important and… pistols at dawn or something. But yeah, without the compiler, your source code is just source code. You can't actually run it. And historically GCC has been the system compiler and everything was built with it.

David is throwing down.

He has a way with words, doesn't he?

Yeah. He's got some fighting words and he is ready to back them up.

I love it.

But he has a point though.

Yes. He has a point. Without compilers, you don't have programming, you don't have a lot of things. On the other hand, without a kernel you also ...

Yeah.

So I don't know about which one's more important. I'm not going to judge that but ...

Yeah, don't get into that duel.

Perhaps they're both equally important? Question mark? I don't know.

Let's leave it there.

Let's leave it to them to decide between themselves and we can comment about it later. So we've talked about the primary function of a compiler, and we've talked a little bit about the internal workings and some of the things that happen in the middle of the compiler with optimizations and warning codes and things like that.

I want to move on next to what you can gain from learning how to build your own compiler, which it turns out isn't necessarily as difficult as it may seem. I spoke with Thorsten Ball. We heard from him a little bit at the top of the show. He's a self-taught programmer who decided it would be fun to learn about compilers, which he had heard was allegedly one of the most difficult aspects of computer science.

Can I ask what was his background before he decided to start toying around with compilers?

He was a self-described philosophy student dropout. He grew up playing around with websites, building his own websites and things like that. A few years ago, he started toying around with programming, ended up becoming a programmer, self-taught, and started tinkering with compilers as a side project while he worked as a developer. As he started learning about compilers, he found it to be less daunting than their reputation, but he had a lot of trouble finding modern and non-dense resources, to put it diplomatically. He found it difficult to learn about compilers in a way that wasn't a very thick, math-heavy textbook, and to find instructions that were actually understandable and approachable, but also thorough enough to go from zero to a working compiler with code in front of you. So ...

Approachable, but thorough. Okay.

Yeah, exactly. He ended up writing a couple books. The first book was about his experience, not about his experience, but it ended up being a resource for how to build an interpreter, which he wrote by taking notes about his own experiences, learning how to do it and he wanted to provide that resource for people who wanted to learn about compilers, but didn't want to dig into a doorstop of a textbook.

And then he followed up that book with the book about compilers. So his end goal became to show that compilers aren't, in their basic form anyways, as complex as their reputation. And he told me about what he learned beyond just how compilers work, what he learned about computing.

If you write a compiler that, say, translates a high level language like JavaScript or something like that, a really high level language where you don't have to worry about memory management, stuff like this, into X86 assembly code, you learn how a computer works and how most of the software that runs on your computer works. You find out what goes into an executable file, what is actually in there. Which data goes into it and which data does not go into it. You will learn what goes into memory, when you start your executable, how your operating system starts it and where it puts certain stuff and how it organizes things into memory. You learn how the computer actually executes things because you will get in touch with assembly instructions or machine instructions. That means, for example, spoiler, a computer doesn't have ‘if’ conditionals, right? So if/else in assembly language, there might be in some assemblers, but what you usually do is you compare two values or you test some values. And then based on that result, you do a jump or a go-to and jump to a different location of source code.
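Thorsten's "compare, then jump" point can be seen in a toy machine written in Python. Everything here is an assumption made for illustration: the instruction names (`cmp`, `jge`, `jmp`, `set`, `halt`) are loosely modeled on assembly mnemonics, and the `run` function is a hypothetical stand-in for a processor. The program is the lowered form of "if x < 10, result is 1, else result is 2".

```python
def run(program, variables):
    """Execute a list of toy instructions and return the final variables."""
    pc = 0      # program counter: index of the next instruction
    flag = 0    # result of the most recent comparison
    while True:
        op, *args = program[pc]
        pc += 1
        if op == "cmp":                  # compare a variable with a constant
            name, constant = args
            flag = variables[name] - constant
        elif op == "jge":                # jump if the compared value was >= constant
            if flag >= 0:
                pc = args[0]
        elif op == "jmp":                # unconditional jump
            pc = args[0]
        elif op == "set":                # store a constant in a variable
            name, constant = args
            variables[name] = constant
        elif op == "halt":
            return variables


program = [
    ("cmp", "x", 10),        # 0: compare x with 10
    ("jge", 4),              # 1: if x >= 10, skip to the "else" part
    ("set", "result", 1),    # 2: "then" part
    ("jmp", 5),              # 3: jump over the "else" part
    ("set", "result", 2),    # 4: "else" part
    ("halt",),               # 5
]

print(run(program, {"x": 3}))
print(run(program, {"x": 12}))
```

There is no "if" anywhere in the program itself, only a comparison and jumps to other positions in the instruction list.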

Those kinds of insights might help you write better code, right? Because if you know what's going on under the hood, so to speak, you can write your code to be better tailored to those lower level instructions. So this is something that Josh actually told me a bit about: compilers will run more efficiently if you write your source code in a way that they can process more easily, and you're less likely to get errors if you design your code to the compiler's processes.

So you realize how your high level language is translatable down to a lower level language without losing any meaning. So you will see how you can use lower level constructs to do the same thing that you can do in higher level constructs, right? So concrete example in JavaScript or, say, Go or any language that has some higher level abstractions, you have something like a switch statement, right? A computer doesn't have a switch statement, a computer doesn't have switch instructions.

It's made up of these tests and if this is true, then jump to here, test, if this is true, then jump to here, blah, blah, blah. And in fact, most compilers might optimize some of that away because they already know that it can be true or it can be false or something like this.
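The "test, jump, test, jump" chain for a switch statement can be generated mechanically. The Python sketch below prints pseudo-assembly text; the function name `lower_switch`, the labels, and the mnemonics are illustrative assumptions, and real compilers may choose other strategies, such as jump tables, or remove tests they can prove unnecessary.

```python
def lower_switch(variable, cases, default):
    """Turn a switch over `variable` into a chain of compares and jumps.

    `cases` is a list of (constant, label) pairs; `default` is the label
    to jump to when no case matches.
    """
    lines = []
    for constant, label in cases:
        lines.append(f"cmp {variable}, {constant}")   # test one case
        lines.append(f"je {label}")                   # jump if it matched
    lines.append(f"jmp {default}")                    # nothing matched
    return lines


for line in lower_switch("x", [(1, "case_one"), (2, "case_two")], "default_case"):
    print(line)
```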

In the end, most languages are built using those same blocks, right? Because the computer itself doesn't change, but all these different languages that are written and implemented in different ways end up having to use those same blocks to function.

Then at the end, of course, when you have built your compiler or while you're building it, you will suddenly understand how 80% of all the other programming languages work. And you will maybe gain this little bit of appreciation or understanding for how all of the software right now that we are using, or that we're building every day is built on this tower of abstractions that we pulled ourselves up on, where the languages we're using today, under the hood, use languages that were built 10 years ago, and those use languages that were built 20 years ago, and stuff like this. And it all in the end comes down to machine instructions. But we are so high on this tower of abstractions that we barely ever notice. And that's quite fascinating.

What does he mean by that Angela? The tower of abstractions climbing? What does that mean?

So understanding that programming has been around for a long time. I'm talking '50s. Maybe around the '50s when the first programming languages, could be even earlier, but I'm thinking of specific languages when I say around the '50s. There's a language and then a language comes along after it, built on top of that language, right? And then that gets to be used and it gets popular and it waxes and wanes. And then another programming language may be based off of that one. And again, that's where the abstraction starts to happen and unfold where he says, "There were languages built 10 years ago, but they're based on languages that were built 20 years before." You know what I mean?

Yeah.

So that it becomes much more abstracted, the higher you go. So we're so high on that tower as he puts it. Yeah.

Anything abstracted from the metal or like from the ...

Yes, we're so abstracted from the ones-

The hardware.

... and zeros of-

Yes, exactly.

... what the machine actually understands. Learning these compilers gives you so much insight as to how all the languages tend to work and how they tend to do exactly what they were built to do.

Yeah. Pun not intended here, but to build on that a little bit more, let's think a little bit about what the end results of these increasing abstractions look like. Right. We start with these shared building blocks that a lot of these languages have in common. We build these languages using those blocks, that introduce new features and new functions, and they diverge a bit, right? They're different from each other. And then new languages are built on top of those and those divergences get wider. And every time you do that, you get that much farther from those original building blocks that the machine can understand.

Johan, you have been studying compilers. You have been in the books, you've been talking to people. What should we do with all of this information? Should we all go try to make a compiler? What do we do with all this?

I don't know that everyone should do it, right? But if it's something that you're interested in, but you were afraid to tackle it because you've heard that you need to know a lot of advanced mathematics or that the code can be really dense, all of those things, they aren't necessarily true, right? So Thorsten wrote a couple books, one about how to write an interpreter and another that follows on writing a compiler. That interpreter is only about 3000 lines long and the compiler adds on a few thousand more lines, which might sound like a lot to a person who's never programmed before. But when you compare it to something like the Rust compiler that Josh Stone works on, that one is about half a million lines of code long.

So after this podcast, if what you assumed about compilers is that they are hard and unapproachable, you might want to reconsider, because you too may be able to write your own compiler if you take it apart piece by piece, work small, and then build upon that.

Yeah. And hopefully that helps you understand how the programming languages you use are put together. How they run and how they turn your source code into something that a machine can actually run.

Exactly. And understanding where those error messages are coming from, making it make sense, that is so important.

Yeah. When you know how the compiler works and where it breaks down, that helps you understand a little bit better where the error messages are coming from.

Exactly. And that's how you can write much better code. Hey, let us know if we've unmasked the wizard behind the compiler curtain. Does this compute? We want to know. Share your thoughts with us. You can tweet us @RedHat on Twitter, or just use the hashtag #CompilerPodcast. We would really love to hear what you thought about the show. And that does it for the compiler episode of Compiler.

Today's episode was produced by Johan Philippine and Caroline Creaghead. Victoria Lawton makes sure this show passes all the compiler checks.

She sure does. Our audio engineer is Christian Proham. Special thanks to Shawn Cole. Our theme song was composed by Mary Ancheta.

Thank you to Josh Stone, David Malcolm, Thorsten Ball.

Our audio team includes Leigh Day, Laura Barnes, Stephanie Wonderlick, Mike Esser, Claire Allison, Nick Burns, Aaron Williamson, Karen King, Boo Boo Howse, Rachel Ertel, Mike Compton, Ocean Matthews, and Laura Walters.

If you liked today's show, please follow us. You can rate the show, leave us a review, share it with someone you know. It really does help.

We would love to hear from you. And we're so glad you joined us for this episode. Take care everybody.

See you next time.

About the show

Compiler

Do you want to stay on top of tech, but find you’re short on time? Compiler presents perspectives, topics, and insights from the industry—free from jargon and judgment. We want to discover where technology is headed beyond the headlines, and create a place for new IT professionals to learn, grow, and thrive. If you are enjoying the show, let us know, and use #CompilerPodcast to share our episodes.
