Episode 17
Who’s Afraid Of Compilers?
Show Notes
It’s about time we asked a question about compilers. It’s been a scary proposition. Compilers have a reputation for density, complexity, and a fair bit of mysticism. But when we looked into them, we learned they’re really just like any other program. So we wondered: Who’s afraid of compilers?
In this episode, we start to break down the reputation by opening up the black box. What do compilers do? How do they work? And what can you gain by learning more about the inner workings of compilers?
Transcript
00:02 - Johan Philippine
Angela, Brent.
00:04 - Brent Simoneaux
Johan.
00:07 - Johan Philippine
We work on a tech podcast.
00:08 - Angela Andrews
We do?
00:09 - Brent Simoneaux
We do.
00:10 - Johan Philippine
It's called Compiler.
00:12 - Angela Andrews
It is?
00:12 - Brent Simoneaux
Yes.
00:13 - Johan Philippine
One of the many things we haven't talked about on this show yet is little C compilers.
00:20 - Angela Andrews
It was-
00:20 - Brent Simoneaux
Oh, boy.
00:20 - Angela Andrews
... only a matter of time.
00:21 - Johan Philippine
Right?
00:22 - Brent Simoneaux
This is going to get really confusing.
00:24 - Angela Andrews
Well, that's what we do.
00:25 - Johan Philippine
We'll try and straighten it out.
00:27 - Angela Andrews
Exactly. We try to pull back the covers a little bit.
00:30 - Johan Philippine
And that's something that maybe compilers need a little bit. They have a certain reputation in ...
00:36 - Angela Andrews
They have a PR problem.
00:39 - Johan Philippine
Yeah. A little bit.
00:39 - Brent Simoneaux
Oh. Wait, say more about that. What do you mean Angela?
00:43 - Angela Andrews
I mean, when you say the word compiler, it sounded like you were uncertain and there is a little bit of angst there. And I think a lot of people, when they think about language, compilers, programming language compilers, it sounds like a difficult and mystical and unknown thing. What's happening when compilers do their thing? Most people don't know and maybe that not knowing is a little bit weird, a little bit scary.
01:11 - Brent Simoneaux
Yeah.
01:12 - Johan Philippine
Well, I spoke with some compiler engineers and believe it or not, there are also amateur compiler builders out there who do it for a hobby.
01:23 - Brent Simoneaux
Yeah.
01:23 - Angela Andrews
Okay.
01:24 - Johan Philippine
And they assured me that compilers are… maybe not as scary as their reputation makes them out to be. So I started to ask, "Who's afraid of compilers?"
01:37 - Brent Simoneaux
This is Compiler, an original podcast from Red Hat.
01:42 - Angela Andrews
We're your hosts.
01:43 - Brent Simoneaux
I'm Brent Simoneaux.
01:45 - Angela Andrews
And I'm Angela Andrews.
01:46 - Brent Simoneaux
We're here to break down questions from the tech industry; big, small, and sometimes strange.
01:54 - Angela Andrews
Each episode, we go out in search of answers from Red Hatters and people they’re connected to.
02:00 - Brent Simoneaux
Today's question: who's afraid of compilers?
02:08 - Angela Andrews
Producer Johan Philippine is here to translate.
02:12 - Johan Philippine
I figured to start us off, it would be a good idea to go over the basics. So I spoke with Thorsten Ball and he gave me a fantastic metaphor for what compilers and their interpreter cousins actually do.
02:28 - Thorsten Ball
Imagine if you are talking to a friend, you speak English, your friend doesn't speak English, speaks only Spanish. You have a third friend who speaks both languages. An interpreter would be, you say something, your friend listens to you and says it in Spanish. And the compiler would be your friend listening to you say something, sitting down, writing it down, translating it into Spanish, and then handing that document to your friend.
02:57 - Brent Simoneaux
So he's making a distinction here between interpreter and compilers, right?
03:03 - Johan Philippine
That's right. So a compiler, well, let's start with the interpreter because that's what he starts with. You have languages that are compiled, you have languages that are interpreted. The thing that they both have in common is that they take source code, which is human readable but if you were just to try to run that code on the computer, somehow the computer wouldn't know what to do with it.
03:26 - Angela Andrews
It does not compute.
03:27 - Johan Philippine
Exactly, it does not. It would just tell you ...
03:28 - Brent Simoneaux
Literally-
03:30 - Johan Philippine
... it does not compute. Yeah, literally it wouldn't know what to do with that. So you need a step to go from that high level language where you're writing out these instructions for a program and you need a step to translate that into something that a computer is actually going to understand. Now an interpreter will just take the source code and then, line by line, as you're writing it and running it, it'll just directly translate it into machine level code, which is-
03:59 - Brent Simoneaux
Gotcha.
03:59 - Johan Philippine
And then that's it. Right? It's like someone doing like in Thorsten's example where it's pretty much a simultaneous translation.
04:07 - Angela Andrews
Yeah. So just so I'm tracking, the interpreter, it takes it one, I don't know, one sentence or conversation at a time.
04:16 - Johan Philippine
Exactly.
04:16 - Angela Andrews
And the compiler has to go through the entire thing to ...
04:21 - Brent Simoneaux
Oh-
04:21 - Johan Philippine
That's right-
04:23 - Angela Andrews
... translate into what the machine could actually understand.
04:28 - Brent Simoneaux
Got it.
04:28 - Johan Philippine
Right. There's one more thing that's different about compilers as well, which is that, just like you said, yes, they go through the entire program and translate it all at once. And then in Thorsten's example, there is that last thing about handing a document to your friend. Right? So a compiler has a file output essentially, right? That then stays the same and isn't changeable. Whereas an interpreter just runs the code as you're writing it and running the ... There's no document output from an interpreter typically.
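To make the distinction concrete, here is a minimal sketch in Python. Everything in it is an assumption made for illustration: the toy language (lines of the form `name = number + number`), the instruction names, and the JSON output file are all hypothetical. The interpreter translates and runs one line at a time and leaves nothing behind; the compiler translates the whole program up front and writes a file that can be run later without the source.

```python
import json
import os
import tempfile

# Hypothetical toy language: each line has the form "NAME = NUMBER + NUMBER".
SOURCE = ["a = 1 + 2", "b = 10 + 20"]


def parse_line(line):
    """Split a line such as 'a = 1 + 2' into (name, left, right)."""
    name, expr = [part.strip() for part in line.split("=")]
    left, right = [int(part) for part in expr.split("+")]
    return name, left, right


def interpret(lines):
    """Interpreter: translate and run one line at a time; no file is produced."""
    env = {}
    for line in lines:
        name, left, right = parse_line(line)
        env[name] = left + right
    return env


def compile_to_file(lines, path):
    """Compiler: translate the whole program first, then write it to a file."""
    instructions = []
    for line in lines:
        name, left, right = parse_line(line)
        instructions.append(["PUSH", left])
        instructions.append(["PUSH", right])
        instructions.append(["ADD"])
        instructions.append(["STORE", name])
    with open(path, "w") as handle:
        json.dump(instructions, handle)


def run_file(path):
    """Run a previously compiled file without looking at the source again."""
    with open(path) as handle:
        instructions = json.load(handle)
    stack, env = [], {}
    for instruction in instructions:
        op = instruction[0]
        if op == "PUSH":
            stack.append(instruction[1])
        elif op == "ADD":
            right, left = stack.pop(), stack.pop()
            stack.append(left + right)
        elif op == "STORE":
            env[instruction[1]] = stack.pop()
    return env


interpreted = interpret(SOURCE)

compiled_path = os.path.join(tempfile.mkdtemp(), "program.json")
compile_to_file(SOURCE, compiled_path)
compiled = run_file(compiled_path)

print(interpreted == compiled)
```

Both routes compute the same values. The difference is the artifact: only the compiled route leaves a file on disk, the "document" in Thorsten's metaphor.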
05:00 - Brent Simoneaux
Oh.
05:01 - Johan Philippine
So that's the basics of what a compiler does. There are many different types of compilers. Today, we're going to focus on the ones that translate from high level source code to low level machine code that computers can understand. Now we got the high-level explanation from Thorsten. We'll come back to Thorsten later. He's got a lot more insights for us. But to dive a little deeper into the actual steps that compilers go through, I spoke to Josh Stone and he works on the compiler for Rust. He's going to help us understand the different steps that the Rust compiler goes through.
05:44 - Josh Stone
Right. So usually you start with a parsing phase and that just reads in the textual code and turns it into an internal data structure, usually called an abstract syntax tree. And then in something like Rust, or I think many compilers have this, there's also an internal representation where you take that syntax tree, which looks pretty similar to what the code was and transform it into some kind of internal representation.
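As a rough illustration of the parsing phase, the sketch below uses Python's own standard-library `ast` module rather than anything from the Rust compiler. The variable names in the one-line program are made up. It shows text going in and a tree-shaped data structure coming out.

```python
import ast

# A one-line program as plain text (the names are hypothetical).
source = "total = price + 1"

# ast.parse reads the text and returns the root of an abstract syntax tree.
tree = ast.parse(source)

# The tree keeps the structure of the code: an assignment whose value is an addition.
assignment = tree.body[0]
print(type(assignment).__name__)
print(type(assignment.value).__name__)
print(type(assignment.value.op).__name__)

# ast.dump gives a text view of the whole tree.
print(ast.dump(tree))
```

As Josh says, the tree still looks a lot like the code it came from. Later phases are where it gets transformed into something further from the source.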
06:15 - Johan Philippine
All right. So quick compiler Johan break here. I'm going to translate that as best as I can. This first step he's talking about, it takes the source code that the developer has written and breaks it up into component pieces for the next step.
06:29 - Josh Stone
So MIR is often used as a middle intermediate representation. And then that phase is where you can analyze the code and do things like type inference where you determine the types of all the variables. Rust has a borrow checker, which is where it does the analysis to make sure that references to values don't outlive the values themselves and also that exclusive borrows truly are exclusive. So you'd prevent concurrent modification. So those sorts of analyses happen at this middle phase. And then from there Rust turns that MIR into LLVM IR. So LLVM is the low level virtual machine. It's a library that we use for optimization and code generation.
07:20 - Johan Philippine
Quick pause again here. This middle section he's talking about is when the compiler takes that broken down code from the first section and starts testing it to make sure that the program's logic works, essentially that all the Is are dotted and the Ts are crossed and makes it ready for the next step.
07:42 - Josh Stone
So the Rust compiler is just translating from its own representation into LLVM's representation and then hands off to LLVM to optimize the code. So it will change things like, maybe if you have X plus one plus one, it can change that to X plus two, right? That's a very simple modification, but those things compound. So a lot of optimizations change the code into something more efficient.
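The "X plus one plus one" rewrite can be sketched in a few lines of Python. This is not how LLVM implements it; the nested-tuple expression format and the `fold` function are assumptions made up for the example. The idea is only that an optimizer walks the expression and replaces a pattern with a cheaper equivalent.

```python
def fold(expr):
    """Rewrite ("add", ("add", e, c1), c2) into ("add", e, c1 + c2) for integer constants."""
    if not isinstance(expr, tuple):
        return expr  # a variable name or a constant: nothing to simplify
    op, left, right = expr
    left, right = fold(left), fold(right)
    # Two constants can be added at compile time.
    if op == "add" and isinstance(left, int) and isinstance(right, int):
        return left + right
    # (e + c1) + c2 becomes e + (c1 + c2).
    if (
        op == "add"
        and isinstance(right, int)
        and isinstance(left, tuple)
        and left[0] == "add"
        and isinstance(left[2], int)
    ):
        return ("add", left[1], left[2] + right)
    return (op, left, right)


x_plus_one_plus_one = ("add", ("add", "x", 1), 1)
optimized = fold(x_plus_one_plus_one)
print(optimized)
```

One addition at run time instead of two is a small saving, but as Josh notes, rewrites like this compound when they are applied across a whole program.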
08:08 - Johan Philippine
So there are a lot of other optimizations that compilers can do and oftentimes these compilers will give the developers a choice on which to turn on or to turn off, but using them is a trade-off. It ends up with faster code, but it takes a longer time to actually compile it. Once all of that is done, we get to the final steps.
08:35 - Angela Andrews
Wait, there's more!
08:37 - Josh Stone
And then it finally goes through a code generation phase where it takes its own target neutral representation and turns it into specific X86 instructions or ARM instructions, whatever the target is. And then those instructions will be written to an object file and you link that into a program and then that's the thing that you can actually run.
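A very small sketch of that last step, written in Python: one target-neutral instruction list is lowered into text for two different targets. The `add_const` instruction and the choice of registers (`eax`, `r0`) are assumptions made for the illustration. A real code generator also handles register allocation, instruction encoding, and object-file output, none of which appears here.

```python
# Hypothetical target-neutral instructions: add a constant to one running value.
IR = [("add_const", 2), ("add_const", 5)]


def lower(ir, target):
    """Turn target-neutral instructions into assembly-style text for one target."""
    lines = []
    for op, value in ir:
        if op != "add_const":
            raise ValueError(f"unknown IR instruction: {op}")
        if target == "x86":
            # Assumes the running value lives in register eax.
            lines.append(f"add eax, {value}")
        elif target == "arm":
            # Assumes the running value lives in register r0.
            lines.append(f"add r0, r0, #{value}")
        else:
            raise ValueError(f"unknown target: {target}")
    return lines


x86_text = lower(IR, "x86")
arm_text = lower(IR, "arm")
print("\n".join(x86_text))
print("\n".join(arm_text))
```

The earlier phases never needed to know which chip the program would run on. Only this final lowering step differs per target.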
08:59 - Johan Philippine
All right. Mission accomplished.
09:03 - Angela Andrews
You did it.
09:03 - Brent Simoneaux
Yay.
09:03 - Johan Philippine
We have our executable program, which has been translated for a specific kind of chip, right? Either the X86 or the ARM like Josh mentions or whatever it is that the compiler has been designed to translate for and you have your program.
09:22 - Brent Simoneaux
Okay. Speaking of being human readable, I'm having a little trouble understanding. Angela, Johan, can you maybe just recap this for me real quick?
09:35 - Johan Philippine
Sure. Angela, do you want to take a shot at it?
09:38 - Angela Andrews
I'll take a quick shot. So again, we're starting with the source code and it breaks it down into manageable bytes, right? It looks at the variables and it decides what's the best type for what's being presented in the source code, right? And then the compiler tests to make sure that the logic is working, it's tracking. Is everything tracking here? And then it goes to the next level, which is the actual, where LLVM says, "Okay, I'm looking at your code, let's optimize it. Let's supercharge. Let's make sure this is the most efficient use as possible."
10:18 - Angela Andrews
And then if you do a lot of that optimization, your compile time can be much longer because it's doing much more work in the beginning. Or if you want it to compile quickly, right? And then you're really not concerned about that optimization as much, you just wanted to get it done, that's where you… at the end it turns it into this instruction that the processor can actually understand. And he mentioned the X86, he mentioned the ARM and it gives you this file that will work for you because it has gone through all of these different levels. And this is what you can actually "run". So many moving parts here, but I think I was keeping up.
11:07 - Brent Simoneaux
Yeah.
11:10 - Johan Philippine
So we just talked about the primary function of a compiler, which is translation, right? And while the task sounds pretty straightforward, once we dig into the specifics, it's a pretty involved process. Josh assured me that once you dig into it, it's not as difficult as it sounds, but we also heard a few hints about some of the other tasks a compiler can perform in that middle section. To learn more about that, I spoke with David Malcolm. He is a principal software engineer here at Red Hat and he works on the GCC compiler's warning and error messages.
11:44 - Brent Simoneaux
Oh boy.
11:47 - David Malcolm
The diagnostics and the warnings are a secondary goal of the compiler, but there's no point in generating highly optimal efficient code that does the wrong thing. I mean, I can write code that does the wrong thing and make it run extremely fast.
12:03 - Johan Philippine
What he's saying here is that he works on the warning and error messages, right? And they play a pretty vital role, but they help with making the code run and making it run more quickly. Right. That's what the warnings are there for to help you edit and change your code a little bit so that the compiler can deal with it in a way that's more efficient.
12:30 - Angela Andrews
And warnings are super helpful because who wants to read a horrible, unclear error message that doesn't tell you anything. You want it to be clear and helpful, so you can go back and make code that makes sense and actually works. So great error messaging is crucial.
12:51 - Johan Philippine
Yeah.
12:52 - Angela Andrews
Shout out to David.
12:56 - Johan Philippine
His point though, in saying that was that the errors and the warnings that come up, they help write more efficient code and eliminate bugs. But again, that's a secondary goal of the compiler, right? The primary goal is making sure that when translating the code, nothing gets lost in translation. So it's especially important to David that the warnings and errors he develops are relevant to the code people are writing. Right. Because what they really want is for their code to run. And he wants to help them get there without getting in their way.
13:31 - David Malcolm
What I do when I implement a new warning is, well, I said this before, I try and run the compiler against lots and lots of different projects written by different people. And with the warning turned on and see what falls out. And I was going to say, hopefully we get lots of warnings, but I don't know if I hope for that or not now I think about it. That's rather malicious in a way. Hopefully all the code is perfect in every way and there are no problems to be solved. But given reality, I mean, the reason for doing it is presumably there are mistakes that people are making, and my dream is to be able to eliminate entire categories of bug by warning about them.
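To give a feel for what a warning pass does, here is a minimal sketch in Python. It is not GCC's implementation, and the function name and message wording are assumptions made for the example. It walks a syntax tree (using Python's standard `ast` module) and reports variables that are assigned but never read, which is one simple category of mistake a compiler can warn about.

```python
import ast


def unused_variable_warnings(source):
    """Report variables that are assigned somewhere but never read."""
    tree = ast.parse(source)
    assigned = {}  # variable name -> line number of its earliest assignment
    used = set()   # variable names that are read somewhere
    for node in ast.walk(tree):
        if isinstance(node, ast.Name):
            if isinstance(node.ctx, ast.Store):
                earliest = min(assigned.get(node.id, node.lineno), node.lineno)
                assigned[node.id] = earliest
            elif isinstance(node.ctx, ast.Load):
                used.add(node.id)
    return [
        f"line {line}: warning: variable '{name}' is assigned but never used"
        for name, line in sorted(assigned.items(), key=lambda item: item[1])
        if name not in used
    ]


# The program runs without crashing, but it multiplies width by itself by mistake.
program = "width = 3\nheight = 4\narea = width * width\nprint(area)\n"
warnings = unused_variable_warnings(program)
for message in warnings:
    print(message)
```

The sample program is the kind David describes: it runs fine and does the wrong thing. The warning about `height` never being used is the hint that points at the actual bug.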
14:24 - Brent Simoneaux
How often is the code perfect Angela?
14:27 - Angela Andrews
There is no perfect code. ‘Cause humans, I mean ...
14:33 - Brent Simoneaux
Yeah.
14:33 - Johan Philippine
Yeah. I would suspect that warnings and errors come up all the time.
14:39 - Angela Andrews
And they're really important.
14:41 - Brent Simoneaux
But I love this attitude that David has towards this. He's like, "My dream is to be able to eliminate entire categories of bugs." By warning about them and being really helpful.
14:55 - Angela Andrews
Right.
14:55 - Johan Philippine
Yeah. And we spoke about the different kinds of categories and also the frequency of the bugs that the compiler's going to throw at you. Right. Because depending on what options you have turned on in the compiler, you can have it try and detect a lot of things and make your program just that much more efficient and better running. Or you can have it really just detect the things that are crucial and that are not going to let your program run. And so there's a bit of a balancing act that he told me about where you have to try and think about which bugs you really want to have come front and center in front of the developer so that they actually pay attention to them, and not spam the developer with a bunch of other warnings and bugs that would be helpful for efficiency, but aren't absolutely crucial to making a program actually run. Right?
15:52 - Angela Andrews
You have choices.
15:53 - Johan Philippine
You have choices that affect how quickly the compiler runs and if you flood the developer with too many messages, they might stop reading them entirely and miss the ones that are helpful in terms of making the program run. So David's got this great attitude about how his role in building the compiler is maybe a secondary one, right? Where it's helpful and a lot of people use these warnings and errors to write better code, but he doesn't really mince words at all on the importance of compilers in general.
16:31 - David Malcolm
The Linux kernel is compiled using GCC. So without the compiler there's nothing. And so I guess the kernel folks would like to say, they're the most important but I like to think we're the most important and… pistols at dawn or something. But yeah, without the compiler, your source code is just source code. You can't actually run it. And historically GCC has been the system compiler and everything was built with it.
16:58 - Brent Simoneaux
David is throwing down.
17:00 - Angela Andrews
He has a way with words, doesn't he?
17:03 - Johan Philippine
Yeah. He's got some fighting words and he is ready to back them up.
17:06 - Brent Simoneaux
I love it.
17:07 - Angela Andrews
But he has a point though.
17:09 - Johan Philippine
Yes. He has a point. Without compilers, you don't have programming, you don't have a lot of things. On the other hand, without a kernel you also ...
17:19 - Brent Simoneaux
Yeah.
17:20 - Johan Philippine
So I don't know about which one's more important. I'm not going to judge that but ...
17:25 - Angela Andrews
Yeah, don't get into that duel.
17:26 - Johan Philippine
Perhaps they're both equally important? Question mark? I don't know.
17:32 - Angela Andrews
Let's leave it there.
17:32 - Johan Philippine
Let's leave it to them to decide between themselves and we can comment about it later. So we've talked about the primary function of a compiler, and we've talked a little bit about the internal workings and some of the things that happen in the middle of the compiler with optimizations and warning codes and things like that.
17:56 - Johan Philippine
I want to move on next to what you can gain from learning how to build your own compiler, which it turns out isn't necessarily as difficult as it may seem. I spoke with Thorsten Ball. We heard from him a little bit at the top of the show. He's a self-taught programmer who decided it would be fun to learn about compilers, which he had heard was, allegedly, one of the most difficult aspects of computer science.
18:22 - Angela Andrews
Can I ask what was his background before he decided to start toying around with compilers?
18:28 - Johan Philippine
He was a self-described philosophy student dropout. He grew up playing around with websites, building his own websites and things like that. A few years ago, he started toying around with programming, ended up becoming a programmer, self-taught, and started tinkering with compilers as a side project while he worked as a developer. As he started learning about compilers, he found it to be less daunting than their reputation, but he had a lot of trouble finding modern and non-dense resources, to put it diplomatically. He found it difficult to learn about compilers in a way that wasn't a very thick, math-heavy textbook, and to find instructions that were actually both understandable, approachable, but also thorough enough to go from zero to a working compiler with code in front of you. So ...
19:30 - Angela Andrews
Approachable, but thorough. Okay.
19:32 - Johan Philippine
Yeah, exactly. He ended up writing a couple books. The first book was about his experience, not about his experience, but it ended up being a resource for how to build an interpreter, which he wrote by taking notes about his own experiences, learning how to do it and he wanted to provide that resource for people who wanted to learn about compilers, but didn't want to dig into a doorstop of a textbook.
19:57 - Johan Philippine
And then he followed up that book with the book about compilers. So his end goal became to show that compilers aren't, in their basic form anyways, as complex as their reputation. And he told me about what he learned beyond just how compilers work, what he learned about computing.
20:17 - Thorsten Ball
If you write a compiler that, say, translates a high level language like JavaScript or something like that, a really high level language where you don't have to worry about memory management, stuff like this, into X86 assembly code, you learn how a computer works and how most of the software that runs on your computer works. You find out what goes into an executable file, what is actually in there. Which data goes into it and which data does not go into it. You will learn what goes into memory, when you start your executable, how your operating system starts it and where it puts certain stuff and how it organizes things into memory. You learn how the computer actually executes things because you will get in touch with assembly instructions or machine instructions. That means, for example, spoiler, a computer doesn't have ‘if’ conditionals, right? So if/else in assembly language, there might be in some assemblers, but what you usually do is you compare two values or you test some values. And then based on that result, you do a jump or a go-to and jump to a different location of source code.
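The compare-then-jump idea can be sketched with a tiny made-up machine in Python. The instruction names (`compare_less`, `jump_if_false`, `jump`, `set`) and the register names are hypothetical and not any real instruction set. The program below is what a high-level `if a < b: result = 1 else: result = 2` might look like once the `if` is gone.

```python
def run(program, registers):
    """Execute a list of instructions on a toy machine with a single test flag."""
    flag = False
    pc = 0  # program counter: index of the next instruction to execute
    while pc < len(program):
        op, *args = program[pc]
        pc += 1
        if op == "compare_less":
            flag = registers[args[0]] < registers[args[1]]
        elif op == "jump_if_false":
            if not flag:
                pc = args[0]
        elif op == "jump":
            pc = args[0]
        elif op == "set":
            registers[args[0]] = args[1]
        else:
            raise ValueError(f"unknown instruction: {op}")
    return registers


PROGRAM = [
    ("compare_less", "a", "b"),  # 0: test whether a < b
    ("jump_if_false", 4),        # 1: if the test failed, go to the else branch
    ("set", "result", 1),        # 2: then branch
    ("jump", 5),                 # 3: skip over the else branch
    ("set", "result", 2),        # 4: else branch
]

smaller = run(PROGRAM, {"a": 1, "b": 2})
larger = run(PROGRAM, {"a": 5, "b": 2})
print(smaller["result"], larger["result"])
```

There is no `if` anywhere in `PROGRAM`: just a test, then a jump that depends on the result, which is the pattern Thorsten describes.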
21:33 - Johan Philippine
Those kinds of insights might help you write better code, right? Because if you know what's going on under the hood, so to speak, you can write your code to be better tailored to those lower level instructions. So this is something that Josh actually told me a bit about: that compilers will run more efficiently if you write your source code in a way that they can process more easily. You're less likely to get errors if you design your code to the compiler's processes.
22:03 - Thorsten Ball
So you realize how your high level language is translatable down to a lower level language without losing any meaning. So you will see how you can use lower level constructs to do the same thing that you can do in higher level constructs, right? So concrete example: in JavaScript or, say, Go, or any language that has some higher level abstractions, you have something like a switch statement, right? A computer doesn't have a switch statement, a computer doesn't have switch instructions.
22:35 - Thorsten Ball
It's made up of these tests and if this is true, then jump to here, test, if this is true, then jump to here, blah, blah, blah. And in fact, most compilers might optimize some of that away because they already know that it can be true or it can be false or something like this.
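Here is a minimal Python sketch of that lowering, under made-up assumptions: the instruction names (`jump_if_equal`, `jump`, `set_result`) and the `lower_switch` function are hypothetical. It turns a list of switch cases into a flat chain of tests and jumps, then runs the result on a toy machine.

```python
def lower_switch(cases, default):
    """Turn [(value, result), ...] plus a default into a chain of tests and jumps."""
    count = len(cases)
    first_body = count + 1               # case bodies start after the tests and the default jump
    default_body = first_body + 2 * count
    end = default_body + 1
    program = []
    for index, (value, _) in enumerate(cases):
        # Test: if the subject equals this value, jump to the matching body.
        program.append(("jump_if_equal", value, first_body + 2 * index))
    program.append(("jump", default_body))  # no test matched
    for _, result in cases:
        program.append(("set_result", result))
        program.append(("jump", end))       # leave the switch
    program.append(("set_result", default))
    return program


def run(program, subject):
    """Execute the lowered program for one subject value."""
    result = None
    pc = 0
    while pc < len(program):
        op, *args = program[pc]
        pc += 1
        if op == "jump_if_equal":
            if subject == args[0]:
                pc = args[1]
        elif op == "jump":
            pc = args[0]
        elif op == "set_result":
            result = args[0]
        else:
            raise ValueError(f"unknown instruction: {op}")
    return result


lowered = lower_switch([(1, "one"), (2, "two")], "other")
for instruction in lowered:
    print(instruction)
print(run(lowered, 2), run(lowered, 9))
```

A real compiler has more options than this linear chain (a jump table, for example), and, as Thorsten says, it may remove tests whose outcome it already knows.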
22:51 - Johan Philippine
In the end, most languages are built using those same blocks, right? Because the computer itself doesn't change, but all these different languages that are written and implemented in different ways end up having to use those same blocks to function.
23:08 - Thorsten Ball
Then at the end, of course, when you have built your compiler or while you're building it, you will suddenly understand how 80% of all the other programming languages work. And you will maybe gain this little bit of appreciation or understanding for how all of the software right now that we are using, or that we're building every day is built on this tower of abstractions that we pulled ourselves up on, where the languages we're using today, under the hood, use languages that were built 10 years ago, and those use languages that were built 20 years ago and stuff like this. And it all in the end comes down to machine instructions. But we are so high on this tower of abstractions that we barely ever notice. And that's quite fascinating.
24:02 - Brent Simoneaux
What does he mean by that Angela? The tower of abstractions climbing? What does that mean?
24:09 - Angela Andrews
So understanding that programming has been around for a long time. I'm talking '50s. Maybe around the '50s when the first programming languages, could be even earlier, but I'm thinking of specific languages when I say around the '50s. There's a language and then a language comes along after it, built on top of that language, right? And then that gets to be used and it gets popular and it waxes and wanes. And then another programming language may be based off of that one. And again, that's where the abstraction starts to happen and unfold where he says, "There were languages built 10 years ago, but they're based on languages that were built 20 years before." You know what I mean?
24:55 - Brent Simoneaux
Yeah.
24:55 - Angela Andrews
So that it becomes much more abstracted, the higher you go. So we're so high on that tower as he puts it. Yeah.
25:05 - Brent Simoneaux
Anything abstracted from the metal or like from the ...
25:09 - Angela Andrews
Yes, we're so abstracted from the ones-
25:13 - Brent Simoneaux
The hardware.
25:14 - Angela Andrews
... and zeros of-
25:14 - Johan Philippine
Yes, exactly.
25:15 - Angela Andrews
... what the machine actually understands. Learning these compilers gives you so much insight as to how all the languages tend to work and how they tend to do exactly what they were built to do.
25:31 - Johan Philippine
Yeah. Pun not intended here, but to build on that a little bit more, let's think a little bit about what the end results of these increasing abstractions looks like. Right. We start with these shared building blocks that a lot of these languages have in common. We build these languages using those blocks, that introduce new features and new functions, and they diverge a bit, right? They're different from each other. And then new languages are built on top of those and those divergences get wider. And every time you do that, you get that much farther from those original building blocks that the machine can understand.
26:12 - Brent Simoneaux
Johan, you have been studying compilers. You have been in the books, you've been talking to people. What should we do with all of this information? Should we all go try to make a compiler? What do we do with all this?
26:30 - Johan Philippine
I don't know that everyone should do it, right? But if it's something that you're interested in, but you were afraid to tackle it because you've heard that you need to know a lot of advanced mathematics or that the code can be really dense, all of those things, they aren't necessarily true, right? So Thorsten wrote a couple books, one about how to write an interpreter and another that follows on writing a compiler. That interpreter is only about 3000 lines long and the compiler adds on a few thousand more lines, which might sound like a lot to a person who's never programmed before. But when you compare it to something like the Rust compiler that Josh Stone works on, that one is about half a million lines of code long.
27:16 - Angela Andrews
So from this podcast, it sounds as if, if what you assumed about compilers is that they are hard and unapproachable, you might want to reconsider, because you too may be able to write your own compiler if you take it apart piece by piece, work small, and then build upon that.
27:38 - Johan Philippine
Yeah. And hopefully that helps you understand how the programming languages you use are put together. How they run and how they turn your source code into something that a machine can actually run.
27:53 - Angela Andrews
Exactly. And understanding where those error messages are coming from, making it make sense, that is so important.
28:02 - Johan Philippine
Yeah. When you know how the compiler works and where it breaks down, that helps you understand a little bit better where the error messages are coming from.
28:12 - Angela Andrews
Exactly. And that's how you can write much better code. Hey, let us know if we've unmasked the wizard behind the compiler curtain. Does this compute? We want to know. Share your thoughts with us. You can tweet us @RedHat on Twitter, or just use the hashtag #CompilerPodcast. We would really love to hear what you thought about the show. And that does it for the compiler episode of Compiler.
28:41 - Brent Simoneaux
Today's episode was produced by Johan Philippine and Caroline Creaghead. Victoria Lawton makes sure this show passes all the compiler checks.
28:51 - Angela Andrews
She sure does. Our audio engineer is Christian Proham. Special thanks to Shawn Cole. Our theme song was composed by Mary Ancheta.
29:02 - Brent Simoneaux
Thank you to Josh Stone, David Malcolm, Thorsten Ball.
29:06 - Angela Andrews
Our audio team includes Leigh Day, Laura Barnes, Stephanie Wonderlick, Mike Esser, Claire Allison, Nick Burns, Aaron Williamson, Karen King, Boo Boo Howse, Rachel Ertel, Mike Compton, Ocean Matthews, and Laura Walters.
29:24 - Brent Simoneaux
If you liked today's show, please follow us. You can rate the show, leave us a review, share it with someone you know. It really does help.
29:33 - Angela Andrews
We would love to hear from you. And we're so glad you joined us for this episode. Take care everybody.
29:39 - Brent Simoneaux
See you next time.
Featured guests
Thorsten Ball
Josh Stone
David Malcolm