Episode 17

Who’s Afraid Of Compilers?

Compiler

Show Notes

It’s about time we asked a question about compilers. It’s been a scary proposition. Compilers have a reputation for density, complexity, and a fair bit of mysticism. But when we looked into them, we learned they’re really just like any other program. So we wondered: Who’s afraid of compilers?

In this episode, we start to break down the reputation by opening up the black box. What do compilers do? How do they work? And what can you gain by learning more about the inner workings of compilers?

Transcript

00:02 - Johan Philippine 

Angela, Brent.

00:04 - Brent Simoneaux 

Johan.

00:07 - Johan Philippine 

We work on a tech podcast.

00:08 - Angela Andrews 

We do?

00:09 - Brent Simoneaux 

We do.

00:10 - Johan Philippine 

It's called Compiler.

00:12 - Angela Andrews 

It is?

00:12 - Brent Simoneaux 

Yes.

00:13 - Johan Philippine 

One of the many things we haven't talked about on this show yet is little-c compilers.

00:20 - Angela Andrews 

It was-

00:20 - Brent Simoneaux 

Oh, boy.

00:20 - Angela Andrews 

... only a matter of time.

00:21 - Johan Philippine 

Right?

00:22 - Brent Simoneaux 

This is going to get really confusing.

00:24 - Angela Andrews 

Well, that's what we do.

00:25 - Johan Philippine 

We'll try and straighten it out.

00:27 - Angela Andrews 

Exactly. We try to pull back the covers a little bit.

00:30 - Johan Philippine 

And that's something that maybe compilers need a little bit. They have a certain reputation in ...

00:36 - Angela Andrews 

They have a PR problem.

00:39 - Johan Philippine 

Yeah. A little bit.

00:39 - Brent Simoneaux 

Oh. Wait, say more about that. What do you mean Angela?

00:43 - Angela Andrews 

I mean, when you say the word compiler, it sounded like you were uncertain and there is a little bit of angst there. And I think a lot of people, when they think about language, compilers, programming language compilers, it sounds like a difficult and mystical and unknown thing. What's happening when compilers do their thing? Most people don't know and maybe that not knowing is a little bit weird, a little bit scary.

01:11 - Brent Simoneaux 

Yeah.

01:12 - Johan Philippine 

Well, I spoke with some compiler engineers and believe it or not, there are also amateur compiler builders out there who do it for a hobby.

01:23 - Brent Simoneaux 

Yeah.

01:23 - Angela Andrews 

Okay.

01:24 - Johan Philippine 

And they assured me that they're not… maybe not as scary as their reputation makes them out to be. So I started to ask, "Who's afraid of compilers?"

01:37 - Brent Simoneaux

This is Compiler, an original podcast from Red Hat.

01:42 - Angela Andrews 

We're your hosts.

01:43 - Brent Simoneaux 

I'm Brent Simoneaux.

01:45 - Angela Andrews 

And I'm Angela Andrews.

01:46 - Brent Simoneaux 

We're here to break down questions from the tech industry; big, small, and sometimes strange.

01:54 - Angela Andrews 

Each episode, we go out in search of answers from Red Hatters and people they’re connected to.

02:00 - Brent Simoneaux 

Today's question: who's afraid of compilers?

02:08 - Angela Andrews 

Producer Johan Philippine is here to translate.

02:12 - Johan Philippine 

I figured to start us off, it would be a good idea to go over the basics. So I spoke with Thorsten Ball and he gave me a fantastic metaphor for what compilers and their interpreter cousins actually do.

02:28 - Thorsten Ball 

Imagine if you are talking to a friend, you speak English, your friend doesn't speak English, speaks only Spanish. You have a third friend who speaks both languages. An interpreter would be, you say something, your friend listens to you and says it in Spanish. And the compiler would be your friend listening to you say something, sitting down, writing it down, translating it into Spanish, and then handing that document to your friend.

02:57 - Brent Simoneaux 

So he's making a distinction here between interpreters and compilers, right?

03:03 - Johan Philippine 

That's right. So a compiler, well, let's start with the interpreter because that's what he starts with. You have languages that are compiled, you have languages that are interpreted. The thing that they both have in common is that they take source code, which is human readable but if you were just to try to run that code on the computer, somehow the computer wouldn't know what to do with it.

03:26 - Angela Andrews 

It does not compute.

03:27 - Johan Philippine 

Exactly, it does not. It would just tell you ...

03:28 - Brent Simoneaux 

Literally-

03:30 - Johan Philippine 

... it does not compute. Yeah, literally it wouldn't know what to do with that. So you need a step to go from that high-level language, where you're writing out these instructions for a program, to something that a computer is actually going to understand. Now an interpreter will just take the source code and then, line by line, as you're writing it and running it, it'll just directly translate it into machine-level code, which is-

03:59 - Brent Simoneaux 

Gotcha.

03:59 - Johan Philippine 

And then that's it. Right? It's like someone doing like in Thorsten's example where it's pretty much a simultaneous translation.

04:07 - Angela Andrews 

Yeah. So just so I'm tracking, the interpreter, it takes it one, I don't know, one sentence or conversation at a time.

04:16 - Johan Philippine 

Exactly.

04:16 - Angela Andrews 

And the compiler has to go through the entire thing to ...

04:21 - Brent Simoneaux 

Oh-

04:21 - Johan Philippine 

That's right-

04:23 - Angela Andrews 

... translate into what the machine could actually understand.

04:28 - Brent Simoneaux 

Got it.

04:28 - Johan Philippine 

Right. There's one more thing that's different about compilers as well is that just like you said, yes they go through the entire program and translate it all at once. And then in Thorsten's example, there is that last thing about handing a document to your friend. Right? So a compiler has a file output essentially, right? That then stays the same and isn't changeable. Whereas an interpreter just runs the code as you're writing it and running the ... There's no document output from an interpreter typically.
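
To make the distinction concrete, here is a minimal Python sketch. Everything in it is an assumption made for illustration: the toy language (one `<int> <op> <int>` expression per line), the made-up stack-machine instructions, and the function names. It is not how any real interpreter or compiler is built. The interpreter acts on each line as it reads it and leaves nothing behind; the compiler translates the whole program into a separate "document" that can be run later without the original source.

```python
import operator

# Hypothetical toy language: each line is "<int> <op> <int>", op is + or *.
OPS = {"+": operator.add, "*": operator.mul}

def interpret(source):
    """Interpreter: read a line, act on it right away, produce no output file."""
    results = []
    for line in source.splitlines():
        left, op, right = line.split()
        results.append(OPS[op](int(left), int(right)))
    return results

def compile_program(source):
    """Compiler: translate the whole program up front into a 'document'
    of lower-level instructions for a made-up stack machine."""
    names = {"+": "ADD", "*": "MUL"}
    out = []
    for line in source.splitlines():
        left, op, right = line.split()
        out += [f"PUSH {left}", f"PUSH {right}", names[op], "PRINT"]
    return "\n".join(out)

def run_compiled(document):
    """The 'friend' reading the translated document, without the source."""
    stack, results = [], []
    for instr in document.splitlines():
        parts = instr.split()
        if parts[0] == "PUSH":
            stack.append(int(parts[1]))
        elif parts[0] == "ADD":
            b, a = stack.pop(), stack.pop()
            stack.append(a + b)
        elif parts[0] == "MUL":
            b, a = stack.pop(), stack.pop()
            stack.append(a * b)
        elif parts[0] == "PRINT":
            results.append(stack.pop())
    return results
```

Calling `interpret("1 + 2\n3 * 4")` and `run_compiled(compile_program("1 + 2\n3 * 4"))` give the same results; the difference is that the compiled document exists on its own and could be saved and run again later.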

05:00 - Brent Simoneaux 

Oh.

05:01 - Johan Philippine 

So that's the basics of what a compiler does. There are many different types of compilers. Today, we're going to focus on the ones that translate from high level source code to low level machine code that computers can understand. Now we got the high-level explanation from Thorsten. We'll come back to Thorsten later. He's got a lot more insights for us. But to dive a little deeper into the actual steps that compilers go through, I spoke to Josh Stone and he works on the compiler for Rust. He's going to help us understand the different steps that the Rust compiler goes through.

05:44 - Josh Stone 

Right. So usually you start with a parsing phase and that just reads in the textual code and turns it into an internal data structure, usually called an abstract syntax tree. And then in something like Rust, or I think many compilers have this, there's also an internal representation where you take that syntax tree, which looks pretty similar to what the code was and transform it into some kind of internal representation.
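
As a rough illustration of that parsing step, the sketch below uses Python's own standard `ast` module as a stand-in (it is not the Rust parser Josh works on). The helper name `to_tuple` is made up for this example; it just flattens the tree into nested tuples so its shape is easy to see.

```python
import ast

def to_tuple(node):
    """Flatten a small arithmetic syntax tree into nested tuples (illustrative helper)."""
    if isinstance(node, ast.BinOp):
        # A binary operation has an operator and two child expressions.
        return (type(node.op).__name__, to_tuple(node.left), to_tuple(node.right))
    if isinstance(node, ast.Name):
        return node.id          # a variable reference
    if isinstance(node, ast.Constant):
        return node.value       # a literal value
    raise ValueError(f"unsupported node: {type(node).__name__}")

def parse_expression(text):
    """Parse source text into a tree and return its tuple form."""
    tree = ast.parse(text, mode="eval")  # text in, tree data structure out
    return to_tuple(tree.body)
```

For the text `x + 1 + 1`, `parse_expression` returns `("Add", ("Add", "x", 1), 1)`: the flat line of characters has become a tree that still looks a lot like the code that was written.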

06:15 - Johan Philippine 

All right. So quick compiler Johan break here. I'm going to translate that as best as I can. This first step he's talking about, it takes the source code that the developer has written and breaks it up into component pieces for the next step.

06:29 - Josh Stone 

So MIR is often used as a middle intermediate representation. And then that phase is where you can analyze the code and do things like type inference where you determine the types of all the variables. Rust has a borrow checker, which is where it does the analysis to make sure that references to values don't outlive the values themselves and also that exclusive borrows truly are exclusive. So you'd prevent concurrent modification. So those sorts of analyses happen at this middle phase. And then from there Rust turns that MIR into LLVM IR. So LLVM is the low level virtual machine. It's a library that we use for optimization and code generation.
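
The type inference Josh mentions can be sketched in a few lines of Python. This is a toy under stated assumptions, not Rust's actual algorithm: it only handles simple assignments, it takes each variable's type from the literals and variables on the right-hand side, and the function name `infer_types` is invented for the example.

```python
import ast

def infer_types(source):
    """Infer a type name for every assigned variable (toy rules only)."""
    env = {}  # variable name -> inferred type name

    def type_of(node):
        if isinstance(node, ast.Constant):
            return type(node.value).__name__      # literal: its own type
        if isinstance(node, ast.Name):
            if node.id not in env:
                raise TypeError(f"unknown variable: {node.id}")
            return env[node.id]                   # variable: type found earlier
        if isinstance(node, ast.BinOp):
            left, right = type_of(node.left), type_of(node.right)
            if left != right:                     # the analysis catches a mistake
                raise TypeError(f"mismatched types: {left} and {right}")
            return left
        raise TypeError("unsupported expression")

    for stmt in ast.parse(source).body:
        if isinstance(stmt, ast.Assign) and isinstance(stmt.targets[0], ast.Name):
            env[stmt.targets[0].id] = type_of(stmt.value)
    return env
```

Given `a = 1`, `b = a + 2`, and `c = 'hi'`, the sketch reports `int`, `int`, and `str` without the programmer writing any type down, and it raises an error for something like `a + 'x'` before the program ever runs.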

07:20 - Johan Philippine 

Quick pause again here. This middle section he's talking about is when the compiler takes that broken down code from the first section and starts testing it to make sure that the program's logic works, essentially that all the Is are dotted and the Ts are crossed and makes it ready for the next step.

07:42 - Josh Stone 

So the Rust compiler is just translating from its own representation into LLVM's representation and then hands off to LLVM to optimize the code. So it will change things like, maybe if you have X plus one plus one, it can change that to X plus two, right? That's a very simple modification, but those things compound. So a lot of optimizations change the code into something more efficient.
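
Josh's X plus one plus one example can be sketched as a tiny optimization pass. The code below is an illustration only, written against Python's standard `ast` module (Python 3.9 or later for `ast.unparse`), not LLVM. The class name `FoldAdds` and the function `optimize` are made up, and the rewrite assumes the variable holds an integer, since regrouping additions is not safe for every type.

```python
import ast

def _is_int(node):
    """True for an integer literal (bools and floats are deliberately excluded)."""
    return isinstance(node, ast.Constant) and type(node.value) is int

class FoldAdds(ast.NodeTransformer):
    """Rewrite (expr + c1) + c2 into expr + (c1 + c2) for integer constants."""

    def visit_BinOp(self, node):
        self.generic_visit(node)  # optimize the children first
        if (isinstance(node.op, ast.Add)
                and _is_int(node.right)
                and isinstance(node.left, ast.BinOp)
                and isinstance(node.left.op, ast.Add)
                and _is_int(node.left.right)):
            folded = ast.Constant(value=node.left.right.value + node.right.value)
            return ast.BinOp(left=node.left.left, op=ast.Add(), right=folded)
        return node

def optimize(expression):
    """Parse, run the folding pass, and turn the tree back into source text."""
    tree = ast.parse(expression, mode="eval")
    tree = ast.fix_missing_locations(FoldAdds().visit(tree))
    return ast.unparse(tree)
```

With this sketch, `optimize("x + 1 + 1")` returns `x + 2`, and because the pass works from the inside out, `x + 1 + 1 + 1` becomes `x + 3`: one small rewrite applied repeatedly, which is the compounding Josh describes.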

08:08 - Johan Philippine 

So there are a lot of other optimizations that compilers can do and oftentimes these compilers will give the developers a choice on which to turn on or to turn off, but using them is a trade-off. It ends up with faster code, but it takes a longer time to actually compile it. Once all of that is done, we get to the final steps.

08:35 - Angela Andrews 

Wait, there's more!

08:37 - Josh Stone 

And then it finally goes through a code generation phase where it takes its own target-neutral representation and turns it into specific x86 instructions or ARM instructions, whatever the target is. And then those instructions will be written to an object file and you link that into a program and then that's the thing that you can actually run.
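
A very small sketch of that last step: the same target-neutral idea, "add a constant to the argument and return it," written out as instruction text for two different chips. The function name `codegen_add_const` is hypothetical, and the instruction sequences are hand-written for illustration, assuming a 32-bit integer argument, Intel syntax with the System V calling convention on x86-64, and the standard calling convention on 64-bit ARM. A real backend such as LLVM chooses registers and instructions in a far more general way.

```python
def codegen_add_const(constant, target):
    """Emit assembly text for fn(x) = x + constant on the chosen target."""
    if target == "x86_64":
        return [
            "mov eax, edi",            # copy the first argument into the return register
            f"add eax, {constant}",    # add the constant
            "ret",                     # return to the caller
        ]
    if target == "aarch64":
        return [
            f"add w0, w0, #{constant}",  # argument and return value share w0
            "ret",
        ]
    raise ValueError(f"unsupported target: {target}")
```

The two outputs do the same job but share almost no text, which is why the compiler keeps a target-neutral representation for as long as it can and only commits to one chip at the end.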

08:59 - Johan Philippine 

All right. Mission accomplished.

09:03 - Angela Andrews 

You did it.

09:03 - Brent Simoneaux 

Yay.

09:03 - Johan Philippine 

We have our executable program, which has been translated for a specific kind of chip, right? Either x86 or ARM, like Josh mentions, or whatever it is that the compiler has been designed to translate for, and you have your program.

09:22 - Brent Simoneaux 

Okay. Speaking of being human readable, I'm having a little trouble understanding. Angela, Johan, can you maybe just recap this for me real quick?

09:35 - Johan Philippine 

Sure. Angela, do you want to take a shot at it?

09:38 - Angela Andrews 

I'll take a quick shot. So again, we're starting with the source code and it breaks it down into manageable bytes, right? It looks at the variables and it decides what's the best type for what's being presented in the source code, right? And then the compiler tests to make sure that the logic is working, it's tracking. Is everything tracking here? And then it goes to the next level, which is the actual, where LLVM says, "Okay, I'm looking at your code, let's optimize it. Let's supercharge. Let's make sure this is the most efficient use as possible."

10:18 - Angela Andrews 

And then if you do a lot of that optimization, your compile time can be much longer because it's doing much more work in the beginning. Or if you want it to compile quickly, right? And then you're really not concerned about that optimization as much, you just wanted to get it done, that's where you… at the end it turns it into this instruction that the processor can actually understand. And he mentioned x86, he mentioned ARM, and it gives you this file that will work for you because it has gone through all of these different levels. And this is what you can actually "run." So many moving parts here, but I think I was keeping up.

11:07 - Brent Simoneaux 

Yeah.

11:10 - Johan Philippine 

So we just talked about the primary function of a compiler, which is translation, right? And while the task sounds pretty straightforward, once we dig into the specifics, it's a pretty involved process. Josh assured me that once you dig into it, it's not as difficult as it sounds, but we also heard a few hints about some of the other tasks a compiler can perform in that middle section. To learn more about that, I spoke with David Malcolm. He is a principal software engineer here at Red Hat and he works on the GCC compiler's warning and error messages.

11:44 - Brent Simoneaux 

Oh boy.

11:47 - David Malcolm 

The diagnostics and the warnings are a secondary goal of the compiler, but there's no point in generating highly optimal efficient code that does the wrong thing. I mean, I can write code that does the wrong thing and make it run extremely fast.

12:03 - Johan Philippine 

What he's saying here is that he works on the warning and error messages, right? And they play a pretty vital role, but they help with making the code run and making it run more quickly. Right. That's what the warnings are there for to help you edit and change your code a little bit so that the compiler can deal with it in a way that's more efficient.

12:30 - Angela Andrews 

And warnings are super helpful because who wants to read a horrible, unclear error message that doesn't tell you anything. You want it to be clear and helpful, so you can go back and make code that makes sense and actually works. So great error messaging is crucial.

12:51 - Johan Philippine 

Yeah.

12:52 - Angela Andrews 

Shout out to David.

12:56 - Johan Philippine 

His point though, in saying that was that the errors and the warnings that come up, they help write more efficient code and eliminate bugs. But again, that's a secondary goal of the compiler, right? The primary goal is making sure that when translating the code, nothing gets lost in translation. So it's especially important to David that the warnings and errors he develops are relevant to the code people are writing. Right. Because what they really want is for their code to run. And he wants to help them get there without getting in their way.

13:31 - David Malcolm 

What I do when I implement a new warning is, well, I said this before, I try and run the compiler against lots and lots of different projects written by different people. And with the warning turned on and see what falls out. And I was going to say, hopefully we get lots of warnings, but I don't know if I hope for that or not now I think about it. That's rather malicious in a way. Hopefully all the code is perfect in every way and there are no problems to be solved. But given reality, I mean, the reason for doing it is presumably there are mistakes that people are making, and my dream is to be able to eliminate entire categories of bug by warning about them.

14:24 - Brent Simoneaux 

How often is the code perfect Angela?

14:27 - Angela Andrews 

There is no perfect code. ‘Cause humans, I mean ...

14:33 - Brent Simoneaux 

Yeah.

14:33 - Johan Philippine 

Yeah. I would suspect that warnings and errors come up all the time.

14:39 - Angela Andrews 

And they're really important.

14:41 - Brent Simoneaux 

But I love this attitude that David has towards this. He's like, "My dream is to be able to eliminate entire categories of bugs" by warning about them and being really helpful.

14:55 - Angela Andrews 

Right.

14:55 - Johan Philippine 

Yeah. And we spoke about the different kinds of categories and also the frequency of the bugs that the compiler's going to throw at you. Right. Because depending on what options you have turned on in the compiler, you can have it try and detect a lot of things and make your program just that much more efficient and better running. Or you can have it really just detect the things that are crucial and that are not going to let your program run. And so there's a bit of a balancing act that he told me about where you have to try and think about which bugs you really want to have come front and center in front of the developer so that they actually pay attention to them, and not spam the developer with a bunch of other warnings and bugs that would be helpful for efficiency, but aren't absolutely crucial to making a program actually run. Right?

15:52 - Angela Andrews 

You have choices.

15:53 - Johan Philippine 

You have choices that affect how quickly the compiler runs and if you flood the developer with too many messages, they might stop reading them entirely and miss the ones that are helpful in terms of making the program run. So David's got this great attitude about how his role in building the compiler is maybe a secondary one, right? Where it's helpful and a lot of people use these warnings and errors to write better code, but he doesn't really mince words at all on the importance of compilers in general.

16:31 - David Malcolm 

The Linux kernel is compiled using GCC. So without the compiler there's nothing. And so I guess the kernel folks would like to say, they're the most important but I like to think we're the most important and… pistols at dawn or something. But yeah, without the compiler, your source code is just source code. You can't actually run it. And historically GCC has been the system compiler and everything was built with it.

16:58 - Brent Simoneaux 

David is throwing down.

17:00 - Angela Andrews 

He has a way with words, doesn't he?

17:03 - Johan Philippine 

Yeah. He's got some fighting words and he is ready to back them up.

17:06 - Brent Simoneaux 

I love it.

17:07 - Angela Andrews 

But he has a point though.

17:09 - Johan Philippine 

Yes. He has a point. Without compilers, you don't have programming, you don't have a lot of things. On the other hand, without a kernel you also ...

17:19 - Brent Simoneaux 

Yeah.

17:20 - Johan Philippine 

So I don't know about which one's more important. I'm not going to judge that but ...

17:25 - Angela Andrews 

Yeah, don't get into that duel.

17:26 - Johan Philippine 

Perhaps they're both equally important? Question mark? I don't know.

17:32 - Angela Andrews 

Let's leave it there.

17:32 - Johan Philippine 

Let's leave it to them to decide between themselves and we can comment about it later. So we've talked about the primary function of a compiler, and we've talked a little bit about the internal workings and some of the things that happen in the middle of the compiler with optimizations and warning codes and things like that.

17:56 - Johan Philippine 

I want to move on next to what you can gain from learning how to build your own compiler, which it turns out isn't necessarily as difficult as it may seem. I spoke with Thorsten Ball. We heard from him a little bit at the top of the show. He's a self-taught programmer who decided it would be fun to learn about compilers, which he had heard was, allegedly, one of the most difficult aspects of computer science.

18:22 - Angela Andrews 

Can I ask what was his background before he decided to start toying around with compilers?

18:28 - Johan Philippine 

He was a self-described philosophy student dropout. He grew up playing around with websites, building his own websites and things like that. A few years ago, he started toying around with programming, ended up becoming a programmer, self-taught, and started tinkering with compilers as a side project while he worked as a developer. As he started learning about compilers, he found it to be less daunting than their reputation, but he had a lot of trouble finding modern and non-dense resources, to put it diplomatically. He found it difficult to learn about compilers in a way that wasn't a very thick, math-heavy textbook, and to find instructions that were actually both understandable, approachable, but also thorough enough to go from zero to a working compiler with code in front of you. So ...

19:30 - Angela Andrews 

Approachable, but thorough. Okay.

19:32 - Johan Philippine 

Yeah, exactly. He ended up writing a couple books. The first book was about his experience, not about his experience, but it ended up being a resource for how to build an interpreter, which he wrote by taking notes about his own experiences, learning how to do it and he wanted to provide that resource for people who wanted to learn about compilers, but didn't want to dig into a doorstop of a textbook.

19:57 - Johan Philippine 

And then he followed up that book with the book about compilers. So his end goal became to show that compilers aren't, in their basic form anyways, as complex as their reputation. And he told me about what he learned beyond just how compilers work, what he learned about computing.

20:17 - Thorsten Ball 

If you write a compiler that, say, translates a high-level language like JavaScript or something like that, a really high-level language where you don't have to worry about memory management, stuff like this, into x86 assembly code, you learn how a computer works and how most of the software that runs on your computer works. You find out what goes into an executable file, what is actually in there. Which data goes into it and which data does not go into it. You will learn what goes into memory, when you start your executable, how your operating system starts it and where it puts certain stuff and how it organizes things into memory. You learn how the computer actually executes things because you will get in touch with assembly instructions or machine instructions. That means, for example, spoiler, a computer doesn't have ‘if’ conditionals, right? So if/else in assembly language, there might be in some assemblers, but what you usually do is you compare two values or you test some values. And then based on that result, you do a jump or a go-to and jump to a different location of source code.
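
Thorsten's spoiler can be sketched with a made-up instruction set. None of the instruction names below (`CMP_LT`, `JUMP_IF_FALSE`, `JUMP`, `SET`) belong to a real chip; they are assumptions chosen to mirror the compare-then-jump pattern he describes. The little machine has no `if` of its own: it can only test two values and move its position in the instruction list.

```python
def run(program, env):
    """Execute a list of made-up instructions: no 'if', only tests and jumps."""
    pc = 0          # program counter: index of the next instruction
    flag = False    # result of the most recent comparison
    while pc < len(program):
        op, *args = program[pc]
        pc += 1
        if op == "CMP_LT":            # test: is env[a] < env[b]?
            flag = env[args[0]] < env[args[1]]
        elif op == "JUMP_IF_FALSE":   # conditional jump
            if not flag:
                pc = args[0]
        elif op == "JUMP":            # unconditional jump (a "goto")
            pc = args[0]
        elif op == "SET":             # store a constant in a variable
            env[args[0]] = args[1]
    return env

# High-level version: if a < b: smaller = "a"  else: smaller = "b"
IF_ELSE = [
    ("CMP_LT", "a", "b"),        # 0: compare the two values
    ("JUMP_IF_FALSE", 4),        # 1: skip the 'then' branch when the test fails
    ("SET", "smaller", "a"),     # 2: then-branch
    ("JUMP", 5),                 # 3: jump over the else-branch
    ("SET", "smaller", "b"),     # 4: else-branch
]                                # 5: end of program
```

Running `run(IF_ELSE, {"a": 1, "b": 2})` sets `smaller` to `"a"`, and swapping the inputs so that `a` is larger sets it to `"b"`, even though the instruction list itself contains no `if` or `else`.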

21:33 - Johan Philippine 

Those kinds of insights might help you write better code, right? Because if you know what's going on under the hood, so to speak, you can write your code to be better tailored to those lower level instructions. So this is something that Josh actually told me a bit about. That compilers will run more efficiently if you write your source code in a way that they can process more easily, and you're less likely to get errors if you design your code to the compiler's processes.

22:03 - Thorsten Ball 

So you realize how your high level language is translatable down to a lower level language without losing any meaning. So you will see how you can use lower level constructs to do the same thing that you can do in higher level constructs, right? So concrete example in JavaScript or, say, Go, or any language that has some higher level abstractions, you have something like a switch statement, right? A computer doesn't have a switch statement, a computer doesn't have switch instructions.

22:35 - Thorsten Ball 

It's made up of these tests and if this is true, then jump to here, test, if this is true, then jump to here, blah, blah, blah. And in fact, most compilers might optimize some of that away because they already know that it can be true or it can be false or something like this.
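
The chain of tests and jumps Thorsten describes can be sketched from the compiler's side. The function `lower_switch` and the instruction names are hypothetical, picked for this example; the point is only that a switch statement can be flattened into "test, jump, test, jump" with a default at the end. Real compilers often use smarter strategies, such as jump tables.

```python
def lower_switch(cases, default):
    """Turn {value: result} plus a default into a flat test-and-jump list."""
    program = []
    end = 3 * len(cases) + 1          # index just past the default arm
    for value, result in cases.items():
        next_test = len(program) + 3  # where the following test starts
        program.append(("JUMP_IF_NOT_EQUAL", value, next_test))
        program.append(("SET_RESULT", result))
        program.append(("JUMP", end))
    program.append(("SET_RESULT", default))
    return program

def run(program, subject):
    """Execute the lowered switch against one subject value."""
    pc, result = 0, None
    while pc < len(program):
        op, *args = program[pc]
        pc += 1
        if op == "JUMP_IF_NOT_EQUAL":   # test, and jump to the next test on failure
            if subject != args[0]:
                pc = args[1]
        elif op == "SET_RESULT":
            result = args[0]
        elif op == "JUMP":              # matched: skip the remaining tests
            pc = args[0]
    return result
```

For example, `lower_switch({1: "one", 2: "two"}, "many")` produces seven instructions, and running them with the subject `2` falls through the first test, matches the second, and returns `"two"`.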

22:51 - Johan Philippine 

In the end, most languages are built using those same blocks, right? Because the computer itself doesn't change, but all these different languages that are written and implemented in different ways end up having to use those same blocks to function.

23:08 - Thorsten Ball 

Then at the end, of course, when you have built your compiler or while you're building it, you will suddenly understand how 80% of all the other programming languages work. And you will maybe gain this little bit of appreciation or understanding for how all of the software right now that we are using, or that we're building every day is built on this tower of abstractions that we pulled ourselves up on, where the languages we're using today, under the hood, use languages that were built 10 years ago that those use languages that were built 20 years ago and stuff like this. And it all in the end comes down to machine instructions. But we are so high on this tower of abstractions that we barely ever notice. And that's quite fascinating.

24:02 - Brent Simoneaux 

What does he mean by that Angela? The tower of abstractions climbing? What does that mean?

24:09 - Angela Andrews 

So understanding that programming has been around for a long time. I'm talking '50s. Maybe around the '50s when the first programming languages, could be even earlier, but I'm thinking of specific languages when I say around the '50s. There's a language and then a language comes along after it, built on top of that language, right? And then that gets to be used and it gets popular and it waxes and wanes. And then another programming language may be based off of that one. And again, that's where the abstraction starts to happen and unfold where he says, "There were languages built 10 years ago, but they're based on languages that were built 20 years before." You know what I mean?

24:55 - Brent Simoneaux 

Yeah.

24:55 - Angela Andrews 

So that it becomes much more abstracted, the higher you go. So we're so high on that tower as he puts it. Yeah.

25:05 - Brent Simoneaux 

Anything abstracted from the metal or like from the ...

25:09 - Angela Andrews 

Yes, we're so abstracted from the ones-

25:13 - Brent Simoneaux 

The hardware.

25:14 - Angela Andrews 

... and zeros of-

25:14 - Johan Philippine 

Yes, exactly.

25:15 - Angela Andrews 

... what the machine actually understands. Learning these compilers gives you so much insight as to how all the languages tend to work and how they tend to do exactly what they were built to do.

25:31 - Johan Philippine 

Yeah. Pun not intended here, but to build on that a little bit more, let's think a little bit about what the end result of these increasing abstractions looks like. Right. We start with these shared building blocks that a lot of these languages have in common. We build these languages using those blocks, that introduce new features and new functions, and they diverge a bit, right? They're different from each other. And then new languages are built on top of those and those divergences get wider. And every time you do that, you get that much farther from those original building blocks that the machine can understand.

26:12 - Brent Simoneaux 

Johan, you have been studying compilers. You have been in the books, you've been talking to people. What should we do with all of this information? Should we all go try to make a compiler? What do we do with all this?

26:30 - Johan Philippine 

I don't know that everyone should do it, right? But if it's something that you're interested in, but you were afraid to tackle it because you've heard that you need to know a lot of advanced mathematics or that the code can be really dense, all of those things, they aren't necessarily true, right? So Thorsten wrote a couple books, one about how to write an interpreter and another that follows on writing a compiler. That interpreter is only about 3000 lines long and the compiler adds on a few thousand more lines, which might sound like a lot to a person who's never programmed before. But when you compare it to something like the Rust compiler that Josh Stone works on, that one is about half a million lines of code long.

27:16 - Angela Andrews 

So from this podcast, it sounds as if, if you assumed that compilers are hard and unapproachable, you might want to reconsider, because you too may be able to write your own compiler if you take it apart piece by piece, work small, and then build upon that.

27:38 - Johan Philippine 

Yeah. And hopefully that helps you understand how the programming languages you use are put together. How they run and how they turn your source code into something that a machine can actually run.

27:53 - Angela Andrews 

Exactly. And understanding where those error messages are coming from, making it make sense, that is so important.

28:02 - Johan Philippine 

Yeah. When you know how the compiler works and where it breaks down, that helps you understand a little bit better where the error messages are coming from.

28:12 - Angela Andrews 

Exactly. And that's how you can write much better code. Hey, let us know if we've unmasked the wizard behind the compiler curtain. Does this compute? We want to know. Share your thoughts with us. You can tweet us @RedHat on Twitter, or just use the hashtag #CompilerPodcast. We would really love to hear what you thought about the show. And that does it for the compiler episode of Compiler.

28:41 - Brent Simoneaux 

Today's episode was produced by Johan Philippine and Caroline Creaghead. Victoria Lawton makes sure this show passes all the compiler checks.

28:51 - Angela Andrews 

She sure does. Our audio engineer is Christian Proham. Special thanks to Shawn Cole. Our theme song was composed by Mary Ancheta.

29:02 - Brent Simoneaux 

Thank you to Josh Stone, David Malcolm, Thorsten Ball.

29:06 - Angela Andrews 

Our audio team includes Leigh Day, Laura Barnes, Stephanie Wonderlick, Mike Esser, Claire Allison, Nick Burns, Aaron Williamson, Karen King, Boo Boo Howse, Rachel Ertel, Mike Compton, Ocean Matthews, and Laura Walters.

29:24 - Brent Simoneaux 

If you liked today's show, please follow us. You can rate the show, leave us a review, share it with someone you know. It really does help.

29:33 - Angela Andrews 

We would love to hear from you. And we're so glad you joined us for this episode. Take care everybody.

29:39 - Brent Simoneaux 

See you next time.

Featured guests

Thorsten Ball

Josh Stone

David Malcolm
