John Carmack discusses the art and science of software engineering | Bits and Behavior
I’m not really a hardcore gamer anymore, but my fascination with programming did begin with video games (and specifically, rendering algorithms). So when I saw John Carmack’s 2012 QuakeCon keynote show up in my feed, I thought I’d listen to a bit of it and learn a bit about the state of game design and development.
What I heard instead was a hacker’s hacker talk about his recent realization that software engineering is actually a social science. Across 10 minutes, he covers many human aspects of developer mistakes, programming language design, static analysis, code reviews, developer training, and cost/benefit analyses. The emphasis throughout is mine (and I also transcribed this, so I apologize for any mistakes).
In trying to make the games faster, which has to be our priority going forward, we’ve made a lot of mistakes already with Doom 4, a lot of it is water under the bridge, but prioritizing that can help us get the games done faster, just has to be where we go. Because we just can’t do this going, you know, six more years, whatever, between games.
On the software development side, you know there was an interesting thing at E3, one of the interviews I gave, I had mentioned something about how, you know, I’ve been learning a whole lot, and I’m a better programmer now than I was a year ago and the interviewer expressed a lot of surprise at that, you know after 20 years and going through all of this that you’d have it all figured out by now, but I actually have been learning quite a bit about software development, both on the personal craftsman level but also paying more attention to what it means on the team dynamics side of things. And this is something I probably avoided looking at squarely for years because, it’s nice to think of myself as a scientist engineer sort, dealing in these things that are abstract or provable or objective on there and there.
In reality in computer science, just about the only thing that’s really science is when you’re talking about algorithms. And optimization is engineering. But those don’t actually occupy that much of the total time spent programming. You know, we have a few programmers that spend a lot of time on optimizing and some of the selecting of algorithms on there, but 90% of the programmers are doing programming work to make things happen. And when I start to look at what’s really happening in all of these, there really is no science and engineering and objectivity to most of these tasks. You know, one of the programmers actually says that he does a lot of monkey programming, you know, beating on things and making stuff happen. And I, you know we like to think that we can be smart engineers about this, that there are objective ways to make good software, but as I’ve been looking at this more and more, it’s been striking to me how much that really isn’t the case.
Aside from these that we can measure, that we can measure and reproduce, which is the essence of science to be able to measure something, reproduce it, make an estimation and test that, and we get that on optimization and algorithms there, but everything else that we do, really has nothing to do with that. It’s about social interactions between the programmers or even between yourself spread over time. And it’s nice to think where, you know we talk about functional programming and lambda calculus and monads and this sounds all nice and sciency, but it really doesn’t affect what you do in software engineering there, these are all best practices, and these are things that have been shown to be helpful in the past, but really are only helpful when people are making certain classes of mistakes. Anything that I can do in a pure functional language, you know you take your most restrictive scientific oriented code base on there, in the end of course it all comes down to assembly language, but you could do exactly the same thing in BASIC or any other language that you wanted to.
One of the things that’s also fed into that is my older son’s starting to learn how to program now. I actually tossed around the thought of should I maybe have him try to learn Haskell as a 7 year old or something and I decided not to, that I, you know, I don’t think that I’m a good enough Haskell programmer to want to instruct anybody in anything, but as I start thinking about how somebody learns programming from really ground zero, it was opening my eyes a little bit to how much we take for granted in the software engineering community, really is just layers of artifice on top of a core fundamental thing. Even when you go back to structured programming, whether it’s while loops and for loops and stuff, at the bottom when I’m sitting thinking how do you explain programming, what does a computer do, it’s really all the way back to flow charts. You do this, if this you do that, if not you do that. And, even trying to explain why do you do a for loop or what’s this while loop on here, these are all conventions that help software engineering in the large when you’re dealing with mistakes that people make. But they’re not fundamental about what the computer’s doing. All of these are things that are just trying to help people not make mistakes that they’re commonly making.
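As an illustration that is not part of the talk, here is a minimal C++ sketch of the point about loops being a convention layered over flow-chart-style control. The function names `sum_structured` and `sum_flowchart` are made up for this example. Both compute the same sum; the second uses only a test and jumps, which is roughly what the for loop boils down to.

```cpp
#include <cstddef>
#include <vector>

// Structured version: the for loop states the iteration pattern directly.
int sum_structured(const std::vector<int>& values) {
    int total = 0;
    for (std::size_t i = 0; i < values.size(); ++i) {
        total += values[i];
    }
    return total;
}

// Flow-chart version: the same computation written with one test and jumps.
int sum_flowchart(const std::vector<int>& values) {
    int total = 0;
    std::size_t i = 0;
check:
    if (i >= values.size()) goto done;  // "if this, you do that"
    total += values[i];
    ++i;
    goto check;                         // back to the decision box
done:
    return total;
}
```

The two functions do the same work. The structured form mainly makes it harder to forget the increment or jump to the wrong label, which are mistakes about people rather than about what the machine can do.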
One of the things that’s been driven home extremely hard is that programmers are making mistakes all the time and constantly. I talked a lot last year about the work that we’ve done with static analysis and trying to run all of our code through static analysis and get it to run squeaky clean through all of these things and it turns up hundreds and hundreds, even thousands of issues. Now it’s great when you wind up with something that says, now clearly this is a bug, you made a mistake here, this is a bug, and you can point that out to everyone. And everyone will agree, okay, I won’t do that next time. But the problem is that the best of intentions really don’t matter. If something can syntactically be entered incorrectly, it eventually will be. And that’s one of the reasons why I’ve gotten very big on the static analysis, I would like to be able to enable even more restrictive subsets of languages and restrict programmers even more because we make mistakes constantly.
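To make "syntactically can be entered incorrectly" concrete, here is a small hypothetical C++ sketch. It is not from the talk or from any id Software code, and the function names are invented. It shows a well-known operator precedence slip that compiles cleanly enough to ship, and that compiler warnings or static analysis tools commonly flag.

```cpp
// Intended test: "are all the bits in mask clear in flags?"

// What the programmer meant.
bool bits_clear(unsigned flags, unsigned mask) {
    return (flags & mask) == 0;
}

// What gets typed by mistake is `flags & mask == 0`. Because == binds
// tighter than &, the compiler reads it as written below. It compiles,
// but for any nonzero mask it always returns false.
bool bits_clear_mistyped(unsigned flags, unsigned mask) {
    return flags & (mask == 0);
}
```

Nobody intends to write the second version, which is the point: good intentions do not stop it from being typed, while an automated check can.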
One of the things that I started doing relatively recently is actually doing a daily code review where I look through the checkins and just try to find something educational to talk about to the team. And I annotate a little bit of code and say, well actually this is a bug discovered from code review, but a lot of it is just, favor doing it this way because it’s going to be clearer, it will cause less problems in other cases, and it ruffled, there were a few people that got ruffled feathers early on about that with the kind of broadcast nature of it, but I think that everybody is appreciating the process on that now. That’s one of those scalability issues where there’s clearly no way I can do individual code reviews with everyone all the time, it takes a lot of time to even just scan through what everyone is doing. Being able to point out something that somebody else did and say well, everybody should pay attention to this, that has some real value in it. And as long as the team is agreeable to that, I think that’s been a very positive thing.
But what happens in some cases, where you’re arguing a point where let’s say we should put const on your function parameters or something, that’s hard to make an objective call on, where lots of stuff we can say, this indirection is a cache miss, that’s going to cost us, it’s objective, you can measure it, there’s really no arguing with it, but so many of these other things are sort of style issues, where I can say, you know, over the years, I’ve seen this cause a lot of problems, but a lot of people will just say, I’ve never seen that problem. That’s not a problem for me, or I don’t make those mistakes. So it has been really good to be able to point out commonly on here, this is the mistake caused by this.
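For readers unfamiliar with the const-on-parameters debate, here is a minimal C++ sketch of what is being argued about. It is my illustration, not code from the talk, and `clamp_health` is a hypothetical helper. Marking by-value parameters `const` does not change what the caller sees; it only stops the function body from reassigning them.

```cpp
// Hypothetical helper: clamp a health value into the range [0, max_health].
// The const qualifiers mean the body cannot modify either parameter.
int clamp_health(const int health, const int max_health) {
    // health = 0;            // would not compile: parameter is read-only
    // if (health = 0) {...}  // the classic =/== slip is rejected as well
    if (health < 0) return 0;
    if (health > max_health) return max_health;
    return health;
}
```

Whether that protection is worth the extra keyword on every signature is the kind of call that is hard to settle with a measurement, unlike a cache miss.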
But as I’ve been doing this more and more and thinking about it, that sense that this isn’t science, this is just trying to deal with all of our human frailties on it, and I wish there were better ways to do this. You know we all want to become better developers and it will help us make better products, do a better job with whatever we’re doing, but the fact that it’s coming down to training dozens of people to do things in a consistent way, knowing that we have programmer turnover as people come and go, new people coming and looking at the code base and not understanding the conventions, and there are clearly better and worse ways of doing things but it’s frustratingly difficult to quantify.
That’s something that I’m spending more and more time looking at. I read NASA’s software engineering laboratory reports and I can’t seem to get any real value out of a lot of those things. The things that have been valuable have been automated things, things that don’t require a human to have some analysis, have some evaluation of it, but just say, enforced or not enforced. And I think that that’s really where things need to go as larger and larger software gets developed. And it is striking the scale of what we’re doing now. If you look back at the NASA reports and the scale of things and they considered large code bases to be things with three or four hundred thousand lines of code. And we have far more than that in our game engines now. It’s kind of fun to think that the game engines, things that we’re playing games on, have more sophisticated software than certainly the things that launch people to the moon and back and flew the shuttle, ran Skylab, run the space station, all of these massive projects on there are really outdone in complexity by any number of major game engine projects.
And the answer, as far as I can tell, really isn’t out there. With the NASA style development process, they can deliver very very low bug rates, but it’s at a very very low productivity rate. And one of the things that you wind up doing in so many cases is cost benefit analyses, where you have to say, well we could be perfect, but then we’ll have the wrong product and it will be too late. Or we can be really fast and loose, we can go ahead and just be sloppy but we’ll get something really cool happening soon. And this is one of those areas where there’s clearly right tools for the right job, but what happens is you make something really cool really fast and then you live with it for years and you suffer over and over with that. And that’s something that I still don’t think that we do the best job at.
We know our code is living for, realistically, we’re looking at a decade. I tell people that there’s a good chance that whatever you’re writing here, if it’s not extremely game specific, may well exist a decade from now and it will have hundreds of programmers, looking at the code, using it, interacting with it in some way, and that’s quite a burden. I do think that it’s just and right to impose pretty severe restrictions on what we’ll let past analysis and what we’ll let into it, but there are large scale issues at the software API design levels and figuring out things there, that are artistic, that are craftsman-like on there. And I wish that there were more quantifiable things to say about that. And I am spending a lot of time on this as we go forward.