Figure 5: Quake's beam tree was composed of 3-D wedges,
or beams, projecting out from the viewpoint to polygon edges.
In John’s design, the beam tree
started out consisting of a single beam describing the frustum; everything outside
that beam was marked solid (so nothing would draw there), and the inside of
the beam was marked empty. As each new polygon was reached while walking the
world BSP tree front to back, that polygon was converted to a beam by running
planes from its edges through the viewpoint, and any part of the beam that intersected
empty beams in the beam tree was considered drawable and added to the beam tree
as a solid beam. This continued until either there were no more polygons or
the beam tree became entirely solid. Once the beam tree was completed, the
visible portions of the polygons that had contributed to the beam tree were
drawn.
The advantage to working with a
3-D beam tree, rather than a 2-D region, is that determining which side of a
beam plane a polygon vertex is on involves only checking the sign of the dot
product of the ray to the vertex and the plane normal, because all beam planes
run through the origin (the viewpoint). Also, because a beam plane is completely
described by a single normal, generating a beam from a polygon edge requires
only a cross-product of the edge and a ray from the edge to the viewpoint.
Finally, bounding spheres of BSP nodes can be used to do the aforementioned
bulk culling to the frustum.
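By way of illustration, here's my own minimal sketch (not id's code) of the two operations described above, with the viewpoint at the origin so every beam plane passes through it and is fully described by a single normal:

```python
# Sketch: beam-plane math with the viewpoint at the origin.
# A beam plane through the origin is fully described by its normal, so
# classifying a vertex costs one dot product, and building a beam plane
# from a polygon edge costs one cross product.

def cross(a, b):
    return (a[1]*b[2] - a[2]*b[1],
            a[2]*b[0] - a[0]*b[2],
            a[0]*b[1] - a[1]*b[0])

def dot(a, b):
    return a[0]*b[0] + a[1]*b[1] + a[2]*b[2]

def beam_plane_from_edge(v0, v1):
    """Normal of the plane through the viewpoint (origin) and edge v0->v1.
    Since the viewpoint is the origin, v0 doubles as a ray to the edge."""
    edge = (v1[0]-v0[0], v1[1]-v0[1], v1[2]-v0[2])
    return cross(edge, v0)

def side_of_beam_plane(normal, vertex):
    """Positive on one side, negative on the other; the vertex itself is
    the ray from the viewpoint, so no plane-distance term is needed."""
    return dot(normal, vertex)
```

Note that there's no d term anywhere: that's exactly the saving the text describes, and it's why running the planes through the viewpoint pays off.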
The early-out feature of the beam
tree--stopping when the beam tree becomes solid--seems appealing, because it
appears to cap worst-case performance. Unfortunately, there are still scenes
where it’s possible to see all the way to the sky or the back wall of the world,
so in the worst case, all polygons in the frustum will still have to be tested
against the beam tree. Similar problems can arise from tiny cracks due to numeric
precision limitations. Beam tree clipping is fairly time-consuming, and in
scenes with long view distances, such as views across the top of a level, the
total cost of beam processing slowed Quake’s frame rate to a crawl. So, in
the end, the beam-tree approach proved to suffer from much the same malady as
the painter’s algorithm: The worst case was much worse than the average case,
and it didn’t scale well with increasing level complexity.
3-D engine du jour
Once the beam tree was working,
John relentlessly worked at speeding up the 3-D engine, always trying to improve
the design, rather than tweaking the implementation. At least once a week,
and often every day, he would walk into my office and say “Last night I couldn’t
get to sleep, so I was thinking...” and I’d know that I was about to get my
mind stretched yet again. John tried many ways to improve the beam tree, with
some success, but more interesting was the profusion of wildly different approaches
that he generated, some of which were merely discussed, others of which were
implemented in overnight or weekend-long bursts of coding, in both cases ultimately
discarded or further evolved when they turned out not to meet the design criteria
well enough. Here are some of those approaches, presented in minimal detail
in the hopes that, like Tom Wilson with the Paradise FIFO, your imagination
will be sparked.
Subdividing raycast: Rays are cast
in an 8x8 screen-pixel grid; this is a highly efficient operation because the
first intersection with a surface can be found by simply clipping the ray into
the BSP tree, starting at the viewpoint, until a solid leaf is reached. If
adjacent rays don’t hit the same surface, then a ray is cast halfway between,
and so on until all adjacent rays either hit the same surface or are on adjacent
pixels; then the block around each ray is drawn from the polygon that was hit.
This scales very well, being limited by the number of pixels, with no overdraw.
The problem is dropouts; it’s quite possible for small polygons to fall between
rays and vanish.
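Here's a toy 1-D version of that subdivision, with a stand-in hit() function where the real engine would clip a ray into the BSP tree; it also demonstrates the dropout problem, since a surface narrower than the sample spacing simply never gets sampled:

```python
# Toy sketch of the subdividing raycast (hypothetical helper names).
# hit(x) stands in for clipping a ray through screen column x into the
# BSP tree and returning the first surface struck.

def refine(x0, x1, hit, spans):
    """Subdivide between two sample columns until neighboring rays agree
    on a surface or land on adjacent pixels, then emit a block to draw."""
    if hit(x0) == hit(x1) or x1 - x0 <= 1:
        spans.append((x0, x1, hit(x0)))
        return
    mid = (x0 + x1) // 2
    refine(x0, mid, hit, spans)
    refine(mid, x1, hit, spans)

def raycast_row(width, hit, step=8):
    """Cast rays every `step` pixels across one row (last partial block
    omitted for brevity) and refine wherever neighbors disagree."""
    spans = []
    for x in range(0, width - step, step):
        refine(x, x + step, hit, spans)
    return spans
```

If a two-pixel-wide surface sits between two of the initial samples, both samples hit whatever is behind it, the refinement never triggers, and the surface vanishes, which is the dropout problem in miniature.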
Vertex-free surfaces: The world
is represented by a set of surface planes. The polygons are implicit in the
plane intersections, and are extracted from the planes as a final step before
drawing. This makes for fast clipping and a very small data set (planes are
far more compact than polygons), but it's time-consuming to extract the polygons
from the planes.
Draw-buffer: Like a z-buffer, but
with 1 bit per pixel, indicating whether the pixel has been drawn yet. This
eliminates overdraw, but at the cost of an inner-loop buffer test, extra writes
and cache misses, and, worst of all, considerable complexity. Variations are
testing the draw-buffer a byte at a time and completely skipping fully-occluded
bytes, or branching off each draw-buffer byte to one of 256 unrolled inner loops
for drawing 0-8 pixels, in the process possibly taking advantage of the ability
of the x86 to do the perspective floating-point divide in parallel while 8 pixels
are processed.
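A bare-bones sketch of the idea (mine, not an id listing; a real draw-buffer would pack 8 pixels per byte rather than use a byte apiece as here) makes the cost visible, because the occlusion test sits squarely in the inner loop:

```python
# Toy 1-D draw-buffer: drawing front to back, mark each pixel the first
# time it is covered and skip it afterward. Zero overdraw, but every
# pixel pays for an inner-loop buffer test and an extra write.

def draw_front_to_back(width, spans):
    """spans: (left, right, color) tuples in front-to-back order."""
    drawn = bytearray(width)      # one byte per pixel for clarity
    frame = [None] * width
    writes = 0
    for left, right, color in spans:
        for x in range(left, right):
            if not drawn[x]:      # the inner-loop test the text describes
                drawn[x] = 1
                frame[x] = color
                writes += 1
    return frame, writes
```

With front-to-back order, the near span wins every contested pixel and no pixel is ever written twice; the byte-at-a-time and 256-unrolled-loop variants described above are attempts to amortize that per-pixel test.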
Span-based drawing: Polygons are
rasterized into spans, which are added to a global span list and clipped against
that list so that only the nearest span at each pixel remains. Little sorting
is needed with front-to-back walking, because if there’s any overlap, the span
already in the list is nearer. This eliminates overdraw, but at the cost of
a lot of span arithmetic; also, every polygon still has to be turned into spans.
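A minimal sketch of that clipping step (my own, with an invented structure): because polygons arrive nearest-first, any overlap with a span already in the list belongs to the earlier, nearer polygon, so only the uncovered pieces of a new span survive:

```python
# Front-to-back span clipping: clip a new span against every span already
# in the list; whatever overlaps is occluded by a nearer surface, and only
# the uncovered fragments are appended.

def add_span(span_list, new_left, new_right, surface):
    """Clip [new_left, new_right) against existing spans, keep the rest."""
    pieces = [(new_left, new_right)]
    for left, right, _ in span_list:
        next_pieces = []
        for l, r in pieces:
            if r <= left or l >= right:          # no overlap: keep whole
                next_pieces.append((l, r))
            else:                                # keep the uncovered ends
                if l < left:
                    next_pieces.append((l, left))
                if r > right:
                    next_pieces.append((right, r))
        pieces = next_pieces
    span_list.extend((l, r, surface) for l, r in pieces)
```

The arithmetic is simple per span, but as the text notes, it has to run for every span of every polygon, which is where the cost piles up.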
Portals: The holes where polygons
are missing on surfaces are tracked, because it’s only through such portals
that line-of-sight can extend. Drawing goes front-to-back, and when a portal
is encountered, polygons and portals behind it are clipped to its limits, until
no polygons or portals remain visible. Applied recursively, this allows drawing
only the visible portions of visible polygons, but at the cost of a considerable
amount of portal clipping.
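Here's a 1-D cartoon of that recursion (structure and names invented for illustration): each leaf holds polygons and portals, and everything beyond a portal is clipped to the portal's window before being drawn, so only visible fractions of polygons ever reach the screen:

```python
# Cartoon of recursive portal drawing in one dimension. A "window" is the
# screen interval still open; polygons are clipped to it, and each portal
# narrows the window for whatever lies beyond it.

def draw_leaf(leaf, window, out):
    lo, hi = window
    for left, right, name in leaf.get("polys", []):
        l, r = max(left, lo), min(right, hi)
        if l < r:
            out.append((l, r, name))             # visible fragment only
    for left, right, target in leaf.get("portals", []):
        l, r = max(left, lo), min(right, hi)
        if l < r:                                # portal still visible:
            draw_leaf(target, (l, r), out)       # recurse, narrowed
```

The recursion terminates naturally when a window closes to nothing, which is the "until no polygons or portals remain visible" condition above; the real 3-D version pays for this elegance with 2-D polygon-against-portal clipping at every step.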
In the end, John decided that the
beam tree was a sort of second-order structure, reflecting information already
implicitly contained in the world BSP tree, so he tackled the problem of extracting
visibility information directly from the world BSP tree. He spent a week on
this, as a byproduct devising a perfect DOOM (2-D) visibility architecture,
whereby a single, linear walk of a DOOM BSP tree produces zero-overdraw 2-D
visibility. Doing the same in 3-D turned out to be a much more complex problem,
though, and by the end of the week John was frustrated by the increasing complexity
and persistent glitches in the visibility code. Although the direct-BSP approach
was getting closer to working, it was taking more and more tweaking, and a simple,
clean design didn’t seem to be falling out. When I left work one Friday, John
was preparing to try to get the direct-BSP approach working properly over the
weekend.
When I came in on Monday, John had
the look of a man who had broken through to the other side--and also the look
of a man who hadn’t had much sleep. He had worked all weekend on the direct-BSP
approach, and had gotten it working reasonably well, with insights into how
to finish it off. At 3:30 AM Monday morning, as he lay in bed, thinking about
portals, he thought of precalculating and storing in each leaf a list of all
leaves visible from that leaf, and then at runtime just drawing the visible
leaves back-to-front for whatever leaf the viewpoint happens to be in, ignoring
all other leaves entirely.
Size was a concern; initially, a
raw, uncompressed potentially visible set (PVS) was several megabytes in size.
However, the PVS could be stored as a bit vector, with 1 bit per leaf, a structure
that shrank a great deal with simple zero-byte compression. Those steps, along
with changing the BSP heuristic to generate fewer leaves (contrary to what I
said a few months back, choosing as the next splitter the polygon that splits
the fewest other polygons is clearly the best heuristic, based on the latest
data) and sealing the outside of the levels so the BSPer can remove the outside
surfaces, which can never be seen, eventually brought the PVS down to about
20K for a good-size level.
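The storage trick is easy to sketch. The encoding below is my assumption of what "simple zero-byte compression" means here (a zero byte followed by a run length), chosen because leaf visibility is sparse, so the bit vector is mostly zero bytes:

```python
# Assumed zero-byte run-length scheme for a PVS bit vector: literal bytes
# pass through, and each run of zero bytes becomes a 0 followed by the
# run length (capped at 255).

def compress_pvs(bits: bytes) -> bytes:
    out = bytearray()
    i = 0
    while i < len(bits):
        if bits[i]:
            out.append(bits[i])
            i += 1
        else:
            run = 0
            while i < len(bits) and bits[i] == 0 and run < 255:
                run += 1
                i += 1
            out += bytes((0, run))
    return bytes(out)

def decompress_pvs(data: bytes, size: int) -> bytes:
    out = bytearray()
    i = 0
    while len(out) < size:
        if data[i]:
            out.append(data[i])
            i += 1
        else:
            out += bytes(data[i + 1])   # expand a run of zero bytes
            i += 2
    return bytes(out)
```

A leaf that sees almost nothing compresses to a handful of bytes, which is how several raw megabytes can collapse to a few tens of kilobytes for a whole level.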
In exchange for that 20K, culling
leaves outside the frustum is speeded up (because only leaves in the PVS are
considered), and culling inside the frustum costs nothing more than a little
overdraw (the PVS for a leaf includes all leaves visible from anywhere in the
leaf, so some overdraw, typically on the order of 50% but ranging up to 150%,
generally occurs). Better yet, precalculating the PVS results in a leveling
of performance; worst case is no longer much worse than best case, because there’s
no longer extra VSD processing--just more polygons and perhaps some extra overdraw--associated
with complex scenes. The first time John showed me his working prototype, I
went to the most complex scene I knew of, a place where the frame rate used
to grind down into the single digits, and spun around smoothly, with no perceptible
slowdown.
John says precalculating the PVS
was a logical evolution of the approaches he had been considering, that there
was no moment when he said “Eureka!”. Nonetheless, it was clearly a breakthrough
to a brand-new, superior design, a design that, together with a still-in-development
sorted-edge rasterizer that completely eliminates overdraw, comes remarkably
close to meeting the “perfect-world” specifications we laid out at the start.
Simplify, and keep on trying new things
What does it all mean? Exactly
what I said up front: Simplify, and keep trying new things. The precalculated
PVS is simpler than any of the other schemes that had been considered (although
precalculating the PVS is an interesting task that I’ll discuss another time).
In fact, at runtime the precalculated PVS is just a constrained version of the
painter’s algorithm. Does that mean it’s not particularly profound?
Not at all. All really great designs
seem simple and even obvious--once they’ve been designed. But the process of
getting there requires incredible persistence, and a willingness to try lots
of different ideas until the right one falls into place, as happened here.
My friend Chris Hecker has a theory
that all approaches work out to the same thing in the end, since they all reflect
the same underlying state and functionality. In terms of underlying theory,
I’ve found that to be true; whether you do perspective texture mapping with
a divide or with incremental hyperbolic calculations, the numbers do exactly
the same thing. When it comes to implementation, however, my experience is
that simply time-shifting an approach, or matching hardware capabilities better,
or caching can make an astonishing difference. My friend Terje Mathisen likes
to say that “almost all programming can be viewed as an exercise in caching,”
and that’s exactly what John did. No matter how fast he made his VSD calculations,
they could never be as fast as precalculating and looking up the visibility,
and his most inspired move was to yank himself out of the “faster code” mindset
and realize that it was in fact possible to precalculate (in effect, cache)
and look up the PVS.
The hardest thing in the world is
to step outside a familiar, pretty good solution to a difficult problem and
look for a different, better solution. The best ways I know to do that are
to keep trying new, wacky things, and always, always, always try to simplify.
One of John’s goals is to have fewer lines of code in each 3-D game than in
the previous game, on the assumption that as he learns more, he should be able
to do things better with less code.
So far, it seems to have worked
out pretty well for him.
Learn now, pay forward
There’s one other thing I’d like
to mention before I close up shop for this month. As far back as I can remember,
DDJ has epitomized the attitude that sharing programming information
is A Good Thing. I know a lot of programmers who were able to leap ahead in
their development because of Hendrix’s Tiny C, or Stevens’ D-Flat, or simply
by browsing through DDJ’s annual collections. (Me, for one.) Most companies
understandably view sharing information in a very different way, as potential
profit lost--but that’s what makes DDJ so valuable to the programming
community.
It is in that spirit that id Software
is allowing me to describe in these pages how Quake works, even before Quake
has shipped. That’s also why id has placed the full source code for Wolfenstein
3D on ftp.idsoftware.com/idstuff/source; you can’t just recompile the code and
sell it, but you can learn how a full-blown, successful game works; check wolfsrc.txt
in the above-mentioned directory for details on how the code may be used.
So remember, when it’s legally possible,
sharing information benefits us all in the long run. You can pay forward the
debt for the information you gain here and elsewhere by sharing what you know
whenever you can, by writing an article or book or posting on the Net. None
of us learns in a vacuum; we all stand on the shoulders of giants such as Wirth
and Knuth and thousands of others. Lend your shoulders to building the future!
References
Foley, James D., et al., Computer
Graphics: Principles and Practice, Addison-Wesley, 1990, ISBN 0-201-12110-7
(beams, BSP trees, VSD).
Teller, Seth, Visibility Computations
in Densely Occluded Polyhedral Environments (dissertation), available online
along with several other papers relevant to visibility determination.
Teller, Seth, Visibility Preprocessing
for Interactive Walkthroughs, SIGGRAPH 91 proceedings, pp. 61-69.