Was The Use of BSP in Doom Really a Stroke of Genius?

06.01.2020
The pinnacle of technology at the time.
In 1993, id Software released the first-person shooter Doom, which quickly became a phenomenon. Today it is considered one of the most influential games in history.
Ten years after Doom's release, in 2003, journalist David Kushner published a book about id Software called Masters of Doom, which has since become the canonical account of Doom's creation. I read Masters of Doom a few years ago and remember little of it, but one story from the book about the lead programmer, John Carmack, has stuck with me. My retelling will not be quite accurate (see below for details), but in short: early in Doom's development, Carmack realized that the 3D renderer he had written for the game slowed down when rendering certain levels. This was unacceptable, because Doom was supposed to be a fast, even frenzied game. Realizing that the problem was fundamental enough that he would need a better rendering algorithm, Carmack began reading research papers. He ended up implementing a technique called "binary space partitioning" (BSP), never before used in a video game, and in doing so dramatically sped up the Doom engine.
This story of Carmack applying cutting-edge academic research to video games has always impressed me. I think it is part of what made Carmack such a legendary figure. He deserves to be known as the archetypal genius video game programmer for many reasons, but the episode with the research papers and binary space partitioning is the one I always remember first.
The story impresses, of course, because "binary space partitioning" sounds like something that would be hard not just to implement but even to understand. For a long time I assumed that Carmack had made an intellectual leap, but since I didn't know what binary space partitioning was, or how novel the technique was when Carmack decided to use it, I couldn't be sure. On a scale from Homer Simpson to Albert Einstein, how much of a genius move was adding binary space partitioning to Doom?
I also wondered where BSP came from and how the idea reached Carmack. So this post is about not only John Carmack and Doom but also the history of the "binary space partitioning tree" (BSP tree) data structure. It turns out that the BSP tree, like so much else in computer science, has its origins in research conducted for the military.
Yes, that's right: E1M1, the first level of Doom, exists thanks to the US Air Force.
The BSP tree is a solution to one of the hardest problems in computer graphics. To render a three-dimensional scene, a renderer must determine what is visible and what is not from the current viewpoint. This is not especially difficult if you have plenty of time, but a self-respecting real-time game engine has to separate the visible from the invisible parts of the world at least 30 times per second.
This problem is often referred to as the visible surface determination (VSD) problem. Programmer Michael Abrash, who worked with Carmack on Quake (id Software's next game after Doom), wrote about the VSD problem in his famous Graphics Programming Black Book:
I want to talk about what is, in my opinion, the hardest problem in 3D graphics: visible surface determination (drawing the proper surface at each pixel) and its close relative, culling (discarding invisible polygons as quickly as possible to speed up visible surface determination). For brevity, I will use the abbreviation VSD to cover both visible surface determination and culling.
Why do I consider VSD the hardest 3D problem? Although rasterization problems such as texture mapping are fascinating and important, they are tasks of fairly finite scale that are being shifted to hardware with the advent of 3D accelerators; besides, they scale only with screen resolution, which is quite tolerable.
In contrast, VSD is an open-ended problem, and dozens of approaches to it are in use today. More importantly, the performance of a naive VSD solution scales in direct proportion to scene complexity, which usually grows as a square or cube function, so it quickly becomes the limiting factor in rendering realistic worlds.
Abrash wrote about the difficulty of the VSD problem in the late '90s, a few years after Doom proved that ordinary people wanted to be able to play graphically demanding games on their home computers. In the early '90s, when id Software first started publishing games, those games had to run efficiently on computers not designed to run them: home machines were meant for word processing, spreadsheets, and similar applications. To pull this off, the company had to get creative, especially in the several 3D games id Software published before Doom. In those games, the design of every level was constrained so as to simplify the VSD problem.
For example, in Wolfenstein 3D, the game id Software released just before Doom, every level was built from walls aligned with the axes. In other words, the Wolfenstein universe could have north/south walls or east/west walls, and nothing else. Walls could also be placed only at fixed intervals on a grid: all corridors are either one grid cell wide, or two, and so on, but never 2.5 cells. Although this meant that the id Software team could only create levels that all looked somewhat alike, the restriction made it much easier for Carmack to write a renderer for Wolfenstein.
The Wolfenstein renderer solved the VSD problem by marching rays (ray marching) into the virtual world from the screen. Ray-based renderers are usually raycasting renderers, and they are often slow, because solving the VSD problem in a raycaster means finding the first intersection between a ray and something in the world, which takes considerable computation. But because in Wolfenstein all the walls are aligned with a grid, the only places a ray can cross a wall are the grid lines. So it is enough for the renderer to check each of those intersection points. If the renderer starts by checking the intersection point nearest the player's viewpoint, then the next nearest, and so on, stopping when it encounters the first wall, the VSD problem is solved in the most trivial way possible. A ray is simply marched forward from each pixel until it hits something, which is very cheap in terms of CPU cycles. And since all walls are the same height, a ray only needs to be cast for each column of pixels.
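To make the idea concrete, here is a toy sketch of grid-aligned ray marching in Python. This is not id's code (Wolfenstein is written in C and uses fixed-point math); the function name and the wall representation are my own inventions. The ray visits grid cells in the order their boundaries are crossed and stops at the first cell containing a wall.

```python
import math

def cast_ray(walls, px, py, dx, dy, max_steps=64):
    """March a ray from (px, py) along direction (dx, dy) through a unit
    grid. walls is a set of (col, row) cells containing a wall. Returns
    the parameter t of the first wall hit, or None if nothing is hit."""
    x, y = int(math.floor(px)), int(math.floor(py))
    step_x = 1 if dx > 0 else -1
    step_y = 1 if dy > 0 else -1
    # Distance along the ray to the next vertical / horizontal grid line,
    # and the spacing between successive grid lines of each kind.
    t_max_x = ((x + (dx > 0)) - px) / dx if dx != 0 else math.inf
    t_max_y = ((y + (dy > 0)) - py) / dy if dy != 0 else math.inf
    t_delta_x = abs(1.0 / dx) if dx != 0 else math.inf
    t_delta_y = abs(1.0 / dy) if dy != 0 else math.inf
    for _ in range(max_steps):
        if t_max_x < t_max_y:          # next crossing is a vertical line
            x, t, t_max_x = x + step_x, t_max_x, t_max_x + t_delta_x
        else:                          # next crossing is a horizontal line
            y, t, t_max_y = y + step_y, t_max_y, t_max_y + t_delta_y
        if (x, y) in walls:
            return t                   # first wall along this ray
    return None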
This simplification of rendering made Wolfenstein fast enough to run on the weak home PCs of an era without dedicated graphics cards. But the approach would not work in Doom, because the id team decided that its new game would have features like diagonal walls, stairs, and ceilings of different heights. Ray marching no longer fit, so Carmack wrote a different kind of renderer. Whereas the Wolfenstein renderer, with its ray for each column of pixels, was driven by the image, the Doom renderer had to be driven by the objects. Instead of iterating over the pixels of the screen and determining their color, the Doom renderer had to iterate over the objects in the scene and project each of them onto the screen in turn.
In such a renderer, a simple way to solve the VSD problem is to use a z-buffer. Each time an object is projected onto the screen, a check is made for every pixel to be drawn. If the part of the object being drawn is closer to the player than whatever was already drawn at that pixel, the pixel is overwritten; otherwise it is left unchanged. The approach is simple, but a z-buffer requires a lot of memory, and the renderer can still burn plenty of CPU cycles projecting level geometry the player will never see.
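The z-buffer check described above can be sketched in a few lines. This is a minimal illustration, not anything from the Doom codebase; the buffer sizes and function names are invented for the example.

```python
import math

WIDTH, HEIGHT = 4, 3   # toy screen size

def make_buffers():
    """A frame buffer plus a matching z-buffer: one depth per pixel,
    initialized to infinity (nothing drawn yet)."""
    zbuf = [[math.inf] * WIDTH for _ in range(HEIGHT)]
    frame = [[None] * WIDTH for _ in range(HEIGHT)]
    return zbuf, frame

def plot(zbuf, frame, x, y, depth, color):
    """Write a pixel only if this fragment is nearer than what is there."""
    if depth < zbuf[y][x]:
        zbuf[y][x] = depth
        frame[y][x] = color
```

Note the cost: the z-buffer here is as large as the frame buffer itself, and `plot` may be called (and rejected) many times per pixel — the two drawbacks the text mentions.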
In the early 1990s, the z-buffer solution had another drawback: on IBM-compatible PCs that used a video adapter system called VGA, writing to the output frame buffer was a costly operation. Therefore, the time spent on rendering pixels that would then simply be overwritten greatly reduced the renderer’s performance.
Because writing to the frame buffer was so expensive, the ideal renderer would start by drawing the objects closest to the player, then the objects just behind them, and so on, until every pixel on the screen had been written. At that point the renderer would know to stop, saving all the time it might otherwise have spent examining distant objects the player cannot see. But ordering the objects in a scene this way, from nearest to farthest, amounts to solving the VSD problem. Once again the question is: what can the player see?
At first, Carmack tried to solve the problem by relying on the layout of Doom's levels. His renderer would start by drawing the walls of the room the player was in, then flood out into neighboring rooms to draw the walls of those rooms visible from the current one. Provided every room was convex, this solved the VSD problem; non-convex rooms could be divided into convex "sectors". You can see what this rendering technique might have looked like, heavily slowed down, in the video shown above, where a YouTuber named Bisqwit demonstrates his own renderer built on the same general algorithm. The algorithm was used successfully in Duke Nukem 3D, released three years after Doom, when processors had grown more powerful. But in 1993, on the available hardware, the Doom renderer using this algorithm struggled with complex levels. The trouble showed especially when sectors were nested inside one another, which was the only way to create, for example, circular staircases. A circular staircase forced repeated recursive descents into a sector that had already been rendered, drastically reducing the engine's speed.
Around the same time the id team realized that the Doom engine might be too slow, id Software was asked to port Wolfenstein 3D to the Super Nintendo. The SNES was even less powerful than the IBM-compatible PCs of the day, and it turned out that the ray-marching Wolfenstein renderer, despite its simplicity, did not run fast enough on Super Nintendo hardware. So Carmack started looking for a better algorithm. In fact, it was for the Super Nintendo port of Wolfenstein that Carmack first researched and implemented binary space partitioning. In Wolfenstein it was relatively simple, because all the walls were parallel to the axes; in Doom it would be harder. But Carmack realized that BSP trees would solve Doom's speed problems too.
Binary space partitioning
Binary space partitioning simplifies the VSD problem by partitioning the 3D scene in advance. For now, you only need to grasp why partitioning is useful: if you draw a line (which in 3D is actually a plane) across the scene, and you know which side of the line the player or camera is on, then you also know that nothing on the other side of the line can obstruct anything on the camera's side. Repeat the process many times and you end up with a 3D scene divided into many sections. This is no improvement over the original scene, except that you now know more about how the different parts of the scene can obstruct one another.
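The "which side of the line?" test at the heart of all this is a one-liner. Here is a 2D sketch (in 3D you would test the point against a plane instead); the function name is my own.

```python
def side_of(line, point):
    """Which side of a dividing line (x1,y1)->(x2,y2) a 2D point is on.
    A positive cross product means 'front' (left of the line's direction),
    a negative one means 'back'."""
    (x1, y1), (x2, y2) = line
    px, py = point
    cross = (x2 - x1) * (py - y1) - (y2 - y1) * (px - x1)
    return "front" if cross > 0 else "back"
```

If the camera tests "front" and some wall tests "back" for the same dividing line, that wall cannot obstruct anything on the camera's side — that is the entire guarantee BSP builds on.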
The first to write about dividing a 3D scene this way were researchers trying to determine, for the US Air Force, whether computer graphics were advanced enough to be used in flight simulators. They published their findings in a 1969 report entitled "Study for Applying Computer-Generated Images to Visual Simulation." The report concluded that computer graphics could be used to train pilots, but the researchers warned that implementing such a system would be complicated by the VSD problem:
One of the most significant problems that must be faced in real-time image computation is the priority, or hidden-line, problem. In our everyday visual perception of our surroundings, nature solves this problem with trivial ease: a point on an opaque object obscures all other points lying along the same line of sight but farther away. For a computer, the task is formidable. The amount of computation required to determine priority generally grows exponentially with the complexity of the environment, and soon surpasses the computational load of finding the perspective images of the objects.
One solution these researchers mention, which they say had been used earlier in a NASA project, is based on constructing what I will call an "occlusion matrix." The researchers point out that a plane dividing the scene in two can be used to resolve "any priority conflict" between objects on opposite sides of it. In general, such planes may have to be added to the scene explicitly, but with certain geometry you can simply rely on the faces of objects already present. They give the example shown below, where p1, p2, and p3 are the dividing planes. If the camera's viewpoint is on the forward, or "true," side of one of these planes, then pi equals 1. The matrix expresses the relationships among the three objects as a function of the three dividing planes and the camera's position: if object ai obscures object aj, then entry aij of the matrix is 1.
The researchers proposed implementing this matrix in hardware and recomputing it every frame. In essence, the matrix would act as a big switch, or a kind of built-in z-buffer: while a given object is drawn, no video is output for those parts of it that, according to the matrix, are obscured by an object with priority over it.
A serious drawback of this approach is that describing a scene with n objects requires a matrix of size n². So the researchers went on to investigate whether the occlusion matrix could instead be represented as a "priority list" of size only n, giving the order in which the objects should be drawn. They noticed immediately that in certain scenes, such as the one shown above, no ordering exists (because there is an occlusion cycle), so they spent considerable time mathematically characterizing the "proper" and "improper" scenes. In the end they concluded that, at least for "proper" scenes (and it is easy enough for a scene designer to avoid the "improper" cases), a priority list can be generated; but they left generating the list as an exercise for the reader. The chief contribution of this 1969 work, it seems, was to point out that it should be possible, at least in theory, to use dividing planes to order the objects in a scene.
It was only a 1980 paper, "On Visible Surface Generation by A Priori Tree Structures," that demonstrated a concrete algorithm for doing this. That paper, written by Henry Fuchs, Zvi Kedem, and Bruce Naylor, was the first to describe the BSP tree. The authors present their new data structure as an alternative to an approach first tried a decade earlier but not widely adopted because of certain difficulties — their nod to the solution chosen in the 1969 US Air Force work. Once a BSP tree has been built, it can easily be used to obtain a priority ordering of the objects in the scene.
Fuchs, Kedem, and Naylor give a fairly readable description of the BSP tree, but I will try to offer a less formal, more condensed one.
We start by picking one polygon in the scene and making the plane that the polygon lies in our dividing plane. That polygon also becomes the root node of the tree. Every remaining polygon in the scene lies on one side or the other of the root dividing plane. Polygons on the "front" side, in the front half-space, end up in the left subtree of the root node, while polygons on the "back" side, in the back half-space, end up in the right subtree. We then repeat the process recursively, picking polygons from the left and right subtrees as new dividing surfaces for their own half-spaces, which generate further half-spaces and subtrees. The process ends when we run out of polygons.
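The construction above can be sketched compactly in 2D, where wall segments stand in for polygons and dividing lines stand in for dividing planes. This is a teaching sketch, not Doom's node builder: it naively takes the first segment in the list as the splitter, whereas a real builder chooses splitters with heuristics (a point the 1980 paper and the article below return to).

```python
class Node:
    """One BSP node: a splitter segment plus front/back subtrees."""
    def __init__(self, segment, front, back):
        self.segment, self.front, self.back = segment, front, back

def side(splitter, point, eps=1e-9):
    """+1 if point is in the splitter's front half-plane, -1 behind, 0 on it."""
    (x1, y1), (x2, y2) = splitter
    d = (x2 - x1) * (point[1] - y1) - (y2 - y1) * (point[0] - x1)
    return 0 if abs(d) <= eps else (1 if d > 0 else -1)

def split_segment(splitter, seg):
    """Cut seg where it crosses the splitter's infinite line."""
    (x1, y1), (x2, y2) = splitter
    (ax, ay), (bx, by) = seg
    nx, ny = y1 - y2, x2 - x1                       # splitter normal
    t = (nx * (x1 - ax) + ny * (y1 - ay)) / (nx * (bx - ax) + ny * (by - ay))
    ix, iy = ax + t * (bx - ax), ay + t * (by - ay)
    return ((ax, ay), (ix, iy)), ((ix, iy), (bx, by))

def build_bsp(segments):
    """Recursively build a BSP tree from 2D wall segments."""
    if not segments:
        return None
    splitter, front, back = segments[0], [], []
    for seg in segments[1:]:
        sa, sb = side(splitter, seg[0]), side(splitter, seg[1])
        if sa >= 0 and sb >= 0:
            front.append(seg)
        elif sa <= 0 and sb <= 0:
            back.append(seg)
        else:                                       # straddles: cut in two
            p1, p2 = split_segment(splitter, seg)
            (front if sa > 0 else back).append(p1)
            (front if sb > 0 else back).append(p2)
    return Node(splitter, build_bsp(front), build_bsp(back))
```

Notice the `else` branch: a segment that straddles the dividing line has to be split in two, one piece per half-space — the very source of polygon growth discussed further below.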
Let's say we want to render the scene's geometry from back to front. (This is known as the "painter's algorithm," because it means that polygons farther from the camera are painted over by polygons nearer to it, producing a correct rendering.) To implement it, we just have to do an in-order traversal of our BSP tree, where the decision to draw the left or the right subtree first at each node depends on where the camera's viewpoint lies: in the front or the back half-space relative to the dividing plane associated with that node. That is, at every node of the tree we first render all the polygons on the "far" side of the plane, then the polygon in the dividing plane, then all the polygons on the "near" side, with "near" and "far" defined relative to the camera's viewpoint. This solves the VSD problem because, as we learned a few paragraphs back, polygons on the far side of a dividing plane cannot obstruct anything on the near side.
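The traversal can be sketched as follows, again in a self-contained 2D toy (a tree node is just a `(splitter, front_subtree, back_subtree)` tuple here; the function names are my own).

```python
def side(splitter, point):
    """Positive if point is in the splitter's front half-plane."""
    (x1, y1), (x2, y2) = splitter
    return (x2 - x1) * (point[1] - y1) - (y2 - y1) * (point[0] - x1)

def painters_order(node, camera, out=None):
    """In-order BSP traversal for back-to-front (painter's) rendering.
    node is (splitter, front_subtree, back_subtree) or None."""
    if out is None:
        out = []
    if node is None:
        return out
    splitter, front, back = node
    # The half-space the camera occupies is 'near'; the other is 'far'.
    near, far = (front, back) if side(splitter, camera) > 0 else (back, front)
    painters_order(far, camera, out)    # far side cannot obstruct near side
    out.append(splitter)                # then the splitter itself
    painters_order(near, camera, out)   # near side painted last, on top
    return out
```

The same tree yields a correct order for any camera position — only the near/far decision at each node changes, which is exactly the property the next paragraphs emphasize.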
The diagram below demonstrates building and traversing a BSP tree that describes a simple 2D scene. In 2D, dividing lines are used instead of dividing planes, but the basic idea remains the same.
A very convenient feature of the BSP tree, which Fuchs, Kedem, and Naylor emphasize several times, is that it only needs to be built once. This seems surprising, but a single BSP tree can be used to render a scene regardless of the camera’s point of view. The BSP tree remains applicable as long as the scene polygons do not move. This is why the BSP tree is so useful for real-time rendering — all the complex work of building a tree can be done in advance, rather than at the time of rendering.
Fuchs, Kedem, and Naylor note that further research is needed into what makes a "good" BSP tree. The quality of a BSP tree depends on which polygons you pick to define the dividing planes. I glossed over this earlier, but if a dividing plane intersects other polygons, then for the BSP algorithm to work, each intersected polygon must be cut in two, so that one half lies in one half-space and the other half in the other. If this happens a lot, building the BSP tree substantially increases the polygon count of the scene.
Bruce Naylor, one of the authors of the 1980 paper, later wrote about this problem in his 1993 article "Constructing Good Partitioning Trees." According to Carmack's colleague and id Software co-founder John Romero, this was one of the papers Carmack read while working out how to use BSP trees in Doom.
BSP trees in Doom
Recall that in his first draft of the Doom renderer, Carmack tried to establish a rendering order for the level geometry by flooding the renderer outward from the room the player was in, into neighboring rooms. BSP trees were a better way to establish this order, because they avoided the problem of the renderer repeatedly visiting the same room (or sector) and wasting CPU cycles.
"Adding BSP trees to Doom" meant, in practice, adding a BSP tree generator to the Doom level editor. Once a level was finished, a BSP tree was generated from its geometry. According to Fabien Sanglard, the generation process could take up to eight seconds for a single level and eleven minutes for all the levels of the original Doom. It took so long partly because Carmack's BSP generation algorithm uses various heuristics to try to find a "good" BSP tree. An eight-second delay would have been unforgivable at run time, but done in advance it seemed acceptable, given the performance gains the BSP trees bought the renderer. The generated BSP tree for each level was saved as part of the level data loaded into the game at launch.
Carmack made one important change to the BSP tree algorithm described in the 1980 paper: once Doom is running and the current level's BSP tree has been read into memory, the renderer uses the tree to draw objects front to back rather than back to front. In the 1980 paper, Fuchs, Kedem, and Naylor showed how a BSP tree can be used to implement the back-to-front painter's algorithm, but the painter's algorithm involves a lot of overdraw, which was expensive on an IBM-compatible PC. So the Doom renderer starts with the geometry nearer the player and draws the more distant geometry afterward. This reverse ordering is easy to achieve with a BSP tree, since you can simply reverse the traversal decision made at each node of the tree.

To keep the more distant geometry from being drawn over the nearer geometry, the Doom renderer uses a kind of implicit z-buffer that provides many of the benefits of a real z-buffer at a fraction of the memory cost. One array tracks occlusion in the horizontal dimension, and two more arrays track occlusion in the vertical dimension from the top and the bottom of the screen. The Doom renderer can get away without a true z-buffer because Doom, strictly speaking, was not a fully three-dimensional game. The cheaper data structures work because certain things are impossible in Doom: the horizontal occlusion array works because there are no sloping walls, and the vertical occlusion arrays work because there are no walls with, say, two windows one above the other.
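Here is a toy illustration of front-to-back rendering with per-column occlusion tracking — a heavily simplified stand-in for Doom's horizontal occlusion array, with invented names and a one-row "screen." Wall spans arrive nearest first, as a reversed BSP traversal would emit them; a column is written at most once, and rendering stops as soon as every column is full.

```python
SCREEN_W = 8   # toy screen: one entry per pixel column

def render_front_to_back(walls):
    """walls: (first_col, last_col, name) spans in near-to-far order.
    A per-column 'already drawn' array stands in for a z-buffer."""
    drawn = [False] * SCREEN_W
    frame = [None] * SCREEN_W
    remaining = SCREEN_W
    for x0, x1, name in walls:
        for x in range(x0, x1 + 1):
            if not drawn[x]:            # nearer geometry already owns it?
                drawn[x] = True
                frame[x] = name
                remaining -= 1
        if remaining == 0:              # every column written: stop early,
            break                       # farther geometry can't be seen
    return frame
```

Each pixel is written exactly once, so the expensive VGA frame-buffer writes that the painter's algorithm wastes on overdraw are avoided entirely.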
Doom II, as complex as its predecessor.
But no one complained about the repetition of blood.
One tricky problem remained: how to fit Doom's moving characters into the static level geometry drawn with the BSP tree. The enemies in Doom could not be part of the BSP tree, because they move; a BSP tree only works for fixed geometry. So the Doom renderer draws the static level geometry first, keeping track (in yet another memory-efficient data structure) of which segments of the screen have been drawn to. It then draws the enemies back to front, clipping them against the screen segments that occlude them. This process is not as optimal as BSP rendering, but because there are usually fewer visible enemies than there is visible geometry, speed is not a problem here.
Using BSP trees in Doom was a major win. Carmack was obviously clever to realize that BSP trees were exactly the solution he needed. But was it a stroke of genius?
In his excellent book about the Doom engine, Fabien Sanglard quotes John Romero as saying that Bruce Naylor's paper "Constructing Good Partitioning Trees" was mostly about using BSP trees to cull the back faces of 3D models. According to Romero, Carmack thought the algorithm could still be useful for Doom, so he went ahead and implemented it. That is high praise for Carmack: it suggests he saw the usefulness of BSP trees for real-time video games while other people were still using the technique to render static scenes. A similarly flattering story appears in Masters of Doom: Kushner suggests that Carmack read Naylor's paper and asked himself, "What if you could use a BSP tree to create not just one 3D image, but an entire virtual world?"
These framings ignore the history of the BSP tree. When the US Air Force researchers first realized that partitioning a scene could help speed up rendering, they were interested in speeding up real-time rendering, since they were, after all, trying to build a flight simulator. The flight simulator example comes up again in the 1980 paper: Fuchs, Kedem, and Naylor write that a BSP tree would be useful in a flight simulator that pilots use to practice landing at the same airport over and over. Since the airport geometry never changes, the BSP tree need only be generated once. Clearly, they had real-time simulation in mind; in the paper's introduction, they even motivate their research by the possibility of a real-time graphics system producing an image within 1/30 of a second.
In other words, Carmack was not the first to think of using BSP trees in a real-time graphics simulation. Granted, anticipating that BSP trees could be used this way and actually doing it are two different things. But even in the implementation, Carmack may have had more guidance than is commonly believed. The Wikipedia article on BSP trees suggests that Carmack consulted a 1991 paper by Chen and Gordon as well as the 1990 textbook Computer Graphics: Principles and Practice. Although the claim carries no citation, it is plausible. The Chen and Gordon paper describes front-to-back rendering with BSP trees, essentially the same solution used in Doom, down to an "implicit z-buffer" data structure that keeps distant polygons from being drawn over near ones. The textbook offers an excellent overview of BSP trees, along with pseudocode for building and traversing one. (Thanks to my university's wonderful library, I was able to leaf through the 1990 edition.) Computer Graphics: Principles and Practice is a classic of the field, so Carmack may well have owned it too.
Level E1M1: Hangar, divided into subsectors.
In any case, Carmack faced a new problem, "How do you make a first-person shooter run on a computer with a CPU that can't even do floating-point operations?", did his own research, and proved that BSP trees are a useful data structure for real-time video games. I still think that is an impressive feat, even though the BSP tree had been invented a decade earlier and studied in fair detail by the time Carmack read about it. Perhaps what really deserves the praise is the Doom engine as a whole, which is a marvelous piece of work. I have mentioned it once already, but I will repeat that Fabien Sanglard's book about the Doom engine (Game Engine Black Book: DOOM) is an excellent survey of all the clever components of the engine and how they fit together. We should not forget that the VSD problem was just one of many problems Carmack had to solve to make the Doom engine work. That he was able, on top of everything else, to read about and implement a complicated data structure unknown to most programmers says a lot about his technical expertise and his drive for perfection.