Performance and Framerate
At this point you’ve got a giant map chock full of crazy designs and clever enemy setups, the whole thing peaks at 12fps, and you don’t know why. Now what?
In Quake 3, keeping map performance up was pretty simple: if r_speeds was too high, cut back on the detail or find out what’s wrong with vis, maybe a clever hint brush here or there, and that’s it. The Doom3 engine is much more complicated, and with the myriad of new toys it gives developers comes a dazzling new array of ways to make the game run slow if you’re not careful. The trick comes in identifying which of these new tools is causing the problem. This page provides an overview of how framerate is impacted by lighting, shadows, draw calls, portalling, game frame time, and memory limits.
New Console Commands
For Quake4 we added a number of new debugging tools to help the wayward designer identify specific issues in low framerate areas.
com_limits – set to 1 to enable. This is a simple catch-all we put in as a way of quickly picking out trouble spots. It’s hard-coded with the limits we used on Quake4 for triangle count, sound memory, texture memory, and active AI. With com_limits on, as you run through the game a box will appear on the screen whenever a certain limit is exceeded, showing you how far over the limit you’ve gone in the spot you’re standing.
- The triangle count warning will go off if the world architecture triangle count is greater than 100,000 or the scene total is greater than 150,000. On the ARB2 renderer path, the limits are reduced to 71,000 and 107,000 respectively. The ARB2 path does the same amount of work in fewer passes, so 107,000 triangles in the ARB2 path is equivalent to 150,000 triangles in the other paths. Exceeding this limit can mean anything from too much geometry to too many lights shining on that geometry to shadow patterns that are too complex, all of which are explained in more detail below.
- The sound memory warning appears when the entities in the current PVS require more than 32mb of sounds in memory at once. This does not take compression into account, so custom sounds that haven’t been converted to ogg vorbis may trigger this limit.
- The texture memory warning is based on com_machinespec. It goes off when the current rendered scene requires more than 45mb of image data on machinespec 0 (a 64meg card), 80mb on machinespec 1 (for 128meg cards) and 200mb on machinespec 2 (for 256meg cards). This means you’ll be close to exceeding the video card’s onboard memory if your quality level is appropriately set, which can cause a sudden framerate plummet if the card has to begin swapping.
- The AI limit warning will go off at 5 active AI (anywhere, not just in view), but for now the way it counts active AI is broken.
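The thresholds above can be collected into a small lookup, which is handy if you want to sanity-check numbers you have written down from a playthrough. This is only a sketch of the limits as quoted in the text, not the engine's code; the function and table names are made up for illustration, and the AI limit is left out because its counting is described as broken.

```python
# Thresholds as quoted in the text. All names here are hypothetical.
TRI_LIMITS = {
    # renderer path: (world architecture limit, scene total limit)
    "default": (100_000, 150_000),
    "arb2": (71_000, 107_000),
}
# Texture budget in MB per com_machinespec (0 = 64MB card, 1 = 128MB, 2 = 256MB).
TEXTURE_LIMIT_MB = {0: 45, 1: 80, 2: 200}
SOUND_LIMIT_MB = 32


def limit_warnings(world_tris, scene_tris, sound_mb, texture_mb,
                   machinespec=2, arb2=False):
    """Return (name, amount over the limit) for every exceeded limit."""
    world_max, scene_max = TRI_LIMITS["arb2" if arb2 else "default"]
    checks = [
        ("world tris", world_tris, world_max),
        ("scene tris", scene_tris, scene_max),
        ("sound MB", sound_mb, SOUND_LIMIT_MB),
        ("texture MB", texture_mb, TEXTURE_LIMIT_MB[machinespec]),
    ]
    return [(name, value - limit)
            for name, value, limit in checks if value > limit]
```

The same scene can pass on one renderer path and fail on another: 90,000 world triangles is under the default limit but 19,000 over the ARB2 one.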
g_showdebughud – values range from 1 to 11. This will replace the HUD with specific readouts about sound, networking, physics, and so on, depending on the value you set. (Note: You will need to download the DebugHud available on this site, since the debug guis weren’t included with the shipped game.) The value we’re concerned with is 5.
On the left is a readout of timing information in milliseconds.
Game Frame displays how much time the game world spends thinking. This value is impacted by non-graphical things like active AI, physics evaluation, and collision traces.
Frontend time reflects how long the renderer spends setting up the render by deciding what to draw, generating light interactions, and building shadows.
Backend time is spent submitting draw data to the driver, and waiting for the driver to finish.
Waiting is time left over if the frame finishes in less than 1/60th of a second.
The first three of these meters are indispensable for determining what is causing low framerates, and can save you from cutting detail or lights when the problem is actually caused by something like the physics engine.
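One way to read the timing readout is to find the largest of the three stages and see how much of the 1/60th-of-a-second budget is left. The sketch below assumes the three stages simply add up within a frame, which is a simplification; the function name is hypothetical.

```python
FRAME_BUDGET_MS = 1000.0 / 60.0  # about 16.7 ms per frame at 60 frames per second


def summarize_frame(game_ms, frontend_ms, backend_ms):
    """Return (name of the largest stage, waiting time in ms).

    Assumes the stages run back to back, so their sum is the frame time.
    """
    stages = {"game": game_ms, "frontend": frontend_ms, "backend": backend_ms}
    worst = max(stages, key=stages.get)
    waiting = max(0.0, FRAME_BUDGET_MS - sum(stages.values()))
    return worst, waiting
```

A frame with a 30 ms game frame and a short render points at physics, AI, or traces rather than at lights or geometry.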
On the right is a readout of render data being sent to the card.
Views shows the number of world views being rendered. This will usually be 2 (one for the player’s eyes and one for the HUD), but portalskies and mirrors add more.
Draws is one of the new important numbers to watch. The engine sends triangles to the video card in batches, and depending on how you construct your scenes and set up your lighting, a scene can be sent as a few hundred calls, which is fast, or as many as two thousand, which is not. There’s an entire section about keeping this one number in check below, but for now it should be kept under 1,000.
Verts is the number of vertices in the scene.
Tris shows total scene polycount, the number that used to be in r_speeds. We went by the hard-coded numbers in com_limits.
Shadow Tris shows how much of the total scene geometry is created just for cast shadows. This number can vary quite a lot before impacting performance, depending on a number of other factors, but making sure no more than one quarter of your tris go to shadows is a good baseline.
Textures is current texture memory use.
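The two rules of thumb in this readout (draws under 1,000, and no more than a quarter of the triangles going to shadows) can be checked mechanically. This is a minimal sketch with a hypothetical function name; the inputs are whatever you read off the debug HUD.

```python
def check_render_stats(draws, tris, shadow_tris):
    """Return a list of notes for the rule-of-thumb limits in the text."""
    notes = []
    if draws >= 1000:
        notes.append("draws at or above 1,000")
    if tris > 0 and shadow_tris / tris > 0.25:
        notes.append("more than a quarter of tris are shadow tris")
    return notes
```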
Lighting is the major statistic to keep an eye on, which many are already familiar with from Doom3 mapping. For the uninitiated, the Doom3 engine makes several separate render passes for each light source. This means geometry is redrawn for every light volume touching it. Therefore, an ideal way to maximize performance is to get the brightest light on a surface with as few overlapping volumes as possible.
To see how many passes are being made and where, set r_showlightcount to 1.
Colored regions indicate the number of passes. Red is 1, green is 2, blue is 3, cyan is 4, magenta is 5, and white is 6+. If you’re seeing a lot of white, you need to relight that area.
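For reference, the color key can be written as a lookup from pass count to color. The function name is hypothetical, and the text does not say what is drawn where no light touches a surface, so that case returns None here.

```python
LIGHTCOUNT_COLORS = ["red", "green", "blue", "cyan", "magenta"]


def lightcount_color(passes):
    """Map a number of light passes to the color described in the text."""
    if passes < 1:
        return None  # not described in the text
    if passes >= 6:
        return "white"
    return LIGHTCOUNT_COLORS[passes - 1]
```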
(You can, technically, use r_showlightcount in Radiant by typing it into the editor console. Don’t expect it to be accurate, because it’s only lighting against brushes and not compiled/portalled geometry, but it can give you a very rough sense of how you’re doing if you don’t want to wait for a compile.)
There are a couple of ways to stay as close to red and green as possible. The first, and the one that’ll be the most useful to you, is to simply be reserved with your lighting. Always be aware of which of your light volumes overlap as you’re lighting a room. Don’t be afraid to light until the room looks good and is as bright as you want it, but don’t go overboard, and check r_showlightcount often.
As you move around your map with lightcount on, you’ll notice that sometimes the borders between colors will match your geometry, and sometimes you’ll get straight horizontal and vertical lines that slide around as you move. This is caused by the way light volumes are scissored on screen.
A light will never create overdraw on a face it doesn’t intersect, but if a light volume touches a face that extends much farther than the volume itself, the overdraw is limited by simply cropping it to a box the size of the volume’s screen bounds (the same way visportal scissoring works – for more info on this see the related page in the doom3 section of iddevnet). A good way to see light volumes in game is to set r_showLights to 2 or 3, which will draw a translucent box around all light volumes. It can sometimes make it difficult to see much of use through the “fog” of light boxes, but moving around for a better angle usually helps. Shadowcasting lights show up as blue, and non-shadowcasting lights show up as red.
Quickly load up one of the MP maps and find an area with a row of small lights. With lightcount enabled, you should be able to look at the row of lights from an extreme angle and make their screen bounds overlap, thus producing a block of white even though the volumes don’t overlap in space, and the geometry isn’t touched by more than one of those little lights at a time. This happens because the scissoring is done around the bounds of the volume, not the affected geometry. This will happen as well with larger lights, and can be easily reproduced by standing far away from a light volume that extends very far into the floor. The light’s scissor bounds will extend all the way down to where the bottom of the light volume would be, producing a block of overdraw on the entire surface of the floor coming towards you that isn’t actually being lit. You can view these scissor outlines in game by setting r_showLightScissors to 1.
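The row-of-lights case can be reproduced with a little geometry. The sketch below is a top-down, two-dimensional simplification (not the engine's scissor code): each light is a small square volume, the viewer sits at the origin looking down the z axis, and a volume's screen extent is the range of x/z over its corners. The positions and sizes are invented for illustration.

```python
from itertools import combinations, product


def screen_interval(center, half):
    """Screen-space x extent of a square light volume seen from the origin.

    Top-down view: x is to the right, z is straight ahead. The volume must
    lie entirely in front of the viewer (center z minus half > 0).
    """
    cx, cz = center
    xs = [(cx + sx * half) / (cz + sz * half)
          for sx, sz in product((-1, 1), repeat=2)]
    return min(xs), max(xs)


def intervals_overlap(a, b):
    """True if two (min, max) screen intervals share any area."""
    return a[0] < b[1] and b[0] < a[1]


def volumes_overlap(c1, c2, half):
    """True if two equal-sized square volumes intersect in space."""
    return abs(c1[0] - c2[0]) < 2 * half and abs(c1[1] - c2[1]) < 2 * half


# Three hypothetical chiclets on a wall to the viewer's right, one unit apart.
lights = [(1.0, 4.0), (1.0, 5.0), (1.0, 6.0)]
HALF = 0.25

for a, b in combinations(lights, 2):
    print(a, b,
          "space:", volumes_overlap(a, b, HALF),
          "screen:", intervals_overlap(screen_interval(a, HALF),
                                       screen_interval(b, HALF)))
```

None of the three volumes touch each other in space, yet every pair overlaps on screen when viewed down the wall, which is the block of overdraw described above.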
To minimize overdraw in these cases, you can split your brushes along the edges of light volumes, to ensure that the faces those volumes touch are not much bigger than the light volume itself. To keep the compiler from recombining these faces you can either separate them with architectural detail like thin strips of trim, shift the textures on alternating brushes by some unnoticeable amount (like 0.125 units), or force a split with visportals (not recommended). These will all of course add to your tri count, as well as the number of draws you’re pushing, so it’s up to you to look on a scene by scene basis to see which light volumes you can trim brushwork around and which ones you’re better off leaving alone.
The reverse is likewise true. When placing lights to begin with it’s a good idea to stick to a larger grid size and make the edges of the light volume match your brushwork wherever possible.
Another way to keep lightcounts low is to only allow yourself an extra pass if you’re going to get a significant amount of illumination on the affected surfaces. Almost all the light shaders in Doom3 and Q4 have a falloff produced by the images they use, meaning they’re at their brightest at the center of the volume and drop off to zero at or before the edge of the volume.
If you have a large light volume that’s only pushing 8 or 16 units deep into a wall, you’re probably getting little to no visible light on that surface. In these cases you would be better off shrinking the volume just enough that it’s tangent to the wall without clipping through (so that the pink line is just visible z-fighting with the surface).
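To put a rough number on this, assume a purely linear falloff from full brightness at the center to zero at the edge of the volume. Real light shaders get their falloff from an image, so this is only an approximation, and the 256-unit radius is an invented example.

```python
def linear_falloff(distance, radius):
    """Brightness from 1.0 at the light's center to 0.0 at its edge.

    Assumes a linear ramp; actual falloff depends on the light shader's image.
    """
    return max(0.0, 1.0 - distance / radius)


# A light with a 256-unit radius pushing 16 units, then 8 units, into a wall.
print(linear_falloff(256 - 16, 256))
print(linear_falloff(256 - 8, 256))
```

Under that assumption the wall surface receives only a few percent of the light's full brightness, in exchange for a full extra pass.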
Pick the whitest light shaders you can visually get away with. If you need a full volume of light, like for sunlight coming through a large ceiling opening, rav_square_bevel will give you a big bright volume that only darkens near the very edges. Falloff works vertically as well, so if your floors and ceilings seem unusually dark try stretching the light volumes vertically to make them taller, or switch to a shader like rav_spot_long or rav_spot_nofall, which are intentionally “thick” on the vertical axis.
Quake4 has a lot of the aforementioned small light volumes, which we nicknamed chiclets. They’re a great way to add some color and highlights to a scene, they’re evocative of Quake2, and they don’t produce a lot of overdraw. They do, however, add to the list of light volumes the game engine has to run through to calculate interactions, so lots of them do eventually take their toll regardless of size. Higher end systems have no trouble here, but systems closer to the minimum spec will start to choke. Since these lights are more for effect than illumination, we added the detailLevel keyvalue as a way of instructing the renderer to skip less “important” lights.
A light’s detailLevel ranges from 0 to 10, and works with an accompanying cvar: r_lightDetailLevel. The engine will render all lights with detailLevel keyvalues set greater than r_lightDetailLevel. detailLevel defaults to 10, and r_lightDetailLevel defaults to 0 on all video architecture except NV20, where it defaults to 9. To help performance on the NV20, all of our maps have the non-crucial chiclet lights set to detailLevel 5 (which we picked arbitrarily because it’s between 0 and 9). If a light has an attached model, that model will still draw with the same color value as the light regardless of detailLevel, so even if the small pool of light is lost you’ll still at least get the glowing fixture/flare for color.
In theory this allows you to rank all your lights on a scale of 0 to 10, allowing the user to set the cvar according to taste/system performance, but we never utilized it to this degree. Instead, once you’ve finished your map, just run through and set detailLevel 5 on any lights the player can still see well without. This is especially crucial in multiplayer maps, where performance is key – you don’t want to give players with higher end systems an advantage by providing them with more illumination.
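The rule as stated (a light renders when its detailLevel is greater than r_lightDetailLevel) is easy to sketch. The light names below are invented, and the function is an illustration of the comparison described in the text rather than engine code.

```python
def rendered_lights(lights, r_light_detail_level):
    """Return the names of lights whose detailLevel exceeds the cvar.

    lights: iterable of (name, detailLevel) pairs; detailLevel defaults to 10.
    """
    return [name for name, detail in lights if detail > r_light_detail_level]


lights = [("room_fill", 10), ("chiclet_1", 5), ("chiclet_2", 5)]
print(rendered_lights(lights, 0))  # default on most video architectures
print(rendered_lights(lights, 9))  # default on NV20
```

With the cvar at 0 everything draws; at 9 only the lights left at the default detailLevel of 10 survive, which is why 5 works as the "skippable" value.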
One last console command that may prove useful is r_singleLight. This will limit the render to only one light in the map, specified by the value set for this cvar. Unfortunately there's no easy way to find out which light has which number, so the only way to cycle to the light you want is to try one number at a time. (Some MP aficionados are under the impression that setting this cvar to a certain secret number depending on the map will enable vertex lighting in multiplayer, which is not the case. All they've done is figure out which number corresponds to the ambient pass in each map that has one.)
Part of the game frontend’s responsibility in setting up a render is computation of shadow volumes. When geometry casts a shadow, that shadow is handled as new triangles added to the scene.
When assessing a performance trouble spot, after checking lightcount turn shadows off at the console by setting r_shadows to 0. You’ll see framerate increase no matter what, but if you’re seeing an unusually significant gain in performance it’s a fair bet the shadows in that scene are contributing to the slowdown.
To give you a better idea of what shadows are being cast where, enable r_showShadows.
There are a few simple ways to reduce shadowing in a scene. The first is to find all the lights in the scene that you can get away with making non-shadowcasting, and disable shadows on them. Chiclets are a very good place to start. You’ll want to make sure characters still have at least one shadow wherever they can go, but if you’ve got several fill lights in one area, taking shadows off one or two won’t be visually apparent. If you can’t easily reduce shadows by light, you can try doing it by object instead. If your scene has several func_static models in it, set noshadows wherever they won’t be missed.
This will help slice off a lot of shadow computation and extra triangles right off the bat. Another thing to watch out for in your scenery’s shadows is shadow complexity. The nickname for this at Raven was "the jailbar effect." If you have a setup where a lot of fine detail, like railings or ladders, is casting shadows across a long distance or onto a lot of complex geometry, even if you’re not adding a lot of tris to the scene the frontend has to do a lot more math intersecting these shadow volumes with geometry (characters included).
This also applies to shadows being cast at oblique angles. It’s a rare occurrence, but if you have a large light with the origin dragged far out to one side, any character in that volume will cast a shadow way down the long axis of the light. When it comes to shadows, short and simple is best.
Draws & Batch Size
Every new generation of video cards that hits the market is able to render more triangles per second than the generation before. The way cards are able to gain so much speed is by rendering them in parallel, taking batches of polygons and running through multiple batches at a time. Each batch, or draw, has a slight penalty in overhead, meaning that the same number of polygons can be rendered much faster in fewer, larger batches than it can in many small ones. It is similar to the difference between city and highway mileage: you’ll get much more out of your fuel by going long distances in fifth gear than you will by driving stop and go.
With the number of polygons Quake4 puts to the screen, each scene would ideally be rendered in at most 300 batches of at least 500 triangles each. Without proper precautions on the part of the designer, however, the Doom3 engine will stray very quickly towards many small batches.
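The arithmetic behind the ideal is simple: 300 batches of 500 triangles is 150,000 triangles, the same figure com_limits uses for the scene total. Dividing the scene's triangle count by its draw count gives the average batch size, as in this small sketch (the function name is hypothetical).

```python
def average_batch_size(scene_tris, draws):
    """Average number of triangles per draw call."""
    return scene_tris / draws


print(300 * 500)                          # the ideal scene, in triangles
print(average_batch_size(150_000, 1000))  # same scene at 1,000 draws
print(average_batch_size(150_000, 2000))  # same scene at 2,000 draws
```

The same 150,000 triangles at 2,000 draws averages only 75 triangles per batch, far below the 500 the ideal calls for.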
The engine will split the polygons it sends to the video driver per texture, per light, per entity, and per portal area. That means that for every light volume in the scene, a batch is sent for every group of polygons sharing a texture affected by that light volume. If the same texture appears on brushes in the world and on a func_static, even within one light volume the func_static will go to the renderer in separate batches. If a func_static with the same model keyvalue is repeated sixteen times down a hallway, each one will batch separately from the other fifteen. If you have a long, highly subdivided patch mesh with four or five chiclet lights spaced out along the curve, the curve will be split into small batches for each light.
Furthermore, all effects batch separately from each other, even those with the same .fx, and every stage in the effect goes as its own batch. GUIs also batch separately from each other and, obeying the same laws that apply to textures in the world, every windowDef in a GUI with its own image on it goes as its own batch of two polygons.
As you can see, draws add up fast.
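To see how the splitting rules multiply, here is a simplified model that counts one batch per unique combination of light, texture, entity, and portal area. It ignores effects, GUIs, and any passes the engine adds on its own, and the data layout is invented for illustration.

```python
from collections import defaultdict


def count_batches(surfaces):
    """Group surfaces into batches keyed by (light, texture, entity, area).

    surfaces: iterable of dicts with keys "texture", "entity", "area",
    "lights" (names of the light volumes touching the surface), and "tris".
    Returns a dict mapping each batch key to its triangle count.
    """
    batches = defaultdict(int)
    for s in surfaces:
        for light in s["lights"]:
            key = (light, s["texture"], s["entity"], s["area"])
            batches[key] += s["tris"]
    return dict(batches)


# Sixteen identical pillars down a hallway, all inside one light volume.
separate = [{"texture": "pillar", "entity": f"func_static_{i}", "area": 0,
             "lights": ["hall_light"], "tris": 200} for i in range(16)]
# The same pillars merged into the world (for example by inlining them).
merged = [dict(s, entity="worldspawn") for s in separate]

print(len(count_batches(separate)), len(count_batches(merged)))
```

As separate entities the pillars cost sixteen batches of 200 triangles; as part of the world they collapse into one batch of 3,200. Adding a second light that touches all of them doubles either count.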
The ideal 300 batches of 500 quickly becomes wishful thinking, and if you tool around in Quake4 with debugHud 5 you’ll notice we usually didn’t even come close. Quake style level design usually means mid-sized rooms with interesting shapes and designs, built from a varying group of textures, lit by many small- and mid-sized lights. What video cards want from Quake4 is spaces with only a few textures and a couple of big giant light volumes covering everything.
They can, however, handle much worse, and depending on video architecture you only need to really worry if you’re pushing into the quadruple digits. Thus, if your draws hover around 1000 once the shooting starts, we’d say you were still doing well.
Spotting areas with too many draws is simple enough, but to identify where in that scene all the draws are coming from we added r_showBatchSize. It works just like r_showtris, with the same effects for values of 1, 2, and 3, but the outlines will be colored based on the size of the smallest batch they’re in. It scales from pink (batch size less than ten, meaning bad) through red, orange, yellow, and stops at green (batch size greater than 500, meaning good).
This can be a hard display to read, the main problem being that lots of pink isn’t always bad and green isn’t always good. You can have a scene with a lot of pink and red batching, but if your draws are only in the 400 range you won’t really suffer for it. On the other hand, if your scene is full of decals they’ll batch individually (one for each blood splat, say), but if you crank up the subdivisions on each decal they’ll jump up to a few hundred triangles each and voila, they’re green, which doesn’t actually solve the problem.
Another way to find small batches is to set r_limitBatchSize. This will instruct the renderer to only draw batches of a size greater than whatever value you set this to. Set it to 100 and see how much of your map disappears.
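If you have a rough list of batch sizes for a scene, you can estimate what r_limitBatchSize would hide. The sketch below follows the description above (only batches larger than the limit are drawn); the function name and the sample numbers are invented.

```python
def batch_report(batch_sizes, limit=100):
    """Summarize how many draws and triangles a batch size limit would hide."""
    hidden = [size for size in batch_sizes if size <= limit]
    return {
        "draws_hidden": len(hidden),
        "tris_hidden": sum(hidden),
        "draws_total": len(batch_sizes),
        "tris_total": sum(batch_sizes),
    }


# Fifty two-triangle decals or GUI quads plus ten healthy 600-triangle batches.
report = batch_report([2] * 50 + [600] * 10, limit=100)
print(report)
```

In this made-up scene the small batches are most of the draws but almost none of the triangles, which is why the draw count matters more than how much of the screen shows up pink.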
There’s no simple answer to a scene that batches poorly, but once you become more familiar with how it works you’ll learn what steps to take to optimize a scene for it.
This is a list of many of the common draw-reducing solutions we used on Quake4. Often, getting your draw count down will mean making some sacrifices and cutting certain things back, but with enough caution you can sometimes bring your numbers down without sacrificing the visuals you’ve created.
Use fewer, larger light volumes that encompass more geometry when possible. Lighting a room with fewer lights will gather the various textures in the scene into fewer groups. If the placement of various pools of light is important to the look of the scene, try creating a custom light shader in a paint program that you can use to match how the room would look with multiple lights.
Find similar looking textures and pick one texture to replace them all with. There are a lot of groups of textures in Quake4 that are essentially siblings, just with different patterns of dirt/rust/little fiddly details on them, and while using a bunch of them makes for a lot of cool detail, it makes for a lot of draws as well. At the end of the day, as the player is running past those walls with his guns ablaze, he probably won’t notice if all of those wall panels are the same. (This can be a painless change, but depending on how severe your batching problems are and how wonky the texturing is, you may have to gut and rebuild some geometry. Painful, but a fact of life.)
Portals split batches. If you’ve got a room that’s cut up with portals all over the place, you’re probably doing more harm in splitting your draws up than you are good by culling some of them from the right angles. Portal your doorways and try not to go crazy with them beyond that.
If you have a lot of identical func_statics, force them into one entity. If you’ve got light fixtures with color keys on them, detach them from their lights where possible and make the whole group of fixtures into one func_static.
If you have a func_static mapmodel that you’re using many of in one view (like pillars or crates), set their inline keys to 1. This will turn them into bsp geometry and thus make them all part of the worldspawn, and they’ll batch together. Keep a close eye on what mapmodels you do this to – they’ll increase your .proc and .cm sizes, and in some cases will start to show shadow z-fighting on their surfaces. If that happens, try adjusting the lighting around them, but if it doesn’t go away you’ll just have to de-inline them and find somewhere else to save draws. Also, don’t inline any func_statics that you’ve given keys like color or noshadows, as those keys will be thrown out when they’re converted to worldspawn.
If you only have stuff grouped into an entity for the sake of editing ease, remember to use func_groups, not func_statics.
When adding func_fx entities to the map, check the batching on the effects they spew. Some of our effects batch well, and some don’t, so be smart about which ones you use where.
A very good explanation of visportals is already available here: http://www.iddevnet.com/doom3/visportals.php
There's only a short list of things we would add to this:
Space your portals out. The farther apart in space two portals are, the narrower the angles at which they overlap become. You'll get almost no gain from visportals that share edges, and in some cases doing so can even lead to errors where portals close when they aren't supposed to, which can make parts of the scene flicker black.
Don't extend portals any farther than you have to. They aren't like hint brushes, where you were safer making sure they dug deep into surrounding geometry. The actual face of the brush itself defines the entire size of the portal, so it doesn't need to extend any farther than the size of the opening it fills (and since you want portals to overlap as little as possible, smaller is always better in that regard).
Don't cross the portals. They don't split each other like other geometry does, and intersecting portals leads to Bad Things™.
Game Frame & the CPU
All of the above covers renderer related slowdowns, but those aren't the only source of low framerates. The Doom3 renderer is very CPU-driven, such that any significant delay in the game frame time will set the renderer back as well. If debugHud 5 reveals a high game frame time and a short render, it's time to pay attention to render-independent things in your scene.
Things that can delay the game frame:
Messy Physics: Physics evaluation can sometimes get stuck thinking too hard about what might seem like a simple interaction. Look around your scene and make sure you haven't got barrels or other movables getting stuck in some state that prevents them from coming to rest. Ragdolls can often cause this, especially if dropped onto another moving entity. On heavily CPU-limited systems, shooting a monster that falls onto a lift or a func_mover can kill framerates. Ragdolls on top of ragdolls can be even worse.
AI: Use ai_debugTactical to watch the movement of your monsters and NPCs during the slowdown, and see if any are having trouble pathing anywhere. Tracing through the AAS table is expensive, and doing it too often per frame will steal your precious cycles. Check that you haven't mistakenly placed a tether behind monsterclip, or that several entities aren't all trying to path around each other or through a narrow space (like a doorway).
Traces: The game performs a lot of traces per frame. It constantly traces beneath the player to keep track of what he's standing on (if anything), in front of the player to check for NPCs to flip up their crosshair ID, in front of every projectile and moving object, and even on some effects (like sparks that bounce or drips that splash on the floor). These work hand in hand with the physics listed above - sparks bouncing off of ragdolls can become costly. Any trace against an MD5 is also inherently expensive, because if a trace is determined to hit an MD5's bounding box it will then have to evaluate against every poly in the MD5 to determine a hit with more precision. In the singleplayer map "Putrefaction Center", several sources of terrible framerates were revealed to be traces through the MD5 intestines strung around the map. Many of them were made as one huge MD5, meaning their bounds often encompassed whole hallways or rooms, which in turn meant that shooting in any direction in that room led to traces against every poly in the entire mess of intestines ... not good. They have since been split and given low-detail collision hulls (as were all of our creature MD5s). Keep an eye out for any excessive or unnecessary traces by enabling r_showRenderTrace.
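The cost of one oversized MD5 can be illustrated with a toy model: a trace that lands inside a model's bounds is charged for every poly in that model. The sketch reduces bounds to intervals along a single axis, and the sizes and poly counts are invented; it is not how the engine stores or tests collision data.

```python
def polys_tested(trace_x, models):
    """Total polys evaluated by a trace at position trace_x.

    models: list of (min_x, max_x, poly_count). A trace inside a model's
    bounds is assumed to test every poly in that model.
    """
    return sum(polys for lo, hi, polys in models if lo <= trace_x <= hi)


# One 20,000-poly model whose bounds span a 100-unit room...
one_big = [(0, 100, 20_000)]
# ...versus the same geometry split into ten pieces of 2,000 polys each.
split = [(i * 10, i * 10 + 10, 2_000) for i in range(10)]

print(polys_tested(5, one_big), polys_tested(5, split))
```

With the single model every trace in the room pays for all 20,000 polys; once it is split, the same trace only pays for the one piece whose bounds it enters.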
Sound Decompression: Some of Quake4's sounds are included as .ogg, and some are left as .wav. The reason for this is that .ogg decompression uses a little CPU, and any sounds that were expected to be played often (like gunshots) were loaded decompressed for speed. Sounds not expected to be used with any frequency (like scripted VO) are loaded compressed to reduce memory footprint and decompressed on the fly. If you feel like you've wrung the towel dry in performance, check what sounds are playing in your scene, and see if you're repeatedly decompressing some of the same .oggs, and replace them with similarly appropriate .wavs instead.
The last major performance point to keep in mind may not affect framerates, but it has a definite impact on load times: the size your map takes up in memory.
Hit the console and type printmeminfo. This will give you a rundown on how much space in memory various bits of your map require. These numbers were regularly compared against our own in-house limits, and in some cases designers went to a great deal of trouble to keep the maps' memory footprints in check.
Image Memory: The amount of memory taken by every image currently loaded in the game, from the numbers on the HUD to game textures to mapmodel and monster skins. This is arguably the most important limit, because it's also the easiest to completely blow away if you don't keep an eye on it. Our internal limit was no more than 200 megs. If you see yourself approaching or surpassing this, you'll have to assess what assets you're using in your map and find which ones you can do without. Replace textures you've only used a couple of times with something similar. If you've only used a certain monster once, and it's not that important, cut it or swap it for something similar. You may even have to consider splitting your map into two separate maps if your design allows for it.
Model Memory: Total size of .MD5MESHes and .LWO/.ASE objects in the map, which shouldn't exceed 45 megs. If this is too high, either cut creatures or type listmodels at the console to get a massive printout of every model in the map. Condump to a text file and see if you can get rid of any less frequently used mapmodels.
Sound Memory: Total size of sound assets in memory, which should stay below 40 megs - fairly self explanatory.
Collision Memory: The total amount of .cm data for your map and its assets, which shouldn't exceed 30 megs. If this is too high, either the map is just too big and complex or, more likely, lots of detailed func_static mapmodels are contributing their oversized clipmodels to the total. Run around the game with g_showCollisionModels or g_showCollisionWorld set to 1 (and be prepared for poor framerates). This will reveal places where you can group complex brushwork into a func_static you can then mark as "nonsolid", and then clip off with more simple brushwork textured with common/fullclip. Existing func_statics that don't have specialized collision models will create their own from the model itself at runtime, which often leads to a needlessly complex CM. Mark these as "noclipmodel" and clip them off with a few brushes as well to save memory.
AAS Memory: How large the AI navigation files are for this map, which you should keep below 10 megs. This number will be different than the total filesize of all your .aas files, because some things (like reachabilities) are generated at runtime. If this total is too high (and it can sometimes skyrocket), run around your map with aas_showAreas set to 2 or 3. Look for big regions of AAS data where monsters could never go, and monsterclip them off. Make sure existing monsterclip extends up to the ceiling, or else needless AAS will be generated on top of it. Also be wary of use of monsterclip or monsterclip_full that's too complex: if you have a pillar with lots of little slanty bits, the clipping doesn't have to exactly match the model. Just immerse the whole thing in one large brush and be done with it.
Anims Memory: Memory specifically for MD5 anims, which should be kept below 20 megs. If this is too high, check the .def files for the creatures and NPCs you're using. The reason we have separate marine and sometimes monster defs for certain levels is that if all of the anims that the base marine ever needs in the game are specified in the base marine .def, they're all loaded for every map that has a marine in it. Make sure you're not using any entities with more anims than you need. If you're utilizing one anim on an entity that also comes with a lot of others you don't need, consider including a custom .def file with your map.
Total Asset Memory: Our max total for all of the above plus extra incidental stuff is 300 megs. The other limits, you may have noticed, add up to more than 300. This is so they can be treated a little fluidly as long as you stick to the big one. Furthermore, if you really don't want to cut a certain cool bunch of textures that puts you over the image limit, making up the difference with savings elsewhere is a perfectly valid option (within reason, of course).
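The budget can be laid out as a quick check. The individual limits quoted above add up to 345 megs against a 300 meg total, which is where the slack comes from. The function name and the sample usage numbers below are invented; the limits themselves are the ones given in the text.

```python
# Per-category limits in MB, as quoted in the text; names are hypothetical.
LIMITS_MB = {
    "image": 200, "model": 45, "sound": 40,
    "collision": 30, "aas": 10, "anims": 20,
}
TOTAL_LIMIT_MB = 300


def check_memory(usage_mb):
    """Return (categories over their own limit, total MB, total within 300?).

    usage_mb may include extra keys (such as "other") for incidental memory;
    they count toward the total but have no individual limit.
    """
    over = {
        name: usage_mb[name] - limit
        for name, limit in LIMITS_MB.items()
        if usage_mb.get(name, 0) > limit
    }
    total = sum(usage_mb.values())
    return over, total, total <= TOTAL_LIMIT_MB


usage = {"image": 215, "model": 30, "sound": 25,
         "collision": 15, "aas": 5, "anims": 8}
print(sum(LIMITS_MB.values()))
print(check_memory(usage))
```

In this example the map is 15 megs over the image limit, but savings elsewhere keep the total at 298, under the overall cap.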
This document is by no means meant to give you the impression that the only thing you can do without killing framerates in Quake4 is a single textured room with no lights. The game does have to spend time doing something, and the purpose of all of the above is to make you familiar with the kinds of things to be wary of. With enough expertise you'll learn to address all of these issues in a scene at once, so that if you want to try something like a cool jailbar effect, you'll know what tradeoffs you can make to make it work.