Optimizing Performance for DOSDev Retro Games
Retro game development for DOS (commonly referred to as DOSDev) is both a nostalgic hobby and a technical challenge. Working within the constraints of 16-bit x86 architecture, limited memory, slow disk I/O, and modest graphics and sound hardware forces you to be economical and deliberate with resources. This article walks through pragmatic techniques for squeezing maximum performance from DOS-era games while maintaining portability and readability in your code.
Why performance matters in DOSDev
DOS-era hardware has tight limits: CPU cycles, RAM (often 640 KB base), and video memory are scarce. Modern players testing your game on emulators expect smooth framerates and responsive controls. Efficient code minimizes input lag, prevents graphical glitches, and preserves the authentic feel of classic games.
Target platforms and constraints
- Real DOS on vintage machines (8086/286/386/486, early Pentiums)
- Emulators (DOSBox, PCem) — useful for testing but not a replacement for hardware-aware optimizations
- Memory models: tiny, small, medium, compact, large, huge. Pick the smallest model that fits your code and data, because near pointers are cheaper than far pointers.
- Video modes (VGA 320×200 256-color, EGA, CGA), sound (PC speaker, Sound Blaster), and storage (floppy disks, early HDDs)
Development environment and toolchain
- Compilers: Borland Turbo C/C++, Microsoft C and Visual C++ 1.5x (the last Microsoft compilers that target 16-bit DOS), DJGPP (32-bit protected mode), Watcom C/C++
- Assemblers: NASM, MASM, TASM
- Linkers and libraries: use minimal CRTs or custom startup code to avoid bloat
- Emulators for testing: DOSBox for convenience, PCem for more accurate hardware timing
Tip: For high performance and precise hardware control, combine C with inline or separate assembly routines.
Algorithmic optimization
- Algorithm choice
- Prefer O(n) over O(n^2) where possible. For example, for collision checks, use spatial partitioning (grids, quadtrees) rather than naive pairwise comparisons.
- Fixed-point arithmetic
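- Use fixed-point instead of floating point on CPUs without an FPU (8086, 286, and 386 systems without a math coprocessor, and the 486SX). Example: 16.16 fixed point gives a good balance of range and precision, though it needs 32-bit arithmetic, which is slower on 16-bit CPUs.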
- Precompute lookup tables
- Trigonometric functions, inverse square roots, and palette or lighting ramps are perfect for precalculation to replace heavy math at runtime.
- Memory vs CPU tradeoffs
- Cache results in RAM if it saves many CPU cycles; memory is often cheaper than CPU cycles on older hardware.
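As an illustration of the spatial partitioning idea above, here is a minimal sketch of a uniform grid for collision broad-phase. The names (`Entity`, `grid_build`, `grid_count_candidates`), the 32-pixel cell size, and the 64-entity limit are assumptions made for this example. It also assumes every entity position lies inside a 320x200 screen.

```c
#define CELL_SHIFT   5     /* assumed cell size: 32x32 pixels */
#define GRID_W       10    /* 320 / 32 */
#define GRID_H       7     /* 200 / 32, rounded up */
#define MAX_ENTITIES 64    /* assumed upper bound for this sketch */

typedef struct { int x, y; } Entity;      /* on-screen position in pixels */

static int cell_head[GRID_W * GRID_H];    /* first entity in each cell, or -1 */
static int cell_next[MAX_ENTITIES];       /* next entity in the same cell, or -1 */

static int cell_index(int x, int y)
{
    return (y >> CELL_SHIFT) * GRID_W + (x >> CELL_SHIFT);
}

/* Rebuild the grid from scratch: O(n) work, done once per frame. */
void grid_build(const Entity *e, int n)
{
    int i, c;
    for (c = 0; c < GRID_W * GRID_H; ++c)
        cell_head[c] = -1;
    for (i = 0; i < n; ++i) {
        c = cell_index(e[i].x, e[i].y);
        cell_next[i] = cell_head[c];      /* push onto the cell's list */
        cell_head[c] = i;
    }
}

/* Count the entities (other than `self`) in the 3x3 block of cells around
   `self`. Only these candidates need an exact collision test. */
int grid_count_candidates(const Entity *e, int self)
{
    int cx = e[self].x >> CELL_SHIFT;
    int cy = e[self].y >> CELL_SHIFT;
    int dx, dy, i, count = 0;
    for (dy = -1; dy <= 1; ++dy) {
        for (dx = -1; dx <= 1; ++dx) {
            int gx = cx + dx, gy = cy + dy;
            if (gx < 0 || gx >= GRID_W || gy < 0 || gy >= GRID_H)
                continue;                 /* neighbor cell is off the grid */
            for (i = cell_head[gy * GRID_W + gx]; i != -1; i = cell_next[i])
                if (i != self)
                    ++count;
        }
    }
    return count;
}
```

With entities spread across the screen, each one is compared against a handful of neighbors instead of all others. The linked lists live in two static arrays, so nothing is allocated per frame.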
Low-level optimizations
- Use registers and avoid unnecessary memory loads/stores.
- Minimize function call overhead — prefer inline functions or macros where appropriate.
- Optimize inner loops in assembly (loop unrolling, fewer taken branches).
- Align data structures for faster access (word/dword alignment for 16/32-bit accesses).
- Use segment registers effectively with small/medium memory models to minimize far pointer usage.
Example: instead of dereferencing a far pointer repeatedly inside a loop, load its segment into a segment register (such as ES) once and address the data with near offsets for the rest of the loop.
Video rendering techniques
- Choose the right video mode
- VGA 320x200x256 (Mode 13h) is popular: its linear, one-byte-per-pixel framebuffer simplifies blitting.
- Efficient blitting
- Bulk copy with REP MOVSB/MOVSW in assembly for large transfers.
- For sprites, use masked blits: precompute masks or use RLE (run-length encoding) for sparse sprites.
- Double buffering
- Draw to an offscreen buffer and present the finished frame to reduce flicker and tearing. In Mode 13h, copy the entire buffer to video memory with an optimized memory copy, ideally timed to the vertical retrace.
- Page flipping (when available)
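- In unchained VGA modes (such as Mode X), video memory holds several pages, so you can flip by changing the display start address instead of copying. Mode 13h itself exposes only a single page.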
- Palette tricks
- For effects like fading or palette animation, adjust the VGA palette instead of per-pixel operations — dramatically cheaper.
- Dirty rectangles
- Only redraw screen regions that changed. For HUDs and static backgrounds, composite intelligently.
- Tile-based rendering
- For tile maps, pre-render tiles to an offscreen buffer and composite only changed tiles.
- Use hardware scrolling when possible
- EGA and VGA can pan smoothly in hardware through the display start address and pixel panning registers; use this to avoid re-rendering the whole scene.
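The dirty-rectangle idea can be sketched as follows. This is a minimal example, not a complete renderer: `Rect` and `flush_dirty_rect` are names invented here, and both buffers are treated as ordinary 64,000-byte arrays. On real hardware the destination would be the Mode 13h framebuffer, reached through a far pointer or a DPMI mapping depending on your compiler.

```c
#include <string.h>

#define SCREEN_W 320
#define SCREEN_H 200

typedef struct { int x, y, w, h; } Rect;   /* hypothetical rectangle type */

/* Copy only the changed region from the back buffer to the visible buffer.
   The rectangle is clipped to the screen first. Returns the bytes copied. */
long flush_dirty_rect(unsigned char *screen, const unsigned char *back, Rect r)
{
    int row;
    long copied = 0;

    if (r.x < 0) { r.w += r.x; r.x = 0; }
    if (r.y < 0) { r.h += r.y; r.y = 0; }
    if (r.x + r.w > SCREEN_W) r.w = SCREEN_W - r.x;
    if (r.y + r.h > SCREEN_H) r.h = SCREEN_H - r.y;
    if (r.w <= 0 || r.h <= 0)
        return 0;                           /* nothing visible to copy */

    for (row = 0; row < r.h; ++row) {
        /* unsigned math: 199 * 320 does not fit in a signed 16-bit int */
        unsigned offset = (unsigned)(r.y + row) * SCREEN_W + (unsigned)r.x;
        memcpy(screen + offset, back + offset, (size_t)r.w);
        copied += r.w;
    }
    return copied;
}
```

A 16x16 sprite that moved costs two small copies (old and new position) rather than a full 64,000-byte frame copy.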
Sprite and animation optimizations
- Store sprites in formats that match the blit routine (aligned, packed).
- Pre-rotate or pre-scale sprites where possible—runtime transforms are expensive.
- Pack related sprite frames into a single sheet so they load in one read and can be addressed by simple offsets.
- For particle systems, use simple physics and pooled objects to avoid frequent allocations.
Sound and music
- Use callback or timer-driven mixing at a sensible sample rate (e.g., 11 kHz or 22 kHz depending on CPU budget).
- Prefer dedicated hardware where available: on a Sound Blaster, the DSP with DMA transfers for digitized effects and the FM synthesizer (OPL2/OPL3) for music.
- Keep audio buffers large enough to avoid underruns but small enough to limit latency.
- For MIDI, offload playback to the MIDI device rather than software synth when possible.
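For the software mixing case, a timer or callback would typically call a routine like the one sketched below to fill the next audio buffer. The `Voice` structure and `mix_block` function are hypothetical. The sketch assumes 8-bit unsigned PCM (where 128 is silence) and one-shot sounds, with no resampling or volume control.

```c
typedef struct {
    const unsigned char *data;  /* 8-bit unsigned PCM, 128 = silence */
    unsigned length;            /* number of samples in data */
    unsigned pos;               /* index of the next sample to play */
    int active;                 /* nonzero while the voice is playing */
} Voice;

/* Mix all active voices into `out` as 8-bit unsigned PCM.
   Sums are clamped so loud overlaps saturate instead of wrapping around. */
void mix_block(Voice *voices, int nvoices, unsigned char *out, unsigned count)
{
    unsigned i;
    int v;
    for (i = 0; i < count; ++i) {
        int sum = 0;
        for (v = 0; v < nvoices; ++v) {
            Voice *vc = &voices[v];
            if (!vc->active)
                continue;
            sum += (int)vc->data[vc->pos] - 128;   /* convert to signed */
            if (++vc->pos >= vc->length)
                vc->active = 0;                    /* one-shot: stop at the end */
        }
        if (sum > 127)  sum = 127;
        if (sum < -128) sum = -128;
        out[i] = (unsigned char)(sum + 128);       /* back to unsigned */
    }
}
```

The block size passed as `count` is the latency knob mentioned above: larger blocks tolerate slow frames, smaller blocks respond faster.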
Input handling and responsiveness
- Poll hardware directly (keyboard, mouse, joystick) for minimal latency.
- Debounce and process inputs in a separate step from rendering.
- Avoid long blocking operations in the main loop that delay input handling (disk reads, lengthy computations).
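One common way to poll the keyboard directly is to keep a table of key states that is updated from raw scancodes. The sketch below shows only the portable bookkeeping; `key_feed` and `key_take_press` are names invented for this example. On DOS, `key_feed` would be called from a keyboard interrupt handler with the byte read from port 0x60, and that compiler-specific hookup is not shown. Extended (0xE0-prefixed) keys are skipped for brevity.

```c
#define KEY_COUNT 128
#define SC_ESC    0x01   /* scancode set 1 make code for Esc */
#define SC_SPACE  0x39   /* scancode set 1 make code for Space */

static volatile unsigned char key_down[KEY_COUNT];     /* 1 while the key is held */
static volatile unsigned char key_pressed[KEY_COUNT];  /* 1 until the game consumes it */

/* Feed one raw scancode byte. Bit 7 set means the key was released. */
void key_feed(unsigned char scancode)
{
    unsigned char code = (unsigned char)(scancode & 0x7F);
    if (scancode == 0xE0)
        return;                      /* extended-key prefix: ignored in this sketch */
    if (scancode & 0x80) {
        key_down[code] = 0;
    } else {
        if (!key_down[code])         /* skip typematic repeats while held */
            key_pressed[code] = 1;
        key_down[code] = 1;
    }
}

/* Returns 1 once per physical press, then clears the edge flag. */
int key_take_press(unsigned char code)
{
    code &= 0x7F;
    if (key_pressed[code]) {
        key_pressed[code] = 0;
        return 1;
    }
    return 0;
}
```

The game loop reads `key_down[SC_SPACE]` for held actions (moving, firing continuously) and `key_take_press(SC_ESC)` for one-shot actions (pause, menu), which keeps input processing separate from rendering.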
Disk and file I/O
- Reduce disk seeks by packing assets and using contiguous files.
- Load large data during level transitions, or in small chunks between frames (DOS has no background threads to do it for you); show a progress indicator rather than stalling.
- Cache frequently used assets in memory (fit within your memory model).
- Use compressed assets with fast decompression (RLE, LZ variants) to trade CPU for I/O bandwidth.
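As an example of a decompressor that is cheap enough for slow CPUs, here is a sketch of a run-length decoder. The on-disk format is an assumption made for this example (a sequence of count/value byte pairs), not a standard, and `rle_decode` is a hypothetical name.

```c
#include <stddef.h>

/* Decode (count, value) byte pairs from src into dst.
   Returns the number of bytes written, or 0 if the input is malformed
   or the output buffer is too small. */
size_t rle_decode(const unsigned char *src, size_t src_len,
                  unsigned char *dst, size_t dst_cap)
{
    size_t in = 0, out = 0;
    if (src_len % 2 != 0)
        return 0;                          /* pairs only */
    while (in < src_len) {
        unsigned char count = src[in++];
        unsigned char value = src[in++];
        if (count == 0 || (size_t)count > dst_cap - out)
            return 0;                      /* bad run or would overflow dst */
        while (count--)
            dst[out++] = value;
    }
    return out;
}
```

This format works best on flat-colored art such as tiles and backgrounds. Noisy data can grow rather than shrink, so it is worth checking each asset at build time.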
Memory management
- Prefer static allocations for frequently used objects to avoid fragmentation.
- Implement simple object pools for transient entities (bullets, particles).
- Carefully choose the memory model: tiny/small keep code and data pointers near (16-bit offsets); medium adds far code, and compact/large add far data at the cost of pointer overhead.
- Use EMS, XMS, or a DPMI DOS extender (the approach DJGPP takes) when you must exceed conventional memory, but be mindful of performance and portability tradeoffs.
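A simple object pool for transient entities might look like the sketch below. `Bullet`, `bullet_spawn`, `bullets_update`, and the pool size of 32 are assumptions for this example. The point is that all storage is static and slots are reused, so the heap is never touched during gameplay.

```c
#define MAX_BULLETS 32       /* assumed pool size */
#define SCREEN_W    320
#define SCREEN_H    200

typedef struct {
    int x, y;                /* position in pixels */
    int dx, dy;              /* velocity in pixels per frame */
    int alive;               /* nonzero while the slot is in use */
} Bullet;

static Bullet bullets[MAX_BULLETS];   /* static storage: no malloc, no fragmentation */

/* Claim a free slot. Returns a null pointer when the pool is exhausted. */
Bullet *bullet_spawn(int x, int y, int dx, int dy)
{
    int i;
    for (i = 0; i < MAX_BULLETS; ++i) {
        if (!bullets[i].alive) {
            bullets[i].x = x;
            bullets[i].y = y;
            bullets[i].dx = dx;
            bullets[i].dy = dy;
            bullets[i].alive = 1;
            return &bullets[i];
        }
    }
    return 0;
}

/* Move every live bullet and recycle the ones that leave the screen.
   Returns how many bullets are still alive. */
int bullets_update(void)
{
    int i, live = 0;
    for (i = 0; i < MAX_BULLETS; ++i) {
        Bullet *b = &bullets[i];
        if (!b->alive)
            continue;
        b->x += b->dx;
        b->y += b->dy;
        if (b->x < 0 || b->x >= SCREEN_W || b->y < 0 || b->y >= SCREEN_H)
            b->alive = 0;    /* slot becomes available again */
        else
            ++live;
    }
    return live;
}
```

When the pool is full, the spawn simply fails. For bullets and particles that is usually an acceptable policy and keeps the worst-case frame cost bounded.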
Profiling and benchmarking
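- Use cycle counters (RDTSC on Pentium and later CPUs, or the PIT on older hardware) to measure hotspots.
- Time sections with PIT (Programmable Interval Timer) reads when you need fine resolution; the BIOS tick counter advances only about 18.2 times per second, so it suits coarse measurements. Both exist on every PC, which keeps profiling consistent across machines.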
- Profile on representative target hardware, not only on fast development machines or emulators.
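The arithmetic behind PIT-based timing can be sketched as follows. Reading the counter itself (latching channel 0 through port 0x43 and reading two bytes from port 0x40) requires your compiler's port I/O functions and is not shown. The function names are hypothetical. The sketch assumes channel 0 runs in rate-generator mode (mode 2) with the default divisor of 65536. In the square-wave mode that BIOSes typically program, the count steps by two and the raw readings need different handling.

```c
/* The PIT input clock runs at about 1.193182 MHz, so one tick is
   roughly 0.8381 microseconds. */

/* Ticks elapsed between two channel-0 readings.
   The counter counts DOWN and wraps at 16 bits, hence start - end.
   Valid only if less than one full wrap (about 55 ms) has elapsed. */
unsigned long pit_elapsed_ticks(unsigned start, unsigned end)
{
    return (unsigned long)((start - end) & 0xFFFFu);
}

/* Convert ticks to microseconds using integer math only.
   For ticks up to 65535 the product fits in 32 bits. */
unsigned long pit_ticks_to_us(unsigned long ticks)
{
    return ticks * 8381UL / 10000UL;
}
```

For sections longer than one wrap, combine the PIT reading with the BIOS tick count, or accumulate several short measurements.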
Quick checklist:
- Measure before optimizing.
- Optimize hot code paths first (rendering, physics, input).
- Keep changes small and test frequently.
Portability vs performance
- Cleaner, portable C code is easier to maintain; assembly yields highest speed.
- Isolate platform-specific assembly in well-defined modules (rendering, sound, input).
- Use conditional builds so you can compile a high-performance assembly path for DOS and a portable path for other platforms or emulators.
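A conditional build along these lines might look like the sketch below. `USE_ASM_BLIT` and `asm_copy_frame` are hypothetical names: the macro would be defined in the DOS build configuration, and the routine would live in a separate assembly module. Without the macro, the same call site falls back to plain C.

```c
#include <string.h>

#define FRAME_BYTES 64000u   /* 320 x 200 pixels, one byte each */

#if defined(USE_ASM_BLIT)
/* Hypothetical hand-tuned routine provided by an assembly module. */
extern void asm_copy_frame(unsigned char *dst, const unsigned char *src);
#define copy_frame(dst, src) asm_copy_frame((dst), (src))
#else
/* Portable fallback: works with any C compiler. */
static void copy_frame(unsigned char *dst, const unsigned char *src)
{
    memcpy(dst, src, FRAME_BYTES);
}
#endif
```

The rest of the game calls `copy_frame` and never needs to know which path was compiled in, which also makes it easy to compare the two paths when profiling.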
Example patterns and snippets
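- Fixed-point position updates:
    /* 16.16 fixed point; assumes a compiler that provides <stdint.h> and a 64-bit integer type */
    #include <stdint.h>
    typedef int32_t fix;
    #define FIX_SHIFT 16
    #define FIX_FROM_INT(i) ((fix)(i) << FIX_SHIFT)   /* widen before shifting: int may be 16 bits */
    #define FIX_MUL(a,b) ((fix)(((int64_t)(a) * (b)) >> FIX_SHIFT))
    #define FIX_TO_INT(a) ((a) >> FIX_SHIFT)
    /* typical use: pos_x += FIX_MUL(vel_x, dt); screen_x = FIX_TO_INT(pos_x); */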
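- Simple sprite blit (Mode 13h, conceptual):
    #include <string.h>
    /* Copy an opaque w-by-h sprite to (x, y) in a 320-byte-wide offscreen buffer. No clipping. */
    void blit_sprite(unsigned char *offscreen, const unsigned char *sprite, int x, int y, int w, int h)
    {
        int row;
        for (row = 0; row < h; ++row)  /* unsigned offset: 199 * 320 overflows a signed 16-bit int */
            memcpy(&offscreen[(unsigned)(y + row) * 320u + (unsigned)x], &sprite[row * w], (size_t)w);
    }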
In assembly, replace memcpy with REP MOVSB/MOVSW for speed.
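Most sprites are not rectangular, so a masked variant is usually needed as well. The sketch below uses a color key instead of a separate mask: it assumes palette index 0 means "transparent", which is a convention chosen for this example, and it clips against a 320x200 buffer. `blit_sprite_keyed` is a hypothetical name.

```c
#define SCREEN_W    320
#define SCREEN_H    200
#define TRANSPARENT 0    /* assumed: palette index 0 marks see-through pixels */

/* Draw a w-by-h sprite at (x, y), skipping transparent pixels and
   anything that falls outside the screen. */
void blit_sprite_keyed(unsigned char *offscreen, const unsigned char *sprite,
                       int x, int y, int w, int h)
{
    int row, col;
    for (row = 0; row < h; ++row) {
        int sy = y + row;
        if (sy < 0 || sy >= SCREEN_H)
            continue;                              /* row is off-screen */
        for (col = 0; col < w; ++col) {
            int sx = x + col;
            unsigned char c = sprite[row * w + col];
            if (sx < 0 || sx >= SCREEN_W || c == TRANSPARENT)
                continue;                          /* clipped or see-through */
            offscreen[(unsigned)sy * SCREEN_W + (unsigned)sx] = c;
        }
    }
}
```

The per-pixel test makes this slower than the opaque row copy. That cost is the reason for the RLE and precomputed-mask formats mentioned earlier, which let the inner loop skip or copy whole runs at once.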
Common pitfalls
- Overusing far pointers and excessive segment/stack switching — expensive.
- Blindly copying modern idioms (heavy use of floating point, dynamic memory) to DOS targets.
- Testing only in fast emulators — timing and behavior can differ on real hardware.
- Premature optimization: always profile first.
Final notes
Optimizing DOSDev games is a balancing act between authenticity, maintainability, and squeezing out hardware-limited performance. By combining sound algorithmic choices, careful memory and resource management, and targeted low-level optimizations (especially for rendering and input), you can create retro games that perform smoothly on both emulators and vintage machines. Preserve readable code where possible, isolate platform-specific optimizations, and measure impact before committing changes.