Real-Time Denoising: SVGF, A-SVGF, DLSS & ReLAX

Modern real-time ray tracing typically casts only 1–4 rays per pixel per frame. In the path-tracing limit, 1 spp (sample per pixel) produces an image in which noise dominates the signal. Denoising transforms this extremely noisy input into a temporally stable, perceptually clean image, and must do so within a 1–3 ms GPU budget. This article covers the canonical algorithms: bilateral filtering, SVGF, A-SVGF, ReLAX, and the neural upscaling family (DLSS / FSR / XeSS).

1. The Noise Problem in 1-spp Ray Tracing

A path tracer solves the rendering equation with Monte Carlo integration:

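L(x, \omega) = L_e(x, \omega) + \int_{\Omega} f_r(x, \omega_i, \omega) \, L(x', -\omega_i) \, \cos\theta_i \, d\omega_i

where x' is the nearest surface point seen from x in direction \omega_i.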

For an N-sample estimator the variance equals the variance of a single path sample divided by N, so at N = 1 the full single-sample variance reaches the image. For typical indoor scenes the per-pixel coefficient of variation can exceed 200 % at 1 spp. The denoiser must estimate the true mean from this highly noisy sample using spatial neighbours, temporal history, and G-buffer data.
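To make the 1/N behaviour concrete, here is a minimal sketch with a toy integrand. The integrand (10 % of paths hit a light of radiance 10, the rest return 0) and the names `estimate_pixel` and `estimator_variance` are assumptions chosen for illustration, not part of any renderer.

```python
import random
import statistics

def estimate_pixel(spp, rng):
    """Average `spp` noisy radiance samples for one pixel (toy integrand)."""
    # Hypothetical integrand: 10% of paths hit a light of radiance 10, the rest return 0.
    return sum(10.0 if rng.random() < 0.1 else 0.0 for _ in range(spp)) / spp

def estimator_variance(spp, trials=10000, seed=1):
    """Empirical variance of the pixel estimate over many independent trials."""
    rng = random.Random(seed)
    return statistics.pvariance([estimate_pixel(spp, rng) for _ in range(trials)])

var_1 = estimator_variance(1)    # single-sample variance
var_4 = estimator_variance(4)    # should be roughly a quarter of var_1
var_16 = estimator_variance(16)  # should be roughly a sixteenth of var_1
```

For this toy integrand the true mean is 1 and the single-sample standard deviation is 3, so the coefficient of variation at 1 spp is about 300 %.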

Available G-buffer data

Modern deferred pipelines provide per-pixel: world-space position, normal (16/32-bit), albedo, roughness/metallic, depth, motion vectors (reprojection), object ID, and material ID. All denoising algorithms exploit this high-quality geometric signal to guide filtering.

2. Bilateral & Cross-Bilateral Filters

The bilateral filter is the conceptual foundation of most hand-designed RT denoisers. Unlike a Gaussian blur it preserves edges by weighting sample contributions not only by spatial distance but also by feature similarity:

I_{\text{filtered}}(p) = \frac{1}{W_p} \sum_{q \in \Omega} k_s(\lVert p - q \rVert) \cdot k_r(\lVert I(p) - I(q) \rVert) \cdot I(q)

W_p is the normalisation factor (the sum of all weights). k_s is a Gaussian spatial kernel over pixel distance; k_r is a Gaussian range kernel over photometric or feature distance. The cross-bilateral variant (joint bilateral) evaluates k_r on G-buffer features instead of noisy colour: normal agreement (n(p)·n(q) > cos θ_thresh), linearised depth z, and albedo similarity. This avoids bleeding across material boundaries while still smoothing within surfaces.
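The cross-bilateral weighting can be sketched in one dimension as follows. This is a simplified illustration that uses depth as the only guide feature; the function name `joint_bilateral_1d` and the parameters `sigma_s` and `sigma_z` are assumptions, not names from any particular engine.

```python
import math

def joint_bilateral_1d(color, depth, radius=3, sigma_s=2.0, sigma_z=0.1):
    """Filter noisy `color`, weighting by pixel distance and by noise-free `depth`."""
    out = []
    for p in range(len(color)):
        total, weight_sum = 0.0, 0.0
        for q in range(max(0, p - radius), min(len(color), p + radius + 1)):
            # Spatial kernel over pixel distance.
            k_s = math.exp(-((p - q) ** 2) / (2.0 * sigma_s ** 2))
            # Guide kernel over the depth difference (G-buffer feature, not colour).
            k_r = math.exp(-((depth[p] - depth[q]) ** 2) / (2.0 * sigma_z ** 2))
            total += k_s * k_r * color[q]
            weight_sum += k_s * k_r
        out.append(total / weight_sum)  # weight_sum >= 1 because q == p has weight 1
    return out

# Two surfaces: a dark one at depth 1.0 and a bright one at depth 5.0, both noisy.
depth = [1.0] * 8 + [5.0] * 8
color = ([0.2 + (0.05 if i % 2 else -0.05) for i in range(8)]
         + [0.8 + (0.05 if i % 2 else -0.05) for i in range(8)])
filtered = joint_bilateral_1d(color, depth)
```

The noise inside each surface is averaged out, while the depth discontinuity keeps the dark and bright halves from mixing.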

Limitation: a fixed spatial footprint (e.g. 5×5 or 11×11 kernel) either over-blurs high-frequency detail or misses long-range correlations needed for indirect lighting. A-trous wavelet factorisation addresses this.

3. SVGF: Spatio-Temporal Variance-Guided Filtering

SVGF (Schied et al., HPG 2017) combined temporal accumulation and variance-guided spatial filtering into a unified denoiser for 1-spp path tracing. The algorithm runs in three stages each frame:

3.1 Temporal Accumulation

Each pixel p is reprojected to the previous frame using a motion vector. The accumulated colour and integrated squared colour (for variance) are blended with an exponential moving average:

\hat{C}_t(p) = \alpha \cdot C_t(p) + (1 - \alpha) \cdot \hat{C}_{t-1}(p'), \qquad \alpha \in [0.05, 0.2]

Per-pixel variance is estimated from the difference of accumulated first and second moments: σ²(p) = Ê[C²] − Ê[C]². A disocclusion test (depth + normal thresholds applied to the reprojected position) resets accumulation to α=1 for newly visible pixels, avoiding "ghosting" where a moving object reveals hidden surfaces.
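The accumulation and moment tracking described above can be sketched for a single pixel as follows. The function name `accumulate`, the tuple layout of the history, and the `max(alpha, 1/(n+1))` ramp for short histories are assumptions made for illustration; the ramp is a common implementation choice rather than something stated in this article.

```python
def accumulate(history, sample, alpha=0.2, disoccluded=False):
    """Blend one luminance sample into a (mean, second_moment, length) history."""
    mean, second_moment, length = history
    if disoccluded or length == 0:
        a = 1.0      # no usable history: take the new sample as-is
        length = 0
    else:
        # Assumed ramp: behave like a plain average until the history is long enough.
        a = max(alpha, 1.0 / (length + 1))
    mean = a * sample + (1.0 - a) * mean
    second_moment = a * sample * sample + (1.0 - a) * second_moment
    variance = max(0.0, second_moment - mean * mean)  # E[C^2] - E[C]^2, clamped
    return (mean, second_moment, length + 1), variance

state = (0.0, 0.0, 0)
for s in [1.0, 3.0, 1.0, 3.0]:
    state, variance = accumulate(state, s)

# A disocclusion discards the history and restarts from the new sample.
reset_state, reset_variance = accumulate(state, 7.0, disoccluded=True)
```

Right after a reset the temporal variance is zero because only one sample is available, which is why a spatial estimate is needed until enough history has been gathered.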

3.2 Variance Estimation and Pre-filtering

The temporal variance estimate is spatially pre-smoothed with a 3×3 Gaussian kernel to reduce noise in the variance map itself. This gives a reliable local variance σ²(p) for the next step.

3.3 A-Trous Wavelet Filter

Five passes of a dilated 5×5-tap filter (the "à trous", or "with holes", wavelet) with step sizes 2⁰, 2¹, 2², 2³, 2⁴ reach an effective footprint of roughly 65×65 pixels while costing only 5 × 5 = 25 texture fetches per pass, 125 in total (cf. 65² = 4225 for a brute-force filter of that size).
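A one-dimensional sketch of the dilated passes is shown below, without the edge-stopping weights. The B3-spline tap weights and the names `atrous_pass_1d` and `atrous_1d` are assumptions for illustration. Filtering a unit impulse shows how far the five passes spread a single sample.

```python
B3_SPLINE = (1 / 16, 1 / 4, 3 / 8, 1 / 4, 1 / 16)  # assumed 5-tap kernel per axis

def atrous_pass_1d(signal, step):
    """One dilated pass: five taps at offsets -2*step .. +2*step."""
    n = len(signal)
    out = []
    for p in range(n):
        acc = 0.0
        for k, w in zip(range(-2, 3), B3_SPLINE):
            q = min(max(p + k * step, 0), n - 1)  # clamp at the borders
            acc += w * signal[q]
        out.append(acc)
    return out

def atrous_1d(signal, passes=5):
    """Run `passes` dilated passes with step sizes 1, 2, 4, ..."""
    for i in range(passes):
        signal = atrous_pass_1d(signal, 2 ** i)
    return signal

impulse = [0.0] * 257
impulse[128] = 1.0
response = atrous_1d(impulse)

# Share of the total weight that lands within +/-32 pixels (a 65-pixel span).
weight_within_32 = sum(response[128 - 32:128 + 32 + 1])
fetches_2d = 5 * (5 * 5)  # five passes of 25 taps each in the 2D case
```

Most of the impulse's weight stays inside the 65-pixel span of the last pass; the taps beyond it carry only a small remainder.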

Each tap weight combines a spatial kernel, a depth kernel, a normal kernel, and a luminance kernel. The luminance kernel bandwidth is variance-guided: it scales with the local standard deviation, so noisy pixels get a wider, more tolerant kernel and clean pixels get a narrower, edge-preserving one:

w_{\text{lum}}(p, q) = \exp\left( -\frac{|L_p - L_q|}{\sigma_L \sqrt{g(p)} + \varepsilon} \right)

where g(p) is the pre-smoothed variance and σ_L is a tunable parameter (typically 4.0).
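The luminance weight can be evaluated directly to see the effect of the variance term. The function name `luminance_weight` and the value of `eps` are assumptions for this sketch.

```python
import math

def luminance_weight(lum_p, lum_q, variance_p, sigma_l=4.0, eps=1e-8):
    """Edge-stopping weight on luminance, scaled by the local standard deviation."""
    std_dev = math.sqrt(max(variance_p, 0.0))
    return math.exp(-abs(lum_p - lum_q) / (sigma_l * std_dev + eps))

# Same luminance difference, different local variance.
w_noisy = luminance_weight(1.0, 1.5, variance_p=1.0)     # difference tolerated as noise
w_clean = luminance_weight(1.0, 1.5, variance_p=0.0001)  # difference treated as an edge
```

With a high variance the neighbour keeps most of its weight, while with a low variance the same luminance difference almost removes it from the sum.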

4. A-SVGF: Adaptive SVGF

A-SVGF (Schied et al., HPG 2018) addresses two failure modes of SVGF: temporal ghosting from aggressive temporal reuse, and over-blur from fixed filter bandwidth. It adds:

Problem in SVGF	A-SVGF solution
Ghosting when α=0.05 allows too much history	Per-pixel adaptive α based on temporal gradient magnitude; regions with high temporal change get α→1 (no history)
Fixed 5-level a-trous always runs, over-blurring sharp regions	Per-pixel filter iteration count (1–5) based on spatially-temporally estimated variance; low-noise pixels skip deep wavelet levels
Variance from 1-spp has outliers	Temporal gradient estimator: detect when current frame's noisy luminance is consistent with accumulated history (clamp if divergent)

A-SVGF produces sharper results on specular surfaces and handles fast camera movement better than vanilla SVGF, at a comparable GPU cost (the adaptive iteration count often reduces average passes from 5 to ~2–3).
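The per-pixel adaptive α can be sketched as a blend driven by a normalised temporal gradient. The formulation below (clamp the gradient ratio to [0, 1], then interpolate between the base α and 1) is one common way to write it and is stated here as an assumption; the names `adaptive_alpha`, `temporal_gradient`, and `normalization` are hypothetical.

```python
def adaptive_alpha(temporal_gradient, normalization, base_alpha=0.1, eps=1e-6):
    """Raise alpha towards 1 where the shading changed strongly between frames."""
    # lam = 0: history agrees with the new frame; lam = 1: history is stale.
    lam = min(1.0, abs(temporal_gradient) / max(normalization, eps))
    return (1.0 - lam) * base_alpha + lam

alpha_static = adaptive_alpha(0.0, 1.0)   # unchanged region keeps the base alpha
alpha_partial = adaptive_alpha(0.5, 1.0)  # moderate change: partial history reuse
alpha_changed = adaptive_alpha(2.0, 1.0)  # strong change: history is dropped
```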

5. ReLAX: Diffuse/Specular Denoiser

ReLAX is one of the production denoisers in NVIDIA's NRD (NVIDIA Real-Time Denoisers) SDK, a graphics-API-agnostic library, and has shipped with titles like Cyberpunk 2077 RT Overdrive. Its key innovations over SVGF:

5.1 Separate Diffuse and Specular Passes

Diffuse and specular illumination have fundamentally different noise and coherence characteristics. ReLAX denoises them in separate pipelines with different variance models and temporal stability criteria, then recombines them through material BRDF evaluation. Specular denoising uses a virtual history: instead of reprojecting only the surface point, it reprojects the virtual position of the reflected hit point (the mirror image seen behind the surface), which dramatically improves moving reflections.

5.2 Firefly Suppression

A pre-pass clamps "firefly" outliers (extremely bright isolated samples, for example from small, intense lights hit at grazing angles) to a local neighbourhood maximum. Without this step, a single bright pixel propagates through temporal accumulation and spatial filtering and leaves a bright smear that fades only slowly.
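A neighbourhood-maximum clamp can be sketched on a single row of luminance values. This is a simplified illustration, not the NRD implementation; the function name `suppress_fireflies` and the choice to exclude the centre pixel from the maximum are assumptions.

```python
def suppress_fireflies(lum, radius=1):
    """Clamp each pixel to the brightest value among its neighbours (centre excluded)."""
    out = []
    for p in range(len(lum)):
        lo = max(0, p - radius)
        hi = min(len(lum), p + radius + 1)
        neighbours = [lum[q] for q in range(lo, hi) if q != p]
        out.append(min(lum[p], max(neighbours)) if neighbours else lum[p])
    return out

row = [0.2, 0.3, 50.0, 0.25, 0.2]  # one firefly in the middle
cleaned = suppress_fireflies(row)
```

Only the outlier is changed; pixels that are already within the range of their neighbours pass through untouched.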

5.3 Lobe-Aware Reprojection

For glossy specular, the reprojection uses the surface's roughness to weight between pixel-aligned (like diffuse) and virtual-position (mirror-like) reprojection. For roughness→0 this converges to mirror virtual reprojection; for roughness→1 it collapses to standard diffuse reprojection.
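The two limits described above can be sketched as a roughness-weighted blend of two history lookups. Everything here is an illustrative assumption: the linear falloff, the names `virtual_position` and `blended_history`, and the convention that `view_dir` is the unit vector from the camera to the surface. Production denoisers use a more elaborate lobe-dependent weight.

```python
def virtual_position(surface_pos, view_dir, hit_distance):
    """Mirror-image position of the reflected hit: continue along the view ray past the surface."""
    return tuple(s + v * hit_distance for s, v in zip(surface_pos, view_dir))

def blended_history(surface_history, virtual_history, roughness):
    """Blend the two history samples; hypothetical linear falloff with roughness."""
    w_virtual = 1.0 - min(max(roughness, 0.0), 1.0)
    return tuple(w_virtual * v + (1.0 - w_virtual) * s
                 for s, v in zip(surface_history, virtual_history))

# A mirror 5 units in front of the camera reflecting an object 3 units away.
virtual = virtual_position((0.0, 0.0, 5.0), (0.0, 0.0, 1.0), 3.0)

mirror_like = blended_history((1.0, 1.0, 1.0), (0.0, 0.0, 0.0), roughness=0.0)
diffuse_like = blended_history((1.0, 1.0, 1.0), (0.0, 0.0, 0.0), roughness=1.0)
```

The virtual position would be reprojected with the previous frame's camera to fetch `virtual_history`, while `surface_history` comes from the ordinary motion-vector lookup.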

6. Neural Upscaling: DLSS, FSR, XeSS

While SVGF and ReLAX denoise at the native render resolution, neural upscalers simultaneously denoise and upsample from a lower internal resolution (e.g. 1440p → 4K in "Quality" mode, a 1.5× per-axis scale factor; 1080p → 4K corresponds to the 2× "Performance" mode). This trades render resolution for denoising quality within the same pixel budget. FSR 1–3 are hand-tuned rather than neural and are listed for comparison.

Algorithm	Developer	Method	External data	Modes
DLSS 2/3	NVIDIA	CNN trained offline on 16K reference frames; recurrent temporal accumulation in feature space	Yes — proprietary NN weights; requires tensor cores	Quality (1.5×), Balanced (1.7×), Performance (2×), Ultra-Perf (3×)
DLSS 3.5 Ray Reconstruction	NVIDIA	Replaces the hand-tuned denoisers with one network that denoises and upscales jointly; trained to recognise RT noise patterns	Proprietary NN weights (separate model); runs on all RTX GPUs, frame generation requires RTX 40-series	Same scale modes; frame generation optional
FSR 3 (FidelityFX)	AMD	Spatial EASU upscaling plus RCAS sharpening (FSR 1); temporal TAAU with FP16 (FSR 2); optical flow frame interpolation (FSR 3)	Open source; any GPU	Ultra Quality (1.3×), Quality (1.5×), Balanced (1.7×), Performance (2×)
XeSS 1.3	Intel	CNN-based like DLSS but with DP4a integer fallback path for GPUs without XMX units	Free SDK distributed as closed-source binaries; XMX units on Arc GPUs unlock full quality	Ultra Quality+ (1.3×) … Ultra Performance (3×)
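The per-axis scale factors in the Modes column translate into internal render resolutions as follows. This is a small arithmetic sketch using the DLSS factors from the table; the function name `render_resolution` and the rounding to the nearest pixel are assumptions, and real SDKs may round differently.

```python
def render_resolution(output_w, output_h, scale):
    """Internal resolution for a given per-axis upscale factor."""
    return round(output_w / scale), round(output_h / scale)

MODES = {"Quality": 1.5, "Balanced": 1.7, "Performance": 2.0, "Ultra Performance": 3.0}

# Internal resolutions for a 4K (3840x2160) output.
resolutions = {name: render_resolution(3840, 2160, s) for name, s in MODES.items()}

# Fraction of output pixels that are actually shaded in each mode.
shaded_fraction = {name: 1.0 / (s * s) for name, s in MODES.items()}
```

Because the factor applies per axis, the shaded pixel count falls with its square: a 2× mode shades only a quarter of the output pixels.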

DLSS Architecture (Simplified)

DLSS 2 uses a single convolutional neural network with temporal recurrence. Inputs: the current frame (at render resolution), motion vectors, depth, exposure, and the previous upscaled frame. The network learns to predict what an equivalent offline-rendered high-resolution frame would look like, including ghost-free temporal accumulation. NVIDIA has not published the training loss; a plausible asymmetric form, which penalises temporal inconsistency (ghosting) more heavily than loss of sharpness, is:

L_{\text{total}} = \lambda_{\text{spatial}} \cdot L_{L1}(I_{\text{out}}, I_{\text{ref}}) + \lambda_{\text{temp}} \cdot L_{\text{DSSIM}}(I_{\text{out},t}, I_{\text{out},t-1}) + \lambda_{\text{perc}} \cdot L_{\text{VGG}}(I_{\text{out}}, I_{\text{ref}})

7. Variance Estimation from 1 spp — Engineering Details

Accurate per-pixel variance is the single most critical input to SVGF-family filters. In production several complementary strategies are combined:

Strategy 1: Temporal moments (SVGF)

Maintain running mean and mean-squared of radiance. After ≥4 valid history frames, variance = E[L²] − E[L]². Prone to ghosting if history is stale.

Strategy 2: Spatial 7×7 moments (SVGF fallback)

When temporal history is unavailable (first frame, disocclusion), estimate variance from the spatial 7×7 bilateral neighbourhood. Noisier but ghost-free.
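The switch between the temporal and the spatial estimate can be sketched as below. For brevity the spatial estimate is an unweighted average over the neighbourhood rather than a bilateral one, which is a simplification; the names `spatial_variance` and `select_variance` are assumptions, and the threshold of four frames follows Strategy 1.

```python
def spatial_variance(neighbourhood):
    """Variance from first and second moments of neighbouring luminance values."""
    n = len(neighbourhood)
    mean = sum(neighbourhood) / n
    second_moment = sum(v * v for v in neighbourhood) / n
    return max(0.0, second_moment - mean * mean)

def select_variance(temporal_variance, history_length, neighbourhood, min_history=4):
    """Use the temporal estimate once enough valid history frames exist."""
    if history_length >= min_history:
        return temporal_variance
    return spatial_variance(neighbourhood)

established = select_variance(0.3, history_length=5, neighbourhood=[1.0, 3.0])
fresh = select_variance(0.3, history_length=1, neighbourhood=[1.0, 3.0])
```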

Strategy 3: Path-space demodulation

Divide the radiance by the surface albedo before filtering (demodulation), filter the remaining illumination, then multiply the albedo back in (remodulation). Because texture detail is removed from the filtered signal, the filter cannot blur it, and the variance estimate reflects lighting noise rather than albedo contrast, which makes it more accurate across material boundaries.
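Demodulation and remodulation are a per-channel divide and multiply, sketched here for one RGB pixel. The function names and the `eps` guard against zero albedo are assumptions for illustration.

```python
def demodulate(radiance, albedo, eps=1e-3):
    """Divide out albedo so that only illumination is filtered."""
    return tuple(r / max(a, eps) for r, a in zip(radiance, albedo))

def remodulate(illumination, albedo, eps=1e-3):
    """Multiply the albedo back in after filtering."""
    return tuple(i * max(a, eps) for i, a in zip(illumination, albedo))

# Two texels with different albedo under the same lighting.
albedo_a, albedo_b = (0.8, 0.4, 0.2), (0.2, 0.8, 0.4)
radiance_a = (0.4, 0.2, 0.1)
radiance_b = (0.1, 0.4, 0.2)

illum_a = demodulate(radiance_a, albedo_a)
illum_b = demodulate(radiance_b, albedo_b)
restored_a = remodulate(illum_a, albedo_a)
```

Both texels demodulate to the same illumination, so the filter can average them freely, and remodulation restores each texel's own texture colour afterwards.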

8. Interactive Demo: Bilateral vs SVGF Comparison

The demo below renders a synthetic noisy signal (1 spp Lambertian sphere with a point light, computed in JavaScript) and lets you compare raw 1-spp output, joint-bilateral denoising, and a simplified single-pass SVGF-like filter. Use the dropdown and sigma sliders to see how denoising parameters affect edge preservation vs noise removal.

[Interactive demo controls: Mode dropdown; σ_spatial slider (default 4); σ_lum slider (default 0.15)]

Left panel: raw 1-spp. Right panel: denoised output. The bilateral filter preserves the sphere silhouette edge (guided by the depth G-buffer) while blurring noise within the highlight. SVGF-like mode also applies a 3-level à-trous pass for broader noise integration.
