Iterative solvers — 1D heat conduction

The problem

Steady-state heat conduction: −u″(x) = f on (0,1), with both ends held at ambient temperature (u = 0), discretized on n = 256 cells as A = tridiag(−1, 2, −1)/h², h = 1/(n+1) = 1/257. The heater injects +1 into the first cell and the chiller removes 1 from the last, so the exact temperature (dashed gray in the first panel) is essentially a straight ramp from hot to cold. The eigenvalues of A are λ_k = (2 − 2cos(kπh))/h² for k = 1, …, n, giving condition number κ = λ_n/λ_1 ≈ 4/(πh)² ≈ 2.7×10⁴, the single number that separates these methods.
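The setup above can be checked numerically. The following is a minimal sketch in plain Python (standard library only). The names `apply_A`, `b` and `u_exact` are illustrative, and the right-hand side b = e₁ − eₙ (unit source in the first cell, unit sink in the last) is an assumption about how the demo scales its sources. The closed-form ramp comes from the discrete Green's function of tridiag(−1, 2, −1).

```python
import math

n = 256
h = 1.0 / (n + 1)

def apply_A(u):
    """Matrix-free product with A = tridiag(-1, 2, -1) / h^2 (zero Dirichlet ends)."""
    out = []
    for i in range(n):
        left = u[i - 1] if i > 0 else 0.0
        right = u[i + 1] if i < n - 1 else 0.0
        out.append((2.0 * u[i] - left - right) / h**2)
    return out

# Assumed right-hand side: +1 in the first cell (heater), -1 in the last (chiller).
b = [0.0] * n
b[0], b[-1] = 1.0, -1.0

# Closed-form solution of A u = b: a straight ramp,
# u_i = h^2 (n + 1 - 2i) / (n + 1) for i = 1..n.
u_exact = [h**2 * (n + 1 - 2 * i) / (n + 1) for i in range(1, n + 1)]

# The ramp should satisfy the discrete equations up to rounding.
residual = [bi - ai for bi, ai in zip(b, apply_A(u_exact))]
max_residual = max(abs(r) for r in residual)

# Eigenvalues and condition number from the formula in the text.
lam = [(2.0 - 2.0 * math.cos(k * math.pi * h)) / h**2 for k in range(1, n + 1)]
kappa = lam[-1] / lam[0]

print(f"max |b - A u_exact| = {max_residual:.2e}")
print(f"lambda_min = {lam[0]:.4f}, lambda_max = {lam[-1]:.1f}, kappa = {kappa:.0f}")
```

The smallest eigenvalue sits close to π², the lowest eigenvalue of the continuous operator, while the largest grows like 4/h². That gap is what κ measures.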

The four panels

In order, the panels show the current temperature u (with the exact solution dashed), the current residual r = b − Au, the computed step Δu applied at this iteration, and the convergence tracker (log-log; red marker = current step; all methods overlaid, the selected one highlighted). Axes are frozen over the whole history and shared across the preconditioner toggle, and the step selector holds a single global iteration value across methods and the toggle, so switching anything keeps the view on the same step and any two states are directly comparable. The computed-step panel shows most directly what the preconditioner changes: raw residual-shaped corrections become global, solution-shaped ones.

The methods

Gradient descent: step along the residual, u ← u + αr, with the exact line search α = rᵀr∕rᵀAr. (With a fixed α this same family is called Richardson iteration; for this constant-diagonal A, the choice α = h²/2 is exactly Jacobi.) Worst case: O(κ) iterations per digit of accuracy.
SOR: a Gauss–Seidel sweep with over-relaxation at the optimal ω = 2/(1+sin πh) ≈ 1.976, which cuts the cost from O(κ) to O(√κ) iterations per digit.
Conjugate gradient: also O(√κ) but parameter-free and optimal over the whole Krylov space. Heat “information” spreads inward from the two ends roughly one cell per iteration, and CG terminates at exactly 128 = n/2 steps, where the residual falls off a cliff: the antisymmetric source excites only the 128 antisymmetric eigenmodes, and in exact arithmetic CG needs at most one iteration per excited eigenmode.
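The three methods can be compared side by side with a short sketch. This is plain Python and not the demo's own code: the function names, the zero initial guess, the right-hand side b = e₁ − eₙ, and the fixed budget of 128 iterations are all assumptions made for illustration.

```python
import math

n = 256
h = 1.0 / (n + 1)
omega = 2.0 / (1.0 + math.sin(math.pi * h))   # optimal SOR factor from the text

def apply_A(u):
    """A = tridiag(-1, 2, -1) / h^2 with zero Dirichlet ends."""
    return [(2.0 * u[i] - (u[i - 1] if i > 0 else 0.0)
             - (u[i + 1] if i < n - 1 else 0.0)) / h**2 for i in range(n)]

def dot(a, c):
    return sum(x * y for x, y in zip(a, c))

def rel_residual(u, b):
    r = [bi - ai for bi, ai in zip(b, apply_A(u))]
    return math.sqrt(dot(r, r) / dot(b, b))

def gradient_descent(b, steps):
    u = [0.0] * n
    for _ in range(steps):
        r = [bi - ai for bi, ai in zip(b, apply_A(u))]
        alpha = dot(r, r) / dot(r, apply_A(r))      # exact line search
        u = [ui + alpha * ri for ui, ri in zip(u, r)]
    return u

def sor(b, sweeps):
    u = [0.0] * n
    for _ in range(sweeps):
        for i in range(n):                          # in-place Gauss-Seidel sweep
            left = u[i - 1] if i > 0 else 0.0
            right = u[i + 1] if i < n - 1 else 0.0
            gs = (h**2 * b[i] + left + right) / 2.0  # value that zeroes residual i
            u[i] += omega * (gs - u[i])             # over-relax toward it
    return u

def conjugate_gradient(b, steps):
    u = [0.0] * n
    r = list(b)
    p = list(r)
    rr = dot(r, r)
    history = [1.0]                                 # relative residual per step
    for _ in range(steps):
        Ap = apply_A(p)
        alpha = rr / dot(p, Ap)
        u = [ui + alpha * pj for ui, pj in zip(u, p)]
        r = [ri - alpha * aj for ri, aj in zip(r, Ap)]
        rr_new = dot(r, r)
        history.append(math.sqrt(rr_new / dot(b, b)))
        p = [ri + (rr_new / rr) * pj for ri, pj in zip(r, p)]
        rr = rr_new
    return u, history

b = [0.0] * n
b[0], b[-1] = 1.0, -1.0                             # assumed heater/chiller source
budget = 128

res_gd = rel_residual(gradient_descent(b, budget), b)
res_sor = rel_residual(sor(b, budget), b)
u_cg, history = conjugate_gradient(b, budget)

print(f"omega = {omega:.4f}")
print(f"after {budget} iterations: GD {res_gd:.2e}, SOR {res_sor:.2e}, CG {history[-1]:.2e}")
print(f"CG one step earlier: {history[-2]:.2e}")
```

Printing `history` step by step shows the cliff described above: the CG residual shrinks only slowly while the two fronts travel inward, then collapses once they meet in the middle.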

The neural preconditioner

The toggle applies a toy version of the Neural Preconditioning Operator (arXiv:2502.01337): a two-scale convolutional network trained offline on this A with the paper’s condition/residual losses to output z = M(r) ≈ A⁻¹r. It combines a local fine-grid stencil with a coarse-grid branch, a miniature of the paper’s multigrid design, and its 1,154 weights run directly in the browser. With the toggle on, each solver changes as follows: conjugate gradient becomes flexible PCG (Polak–Ribière β, with a fallback to z = r whenever the network fails to produce a descent direction, i.e. whenever zᵀr ≤ 0); gradient descent steps along z instead of r; and each SOR sweep is followed by a line-searched correction along z, which is the smoother-plus-coarse-correction structure of classical multigrid with the network in the coarse-correction role. The network is deliberately small: it accelerates the solvers severalfold without making the solve trivial.
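The control flow of flexible PCG with the z = r fallback can be sketched as follows. The trained network is not reproduced here: `sgs_preconditioner` (one symmetric Gauss–Seidel sweep) is a stand-in chosen only because it is a simple approximate inverse, and `broken_preconditioner` is a deliberately bad M used to exercise the fallback. All function names, the tolerance, and the right-hand side b = e₁ − eₙ are assumptions.

```python
import math

n = 256
h = 1.0 / (n + 1)

def apply_A(u):
    """A = tridiag(-1, 2, -1) / h^2 with zero Dirichlet ends."""
    return [(2.0 * u[i] - (u[i - 1] if i > 0 else 0.0)
             - (u[i + 1] if i < n - 1 else 0.0)) / h**2 for i in range(n)]

def dot(a, c):
    return sum(x * y for x, y in zip(a, c))

def sgs_preconditioner(r):
    """Stand-in for the network M (NOT the trained model): one symmetric
    Gauss-Seidel sweep on tridiag(-1, 2, -1), i.e. in h^2 A units."""
    y = [0.0] * n
    for i in range(n):                       # forward solve (D + L) y = r
        y[i] = (r[i] + (y[i - 1] if i > 0 else 0.0)) / 2.0
    z = [0.0] * n
    for i in reversed(range(n)):             # backward solve (D + U) z = D y
        z[i] = (2.0 * y[i] + (z[i + 1] if i < n - 1 else 0.0)) / 2.0
    return z

def broken_preconditioner(r):
    """Deliberately bad M: always points uphill, so the fallback always fires."""
    return [-x for x in r]

def safe_apply(M, r):
    """Apply M; fall back to z = r if z is not a descent direction (z.r <= 0)."""
    z = M(r)
    return z if dot(z, r) > 0.0 else list(r)

def flexible_pcg(b, M, tol=1e-8, max_iter=2000):
    u = [0.0] * n
    r = list(b)
    z = safe_apply(M, r)
    p = list(z)
    b_norm = math.sqrt(dot(b, b))
    for k in range(1, max_iter + 1):
        Ap = apply_A(p)
        rz = dot(r, z)
        alpha = rz / dot(p, Ap)
        u = [ui + alpha * pj for ui, pj in zip(u, p)]
        r_new = [ri - alpha * aj for ri, aj in zip(r, Ap)]
        if math.sqrt(dot(r_new, r_new)) <= tol * b_norm:
            return u, k
        z_new = safe_apply(M, r_new)
        # Polak-Ribiere beta: remains usable when M is nonlinear or changes per call.
        beta = dot(z_new, [a - c for a, c in zip(r_new, r)]) / rz
        p = [zj + beta * pj for zj, pj in zip(z_new, p)]
        r, z = r_new, z_new
    return u, max_iter

b = [0.0] * n
b[0], b[-1] = 1.0, -1.0                      # assumed heater/chiller source
u_exact = [h**2 * (n + 1 - 2 * i) / (n + 1) for i in range(1, n + 1)]

def rel_error(u):
    diff = [a - c for a, c in zip(u, u_exact)]
    return math.sqrt(dot(diff, diff) / dot(u_exact, u_exact))

u_sgs, k_sgs = flexible_pcg(b, sgs_preconditioner)
u_fb, k_fb = flexible_pcg(b, broken_preconditioner)
print(f"stand-in M:  {k_sgs} iterations, relative error {rel_error(u_sgs):.2e}")
print(f"fallback z=r: {k_fb} iterations, relative error {rel_error(u_fb):.2e}")
```

With the broken M every call falls back to z = r, so the iteration reduces to unpreconditioned CG. The stand-in sweep is not meant to reproduce the network's speedup; it only shows where a learned M(r) would plug in.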

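The stopping rule is a residual test; the last table column measures each solver’s actual distance to the exact ramp at the moment it stopped. These two notions of “converged” can differ more than you’d expect: writing u* for the exact solution, the relative error ‖u − u*‖/‖u*‖ is bounded only by κ times the relative residual ‖r‖/‖b‖, and here κ ≈ 2.7×10⁴.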
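The gap between the two measures is easy to reproduce. The sketch below does not run a solver. It builds a hypothetical iterate by adding a small multiple of the smoothest eigenmode to the exact ramp, scaled so the relative residual equals an assumed tolerance of 10⁻⁶, and then measures the relative error. The right-hand side b = e₁ − eₙ and all names are assumptions.

```python
import math

n = 256
h = 1.0 / (n + 1)

def apply_A(u):
    """A = tridiag(-1, 2, -1) / h^2 with zero Dirichlet ends."""
    return [(2.0 * u[i] - (u[i - 1] if i > 0 else 0.0)
             - (u[i + 1] if i < n - 1 else 0.0)) / h**2 for i in range(n)]

def norm(v):
    return math.sqrt(sum(x * x for x in v))

b = [0.0] * n
b[0], b[-1] = 1.0, -1.0                      # assumed heater/chiller source
u_exact = [h**2 * (n + 1 - 2 * i) / (n + 1) for i in range(1, n + 1)]

lam_min = (2.0 - 2.0 * math.cos(math.pi * h)) / h**2
lam_max = (2.0 - 2.0 * math.cos(n * math.pi * h)) / h**2
kappa = lam_max / lam_min

# Hypothetical iterate: exact ramp plus an error along v_1 = sin(pi x),
# sized so that the relative residual equals `target`.
target = 1e-6
v1 = [math.sin(math.pi * (i + 1) * h) for i in range(n)]
eps = target * norm(b) / (lam_min * norm(v1))
u = [ue + eps * v for ue, v in zip(u_exact, v1)]

r = [bi - ai for bi, ai in zip(b, apply_A(u))]
rel_res = norm(r) / norm(b)
rel_err = norm([a - c for a, c in zip(u, u_exact)]) / norm(u_exact)

print(f"relative residual = {rel_res:.2e}")
print(f"relative error    = {rel_err:.2e}")
print(f"error/residual    = {rel_err / rel_res:.0f}  (upper bound kappa = {kappa:.0f})")
```

Errors along smooth modes are the ones a residual test sees least, because A multiplies them by the smallest eigenvalues.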

Inside the preconditioner

Two ways to see what the network learned. Left: its response to a unit impulse. For the exact inverse this is the discrete Green’s function, a tent peaked at the impulse (dashed); move the slider to probe how well the network reproduces it across the rod. Right: the Rayleigh gain v_kᵀM(v_k)∕v_kᵀv_k for each eigenmode v_k = sin(kπx), against the exact inverse spectrum 1/λ̂_k (dashed), where λ̂_k = h²λ_k = 2 − 2cos(kπh). The coarse-grid branch supplies the large low-mode gains a local stencil cannot produce, while the fine stencil handles the high frequencies: the same division of labor as classical multigrid. Note the network still undershoots the very lowest modes by orders of magnitude; that shortfall is exactly why it accelerates the solvers severalfold rather than solving the system in one application. Both plots are in h²A units, the scale the network was trained at; preconditioned CG is invariant to that overall scale.
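Both diagnostics can be reproduced for reference operators without the network. In this sketch `solve_T` is the exact inverse in h²A units (a Thomas tridiagonal solve), and `local_stencil` is a hypothetical purely local preconditioner (Jacobi, z = r/2) standing in for "a local stencil". The impulse position and all names are assumptions.

```python
import math

n = 256
h = 1.0 / (n + 1)

def solve_T(r):
    """Exact inverse in h^2 A units: solve tridiag(-1, 2, -1) z = r (Thomas algorithm)."""
    c = [0.0] * n                            # modified super-diagonal
    d = [0.0] * n                            # modified right-hand side
    c[0], d[0] = -0.5, r[0] / 2.0
    for i in range(1, n):
        m = 2.0 + c[i - 1]                   # pivot after eliminating the sub-diagonal
        c[i] = -1.0 / m
        d[i] = (r[i] + d[i - 1]) / m
    z = [0.0] * n
    z[-1] = d[-1]
    for i in range(n - 2, -1, -1):           # back substitution
        z[i] = d[i] - c[i] * z[i + 1]
    return z

def local_stencil(r):
    """Hypothetical purely local preconditioner: Jacobi, z = r / 2."""
    return [x / 2.0 for x in r]

def rayleigh_gain(M, k):
    """v_k^T M(v_k) / v_k^T v_k for the eigenmode v_k = sin(k pi x)."""
    v = [math.sin(k * math.pi * (i + 1) * h) for i in range(n)]
    Mv = M(v)
    return sum(a * c for a, c in zip(v, Mv)) / sum(a * a for a in v)

# Impulse response of the exact inverse: the discrete Green's function.
j = n // 2                                   # 1-based cell index next to x = 0.5
impulse = [1.0 if i == j - 1 else 0.0 for i in range(n)]
green = solve_T(impulse)
tent = [min(i, j) * (n + 1 - max(i, j)) / (n + 1) for i in range(1, n + 1)]
tent_mismatch = max(abs(g - t) for g, t in zip(green, tent))
print(f"peak at cell {green.index(max(green)) + 1}, max |green - tent| = {tent_mismatch:.2e}")

# Rayleigh gains: exact inverse versus a local stencil.
print("   k   1/lambda_hat   exact inverse   local stencil")
for k in (1, 8, 64, 256):
    lam_hat = 2.0 - 2.0 * math.cos(k * math.pi * h)
    print(f"{k:4d} {1.0 / lam_hat:14.4f} {rayleigh_gain(solve_T, k):15.4f}"
          f" {rayleigh_gain(local_stencil, k):15.4f}")
```

The local stencil has the same gain on every mode, whereas the exact inverse needs a gain in the thousands on the lowest mode. That difference is the gap the coarse-grid branch is there to close.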

Figure: impulse response (slider at x = 0.500) and Rayleigh gain per eigenmode. Legend: network M; exact inverse (dashed).

For the full theory, see the report series in this repo: Krylov methods & PCG, neural preconditioning, the eigenvalue story.