Skip to the content.

01 — Code Walkthrough

Complete construct-by-construct walkthrough of every piece of code in the repo: the Mathematica reference program, the Python port (with an explicit accounting of every place the port deviates and why the results are statistically equivalent but not bit-identical), and the two auxiliary Mathematica scripts. The mathematics behind the code is covered in the sibling reports — 02-eigenvalues.md (spectra), 03-gaussian-random-fields.md (the GRF right-hand side), 04-krylov-and-pcg.md (CG/PCG/FCG theory), 05-classical-preconditioners.md, 06-neural-preconditioner.md, 07-nystrom-preconditioning.md, and 08-results.md (all numbers in context). This report is about the code itself.

0. Repo map

mathematica/
  poisson_pcg.wls      # the reference program: problem + GRF + functional PCG (this report, §1)
  eigen_check.wls      # analytic-spectrum verification (§4; math in report 02)
  nystrom_pcg.wls      # randomized Nystrom preconditioner in Wolfram (§5; math in report 07)
python/
  poisson.py           # operators + GRF RHS — port of poisson_pcg.wls setup (§2.1)
  pcg.py               # pcg (classical) + flexible_pcg (Notay) (§2.2)
  preconditioners.py   # identity / jacobi / ilu factories (§2.3)
  nystrom.py           # NystromPreconditioner class (§6; full treatment in report 07)
  neural/
    npo.py             # NAMG-lite network + NPOPreconditioner wrapper (§6; report 06)
    train_npo.py       # NPO training (report 06)
    eval_npo.py        # NPO vs CG evaluation (report 06)
  experiments/
    run_baseline.py    # CG vs Jacobi baseline — port of poisson_pcg.wls driver (§2.4)
    spectra.py         # eigenvalue verification — Python analogue of eigen_check.wls (report 02)
    run_nystrom.py     # Nystrom ranks 16..256 + exact preconditioned kappa (report 07)
    npo_spectrum.py    # column-linearized NPO spectrum (report 06)
    run_all.py         # consolidated benchmark -> results/results.json (report 08)
results/               # *.json summaries + npo_checkpoint.pt
figures/               # all PNGs; mma_* are Mathematica exports

1. The Mathematica reference: mathematica/poisson_pcg.wls

A single wolframscript file that (i) builds the 5-point discrete Laplacian on a $32\times32$ interior grid, (ii) samples a Gaussian-random-field right-hand side, (iii) solves with CG and Jacobi-PCG using a purely functional PCG written around NestWhile, and (iv) exports three figures (figures/mma_source_grf.png, figures/mma_solution.png, figures/mma_convergence.png).

1.1 Problem setup (lines 26–36): Band, SparseArray, KroneckerProduct

n = 32;
h = 1.0/(n + 1);

d1 = SparseArray[
  {Band[{1, 1}] -> 2.0, Band[{2, 1}] -> -1.0, Band[{1, 2}] -> -1.0},
  {n, n}];
id = IdentityMatrix[n, SparseArray];

A = (KroneckerProduct[d1, id] + KroneckerProduct[id, d1])/h^2;
\[d_1 = \begin{pmatrix} 2 & -1 & & \\ -1 & 2 & -1 & \\ & \ddots & \ddots & \ddots \\ & & -1 & 2 \end{pmatrix} \in \mathbb{R}^{n\times n}.\]

The values are written as machine reals (2.0, not 2) so the SparseArray is packed real from the start — no exact-integer arithmetic leaks into the solver.

1.2 Gaussian random field RHS (lines 40–48)

alpha = 2.0;
tau = 3.0;
freqs = N@RotateRight[Range[-n/2, n/2 - 1], n/2]*n;
spectrum = 1.0/(Outer[Plus, freqs^2, freqs^2] + tau^2)^(alpha/2.0);
SeedRandom[42];
noise = RandomVariate[NormalDistribution[], {n, n, 2}] . {1, I};
grfRaw = Re@InverseFourier[noise*spectrum, FourierParameters -> {-1, -1}];
b = Standardize@Flatten@grfRaw;
grf = ArrayReshape[b, {n, n}];

This samples a mean-zero GRF with Matérn-like spectral density $(\vert k\vert ^2+\tau^2)^{-\alpha/2}$, the RHS distribution used for Poisson benchmarks in the FNO paper (Li et al., ICLR 2021), and the training/test distribution for the NPO (06-neural-preconditioner.md). The math is in 03-gaussian-random-fields.md; here is what each construct does:

1.3 The functional PCG (lines 57–74): PCGSolve

PCGSolve[matA_, vecB_, precondFunc_, maxIter_ : 1000, tol_ : 10^-10] :=
  Module[{bNorm = Norm[vecB], pcgStep, result},
    pcgStep[{x_, r_, p_, rz_, _}] :=
      Module[{Ap = matA . p, aStep, xNew, rNew, zNew, rzNew, relRes},
        aStep = rz/(p . Ap);          (* alpha_k = (r.z)/(p.Ap) *)
        xNew = x + aStep*p;
        rNew = r - aStep*Ap;
        relRes = Norm[rNew]/bNorm;
        Sow[relRes];
        zNew = precondFunc[rNew];
        rzNew = rNew . zNew;
        {xNew, rNew, zNew + (rzNew/rz)*p, rzNew, relRes}];
    result = Reap[
      Sow[1.0];
      NestWhile[pcgStep,
        {0.*vecB, vecB, precondFunc[vecB], vecB . precondFunc[vecB], 1.0},
        (#[[5]] > tol &), 1, maxIter]];
    {result[[1, 1]], result[[2, 1]]}];

This is Saad’s Algorithm 9.1 (PCG) written with no mutable state: the entire iteration is a pure function pcgStep from a 5-tuple to a 5-tuple, driven by NestWhile. Dissection:

1.4 The solves (lines 78–88) and the Jacobi construction

{xNone, resNone} = PCGSolve[A, b, Identity];
...
invDiag = 1.0/Normal[Diagonal[A]];
{xJacobi, resJacobi} = PCGSolve[A, b, invDiag*# &];

1.5 Figures (lines 93–116)

Wolfram 15.0 has no built-in "Viridis" color function, so lines 93–97 build one as a Blend over nine sampled matplotlib-viridis anchors (RGBColor[0.267, 0.005, 0.329]RGBColor[0.993, 0.906, 0.144]) — purely cosmetic, to make mma_source_grf visually comparable with the matplotlib default in grf_field. Exports: ArrayPlot of the GRF and of the reshaped Jacobi solution, and a ListLogPlot of both residual histories (mma_convergence), each at ImageResolution -> 150 to match the Python dpi=150.


2. The Python port

2.1 python/poisson.py — operators and GRF

laplacian_1d(n) (lines 18–40): scipy.sparse.diags([-1s, 2s, -1s], offsets=[-1,0,1], format="csr") — the exact Band construction of §1.1. Returns CSR.

poisson_2d(n) (lines 43–68): h = 1/(n+1), then

a = (sp.kron(d1, eye) + sp.kron(eye, d1)) / h**2

— literally the KroneckerProduct line. The module docstring (lines 9–11) pins down the flattening convention: matrices act on grid functions flattened row-major (C order), flat index $k = i\,n + j$ with the first axis slowest — the same as Mathematica’s Flatten — so the operator, RHS, and any reshape-for-plotting agree between the two languages with no permutation.

variable_poisson_2d(n, contrast=100.0) (lines 71–128): new in the port (no Mathematica counterpart) — a finite-volume discretization of $-\nabla!\cdot(a\nabla u)$ with the piecewise coefficient $a = 1$ for $x<\tfrac12$, $a = \text{contrast}$ for $x\ge\tfrac12$. Construction:

The purpose is stated in the docstring (lines 94–96): with contrast 100, $\operatorname{diag}(A)$ jumps from $(2\cdot 1 + 2\cdot 1)/h^2 = 4356$ in the $a{=}1$ region to $(2\cdot 100 + 2\cdot 100)/h^2 = 435{,}600$ in the $a{=}100$ region, so Jacobi is no longer a scalar multiple of the identity and becomes a real preconditioner: 137 vs 771 iterations, a 5.6× reduction (results/results.json, variable; analysis in 05-classical-preconditioners.md and 08-results.md).

grf_rhs(n, alpha=2.0, tau=3.0, seed=42) (lines 131–184) — the port of §1.2, with the three deliberate deviations documented in its Notes block (lines 147–157):

f = np.roll(np.arange(-n // 2, n // 2), n // 2) * n
spectrum = 1.0 / (f[:, None] ** 2 + f[None, :] ** 2 + tau**2) ** (alpha / 2.0)
rng = np.random.default_rng(seed)
noise = rng.standard_normal((n, n, 2)) @ np.array([1.0, 1.0j])
field = np.real(np.fft.ifft2(noise * spectrum))
flat = field.ravel()
return (flat - flat.mean()) / flat.std(ddof=1)

Line-by-line correspondence:

Mathematica Python note
RotateRight[Range[-n/2, n/2-1], n/2]*n np.roll(np.arange(-n//2, n//2), n//2) * n np.roll(·, +k) also moves the last $k$ elements to the front — identical output [0..15, -16..-1]*32
Outer[Plus, freqs^2, freqs^2] f[:,None]**2 + f[None,:]**2 broadcasting = Outer[Plus, ...]
RandomVariate[...,{n,n,2}] . {1,I} rng.standard_normal((n,n,2)) @ [1, 1j] same construction, different RNG stream
Re@InverseFourier[·, FourierParameters->{-1,-1}] np.real(np.fft.ifft2(·)) same $e^{+2\pi i k\cdot x}$ kernel; differ by the constant $n^2$ (Mathematica unnormalized, NumPy divides by $n^2$)
Standardize@Flatten@grfRaw (flat - flat.mean()) / flat.std(ddof=1) ravel() is row-major = Flatten; ddof=1 = sample std = Standardize

Why the $n^2$ and $\mathrm{Re}$ factors don’t matter: both are overall positive constants on the field, and the final standardization maps the field affinely to mean 0 / sample-std 1, so any overall scaling cancels exactly. The only non-cancelling difference is the RNG stream itself — see §3.

2.2 python/pcg.pypcg and flexible_pcg

pcg(A, b, M=None, tol=1e-10, maxiter=2000) (lines 15–81) is the imperative transliteration of PCGSolve, one Mathematica state-slot per Python local:

PCGSolve pcg (line)  
{0.*vecB, vecB, precondFunc[vecB], vecB.precondFunc[vecB], 1.0} x = zeros; r = b.copy(); z = M(r); p = z.copy(); rz = r @ z (61–65) init; the port computes M(b) once (the .wls evaluates it twice)
Ap = matA . p Ap = A @ p (68) one matvec
aStep = rz/(p . Ap) alpha = rz / (p @ Ap) (69)  
xNew = x + aStep*p; rNew = r - aStep*Ap lines 70–71  
relRes = Norm[rNew]/bNorm; Sow[relRes] res_hist.append(norm(r)/bnorm) (72–73) res_hist = [1.0] (line 59) plays Sow[1.0]; iterations = len(res_hist) - 1
zNew = precondFunc[rNew]; rzNew = rNew.zNew z = M(r); rz_new = r @ z (74–75) same “wasted” final apply as the .wls (§1.3)
zNew + (rzNew/rz)*p p = z + (rz_new/rz) * p (76) Fletcher–Reeves-form $\beta$
NestWhile[..., #[[5]] > tol &, 1, maxIter] for _ in range(maxiter): ...; if relres <= tol: break (67, 78–79) continue-while > tol ≡ break-when <= tol; both cap at maxiter and return silently on non-convergence

M=None maps to an identity lambda (line 55); b is coerced to float64 (line 57). The only semantic difference from NestWhile is that the Python loop never tests the initial state — harmless since res_hist[0] = 1.0 > tol for any meaningful tolerance. Neither implementation guards bnorm == 0.

flexible_pcg(A, b, M, ...) (lines 84–141) is identical except for one line — the search-direction update (line 134):

beta = (z_new @ (r_new - r)) / rz  # Polak-Ribiere (Notay 2000)

i.e. $\beta_k = z_{k+1}^\top (r_{k+1} - r_k) / (z_k^\top r_k)$, the Polak–Ribière form of Notay, Flexible Conjugate Gradients, SIAM J. Sci. Comput. 22(4), 2000. It subtracts the stale component $z_{k+1}^\top r_k$ that the Fletcher–Reeves ratio silently assumes is zero — an assumption that holds only for a fixed SPD $M$ and is false for the ReLU-bearing NPO. The cost is one extra stored vector (r is kept alongside r_new, lines 130–136). Consequence in the results: FCG+NPO converges in 30 iterations while plain PCG+NPO stalls at relres 9.653e-06 after 2000 iterations (results/results.json, fcg_npo / cg_npo); the mechanism is dissected in 04-krylov-and-pcg.md and 06-neural-preconditioner.md.

2.3 python/preconditioners.py

Three factories, each returning a callable $z = M(r) \approx A^{-1} r$ — the exact interface of precondFunc in the .wls:

2.4 python/experiments/run_baseline.py — the driver port

Port of the .wls driver (§1.4) plus verification the Mathematica script lacks. Flow of main() (lines 34–108):

  1. A = poisson_2d(32), b = grf_rhs(32, alpha=2.0, tau=3.0, seed=42) (lines 36–37).
  2. pcg(A, b, M=None) and pcg(A, b, M=jacobi(A)) (lines 39–40); iteration counts via len(res) - 1 (lines 42–43) — the Length[resHist] - 1 convention.
  3. Direct-solve verification (lines 50–53): spla.spsolve(A.tocsc(), b) and relative errors. From results/baseline.json: relerr_vs_spsolve_none = 5.3995e-12, relerr_vs_spsolve_jacobi = 5.3995e-12.
  4. Jacobi≡CG check (lines 57–59): max elementwise deviation between the two residual histories over their common length — 4.441e-16, i.e. two ULPs at scale 1; recorded into the JSON note.
  5. Figures (lines 66–89): baseline_convergence (semilogy of both histories), grf_field, solution_field (both via field.reshape(n, n) — valid because of the shared row-major convention).
  6. JSON summary → results/baseline.json (lines 91–107): 116/116 iterations, final relres 6.666547523655469e-11 / 6.666547523655465e-11. These records are bit-identical to the canonical.cg_none / cg_jacobi entries later produced by run_all.py in results/results.json — the whole pipeline is deterministic (fixed seeds, single-threaded sparse matvecs).

Boilerplate worth noting: matplotlib.use("Agg") before pyplot (headless safety, lines 20–22) and sys.path.insert(0, ...parents[1]) (line 27) so python/ modules import without packaging.


3. Divergence ledger: every place the port differs, and why results match statistically but not bit-for-bit

# Divergence Mathematica Python Consequence
1 RNG SeedRandom[42] → Wolfram’s default generator (ExtendedCA-family); draw order fixed by the single {n,n,2} RandomVariate np.random.default_rng(42) → PCG64 The one real divergence. The two b vectors are different samples from the same distribution (the pipeline after the raw draws is exactly equivalent, items 2–4). Hence Mathematica iteration counts need not equal Python’s 116; they agree statistically (same $\kappa(A)$, same GRF law). No cross-language bit-match is possible or claimed — results/*.json numbers all come from the Python runs.
2 FFT normalization InverseFourier[·, {-1,-1}]: prefactor 1 np.fft.ifft2: prefactor $1/n^2$ Same $e^{+2\pi i k\cdot x}$ kernel; pure constant factor $n^2$, annihilated by standardization. Zero statistical effect.
3 Std-dev estimator Standardize uses sample std ($m{-}1$) flat.std(ddof=1) Matched deliberately; NumPy’s default ddof=0 divisor is smaller by $\sqrt{1023/1024}$, so using it would rescale b up by $\sqrt{1024/1023}$ — still harmless for CG iterates (scale invariance) but would break bit-level determinism within Python between the two choices.
4 Flattening order Flatten row-major ravel() row-major (C order) Matched exactly; no permutation anywhere.
5 maxiter default 1000 2000 Immaterial for converging runs (max observed: 771). The larger cap exists so the deliberately-failing cg_npo run can demonstrate its stall at 2000 iterations (08-results.md).
6 tol type / halting test exact rational 10^-10; NestWhile continues while relres > tol, tests initial state float 1e-10; break when relres <= tol, never tests initial state Logically identical for res_hist[0] = 1.0; same histories index-by-index.
7 Setup $M$-applies precondFunc[vecB] evaluated twice at init z = M(r) once, reused Cost-only; identical numbers.
8 Extras with no .wls counterpart variable_poisson_2d, ilu, flexible_pcg, spsolve verification, NystromPreconditioner (Python class), NPO Additive; the shared subset (CG/Jacobi on the canonical problem) is the correspondence surface.

Bottom line: within Python everything is bit-deterministic (fixed seeds; reruns reproduce results/baseline.json, results/nystrom.json, results/npo_eval.json exactly). Across languages items 2–4 and 6–7 are exact equivalences; item 1 (RNG) is the sole source of discrepancy and is irreducible without implementing one RNG inside the other.


4. mathematica/eigen_check.wls — brief

Rebuilds d1 and A exactly as in §1.1, then verifies against closed forms (derivations in 02-eigenvalues.md):

The Python counterpart’s verified numbers (results/spectra.json): 1-D max deviation 2.220e-15, 2-D max deviation 2.728e-11 (consistent with $\sim 10^4$-magnitude eigenvalues at float64), $\kappa$ computed 440.6885603835139 vs analytic 440.6885603836582, asymptote $4(n{+}1)^2/\pi^2 = 441.36$.

5. mathematica/nystrom_pcg.wls — brief

Same problem setup and the verbatim PCGSolve (lines 22–60), plus a Wolfram implementation of the randomized Nyström preconditioner of Frangella–Tropp–Udell (https://arxiv.org/abs/2110.02820); full math in 07-nystrom-preconditioning.md. NystromPreconditioner[matA, ell, seed] (lines 72–94):

Differences from the Python python/nystrom.py class worth flagging (details in 07-nystrom-preconditioning.md): the Python version follows Algorithm 2.1 more strictly — it orthonormalizes the sketch (np.linalg.qr(omega), python/nystrom.py line 105) where the .wls uses the raw Gaussian Omega; its shift is np.spacing(||Y||_F) (one ULP) vs the .wls’s $\sqrt{N}\,\varepsilon\,\Vert Y\Vert _F$; and it has the paper’s Cholesky-failure fallback loop (nu *= 100, retry ×4, lines 113–124) plus a mu > 0 generalization and a degenerate-rank guard. The .wls script runs only $\ell = 128$; the Python experiment sweeps ranks 16/64/128/256 and additionally computes the exact preconditioned condition numbers by dense eigensolves (439.62 / 434.52 / 426.58 / 407.46 vs unpreconditioned 440.69, results/nystrom.json) — the “Nyström is marginally worse than plain CG here” story (123/123/122/119 iterations vs 116) is told in 07-nystrom-preconditioning.md and 08-results.md.

6. The rest of python/ — one paragraph each, with pointers