The Trace-Maximal Orthogonal Completion

A closed form for maximizing \(\operatorname{Tr} W\) over orthogonal \(W\) with \(XW = Y\) — a follow-up to A. Kato's answer on Mathematica StackExchange.

The problem: given \(n \times d\) matrices \(X, Y\) with orthonormal rows (\(n \le d\)), find an orthogonal \(W\) with \(XW = Y\) and \(\operatorname{Tr} W\) as large as possible. Kato's answer reduced it to a small dense subproblem and solved that with an eigendecomposition. This note pushes the same idea to its end point: the whole problem has a one-SVD closed form, the geometry behind it fits in two pictures, and exploiting it brings the cost down to \(O(dn^2)\) for the solve.

1 · From trace to distance

For orthogonal \(W\),

$$\lVert W - I \rVert_F^2 \;=\; 2d - 2\operatorname{Tr} W,$$

so maximizing the trace is the same as finding the feasible orthogonal matrix closest to the identity: a Procrustes problem with a linear constraint. Everything that follows is a consequence of this one reformulation.

2 · The feasible set has exactly one knob

Let \(U\) and \(V\) be orthonormal bases of the row null spaces of \(X\) and \(Y\). If \(W\) is feasible, the rows of \(UW\) are orthonormal and orthogonal to the rows of \(Y\) — because \(UWY^\top = UWW^\top X^\top = UX^\top = 0\) — so \(UW = QV\) for some orthogonal \(Q\), and every feasible \(W\) is

W d × d = X⊤ d × n Y n × d + U⊤ d × (d−n) Q free O(d−n) V (d−n) × d
Blue blocks are pinned by the data; the orange block \(Q \in O(d-n)\) is the entire feasible set. The first term \(X^\top Y\) maps row-space to row-space; the second maps null space to null space.

The objective splits accordingly: \(\operatorname{Tr} W = \operatorname{Tr}(X^\top Y) + \operatorname{Tr}(Q\,VU^\top)\). By von Neumann's trace inequality, \(\operatorname{Tr}(QM) \le \sum_i \sigma_i(M)\) over orthogonal \(Q\), with equality at an orthogonal polar factor of \(M^\top\): if \(VU^\top = E\Sigma F^\top\), take \(Q = FE^\top\). (This replaces the Eigensystem + Inverse + matrix-square-root route — one SVD, no invertibility assumption, and orthogonality error around \(10^{-15}\) instead of \(10^{-13}\).)

orthogonalSolverSVD[X_?MatrixQ, Y_?MatrixQ] := Module[{U, V, e, s, f},
   U = NullSpace[X]; V = NullSpace[Y];
   If[U === {}, Return[Transpose[X] . Y]];   (* n == d: W is determined *)
   {e, s, f} = SingularValueDecomposition[V . Transpose[U]];
   Transpose[X] . Y + Transpose[U] . f . Transpose[e] . V];

3 · All the action lives in a ≤ 2n-dimensional subspace

Every optimal \(W\) fixes the orthogonal complement of \(S = \operatorname{rowspan}(X) + \operatorname{rowspan}(Y)\) pointwise: an optimal \(Q\) aligns the two null spaces along their principal vectors, and \(S^\perp\) lies in both. In a basis adapted to \(S\), the optimal \(W\) is block diagonal —

Wₛ I ≤ 2n d − dim S untouched W  =  I  +  B⊤ (Wₛ − I) B B: orthonormal basis of S (≤ 2n rows) Wₛ: the same problem, in dimension ≤ 2n
The \(d\)-dimensional problem collapses onto \(S\): solve a copy of itself in dimension \(\le 2n\), embed back, and the rest of space rides along as the identity.
orthogonalSolverFast[X_?MatrixQ, Y_?MatrixQ] := Module[{d, B, Ws},
   d = Last@Dimensions@X;
   B = DeleteCases[Orthogonalize@Join[X, Y], v_ /; Norm[v] < 10^-8];
   Ws = orthogonalSolverSVD[X . Transpose[B], Y . Transpose[B]];
   IdentityMatrix[d] + Transpose[B] . (Ws - IdentityMatrix@Length@B) . B];

The solve is \(O(dn^2)\); only assembling the dense \(d \times d\) answer costs \(O(d^2 n)\). If \(d\) is truly large, keep \(W\) in the factored form \(I + B^\top(W_s - I)B\) and never materialize it.

4 · The n = 1 case, and a small surprise

For a single pair \((x, y)\) with \(c = x \cdot y\), the optimal trace is \(d - 2 + 2\max(c, 0)\), and the optimum depends on the sign of \(c\):

x y φ c ≥ 0: rotate by φ Tr = d − 2(1 − c)  (Rodrigues) x y mirror c < 0: reflect instead Tr = d − 2,  det W = −1
For \(c \ge 0\) the optimum is the transpose of the Rodrigues rotation \(I + K + K^2/(1+c)\), \(K = yx^\top - xy^\top\), turning only in \(\operatorname{span}\{x,y\}\). For \(c < 0\) a reflection of that plane scores higher — the rotation is only optimal among proper rotations. (A detailed derivation of the rotation formula is in this companion note.)

The same applies in general: the maximizer over \(O(d)\) frequently has \(\det W = -1\) — the question's own SeedRandom[1] instance is an example. If your application needs a proper rotation, check the result: when \(\det W = -1\), replace \(FE^\top\) by \(F\,\mathrm{diag}(1,\ldots,1,-1)\,E^\top\) (a Kabsch-style flip of the smallest singular direction), at a cost of exactly \(2\sigma_{\min}(VU^\top)\) in trace.

5 · Benchmarks

All three closed-form solvers agree to \(10^{-14}\) in trace; the differences are speed and constraint accuracy. Warmed-up timings (RepeatedTiming, M-series laptop, \(n = 10\)):

Log-log timing comparison of the three solvers as d grows
Kato's solver and the one-SVD variant both do \(O(d^3)\) work in the null-space computation; the compressed solver's solve is \(O(dn^2)\) and its dense assembly \(O(d^2 n)\), giving the shallower slope.
solver, (n, d)Tr|XW − Y||W⊤W − I|time
Kato (10, 100)82.785174344178.5e-161.1e-131.1 ms
one SVD (10, 100)82.785174344179.3e-164.1e-150.6 ms
compressed (10, 100)82.785174344171.1e-152.8e-150.12 ms
compressed (10, 2000)1980.552.7e-154.8e-153.3 ms
A historical footnote: on the question's original SeedRandom[1] instance, Minimize reports 0.82896 — slightly above the true optimum 0.828952. Its answer violates orthogonality by \(1.7 \times 10^{-5}\), which buys the extra trace; the closed form is exactly feasible.

6 · Everything in one block

(* maximize Tr[W] over orthogonal W subject to X.W == Y;
   X, Y: n x d with orthonormal rows, machine precision *)

orthogonalSolverSVD[X_?MatrixQ, Y_?MatrixQ] := Module[{U, V, e, s, f},
   U = NullSpace[X]; V = NullSpace[Y];
   If[U === {}, Return[Transpose[X] . Y]];
   {e, s, f} = SingularValueDecomposition[V . Transpose[U]];
   Transpose[X] . Y + Transpose[U] . f . Transpose[e] . V];

orthogonalSolverFast[X_?MatrixQ, Y_?MatrixQ] := Module[{d, B, Ws},
   d = Last@Dimensions@X;
   B = DeleteCases[Orthogonalize@Join[X, Y], v_ /; Norm[v] < 10^-8];
   Ws = orthogonalSolverSVD[X . Transpose[B], Y . Transpose[B]];
   IdentityMatrix[d] + Transpose[B] . (Ws - IdentityMatrix@Length@B) . B];

(* test on the question's instance *)
SeedRandom[1];
{n, d} = {2, 4};
W0 = RandomVariate@CircularRealMatrixDistribution@d;
X = Orthogonalize@RandomVariate[NormalDistribution[], {n, d}];
Y = X . W0;
W = orthogonalSolverSVD[X, Y];
{Tr[W], Norm[X . W - Y], Norm[Transpose[W] . W - IdentityMatrix[d]]}
(* {0.828952, 3.5*10^-16, 8.0*10^-16} *)