Surface Volume Mixture-of-Experts
for Anchored-Branched Universal Physics Transformers

Sanghyeon Kim (a,†), Sunwoong Yang (b), Sanghyuk Kim (c), Jinseong Han (d), and Namwoo Kang (d,e)
(a) Division of Future Vehicle, KAIST  ·  (b) Department of Mechanical Engineering, Hanyang University ERICA  ·  (c) Department of Mechanical Engineering, KAIST  ·  (d) Cho Chun Shik Graduate School of Mobility, KAIST  ·  (e) Narnia Labs
Internal report — HYU internal-flow dataset

SVMoE is an AB-UPT extension that routes volume tokens through a sparse Mixture-of-Experts (MoE) FFN. On the HYU internal-flow CFD dataset, it consistently improves prediction accuracy in dynamically active, high-velocity regions across all OOD test cases.

High-velocity vector L2 error, ID (run 10): −5.7%
High-velocity vector L2 error, OOD (run 4): −23.8%
Volume FFN experts: 4 × top-2, sparse routing
Params vs baseline: +44% (16.2M → 23.3M)

Interactive 3D Comparison

The interactive viewer shows velocity-magnitude error on the volume points around the car for the ID (run 10) and OOD (run 4) cases. In both, SVMoE reduces error in dynamically active regions.

Left panel: Vanilla AB-UPT (E1, baseline)
Right panel: SVMoE AB-UPT (E3, 4 experts, top-2)

Z-Slice Drill-down

Drag through five Z-slices of the domain to see the spatial structure of the error. Internal-flow regions (near the car underbody and wake) are where SVMoE gains the most.

[Figure: Z-slice comparison. Panels: side-view projection of the geometry with the slice plane marked as a vertical line (Z axis horizontal); ground-truth slice; Vanilla AB-UPT (E1) prediction and |error|; SVMoE AB-UPT (E3) prediction and |error|; error difference Δ = |err_E1| − |err_E3|, where Δ < 0 means E1 is better and Δ > 0 means SVMoE is better.]
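The error-difference panel can be reproduced from three arrays sampled on the same slice. The sketch below assumes the compared quantity is a scalar field such as velocity magnitude; the function name and array layout are illustrative and not taken from the report's code.

```python
import numpy as np

def error_difference(pred_e1, pred_e3, gt):
    """Signed error difference, delta = |err_E1| - |err_E3|, per point.

    Positive values mark points where E3 (SVMoE) has the smaller absolute
    error; negative values mark points where E1 (Vanilla) does.
    All inputs are arrays of the same shape (assumed scalar field per point).
    """
    err_e1 = np.abs(np.asarray(pred_e1) - np.asarray(gt))
    err_e3 = np.abs(np.asarray(pred_e3) - np.asarray(gt))
    return err_e1 - err_e3

# Toy slice with three points (made-up numbers, for illustration only).
gt = np.array([1.0, 1.0, 1.0])
delta = error_difference([1.5, 1.0, 0.9], [1.1, 1.2, 0.9], gt)
```

A diverging colormap centered at zero suits this quantity, since the sign carries the "which model is better" information.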

Method

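AB-UPT's volume branch processes ~3M points through a dense feed-forward network (FFN). SVMoE replaces that FFN with a sparse MoE: a lightweight router assigns each volume token to 2 of 4 experts, so each token activates only two expert FFNs and per-token compute does not grow with the number of experts, even though total parameters rise (16.2M → 23.3M).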

SVMoE framework
Sparse MoE in the volume branch. Router is a 2-layer MLP with gate_init_value = −2 for warm-start stability. Surface branch and physics attention are kept identical to Vanilla AB-UPT to isolate the effect of the volume-side routing.
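To make the routing concrete, here is a minimal NumPy sketch of a top-2-of-4 sparse MoE FFN with a 2-layer MLP router. It is an illustration under stated assumptions, not the report's implementation: the parameter names, hidden sizes, GELU activation, and the choice to softmax over only the selected logits are all hypothetical. The gate_init_value = −2 setting is not modeled, because the report does not say which quantity it initializes.

```python
import numpy as np

def gelu(x):
    # Tanh approximation of GELU (activation choice is an assumption).
    return 0.5 * x * (1.0 + np.tanh(np.sqrt(2.0 / np.pi) * (x + 0.044715 * x**3)))

def init_moe(rng, dim, hidden, n_experts=4, router_hidden=32):
    """Hypothetical parameter layout: a 2-layer MLP router plus n_experts FFNs."""
    s = 0.02
    return {
        "r_w1": rng.normal(0, s, (dim, router_hidden)),
        "r_b1": np.zeros(router_hidden),
        "r_w2": rng.normal(0, s, (router_hidden, n_experts)),
        "r_b2": np.zeros(n_experts),
        "e_w1": rng.normal(0, s, (n_experts, dim, hidden)),
        "e_b1": np.zeros((n_experts, hidden)),
        "e_w2": rng.normal(0, s, (n_experts, hidden, dim)),
        "e_b2": np.zeros((n_experts, dim)),
    }

def moe_ffn(x, p, top_k=2):
    """Route each token in x (n_tokens, dim) to its top_k experts.

    Returns the mixed output (n_tokens, dim) and the chosen expert ids (n_tokens, top_k).
    """
    # Router: 2-layer MLP producing one logit per expert.
    logits = gelu(x @ p["r_w1"] + p["r_b1"]) @ p["r_w2"] + p["r_b2"]
    # Indices of the top_k largest logits per token (distinct by construction).
    top = np.argsort(-logits, axis=1)[:, :top_k]
    top_logits = np.take_along_axis(logits, top, axis=1)
    # Softmax over the selected experts only, so the weights sum to 1 per token.
    w = np.exp(top_logits - top_logits.max(axis=1, keepdims=True))
    w /= w.sum(axis=1, keepdims=True)

    out = np.zeros_like(x)
    for e in range(p["e_w1"].shape[0]):
        tok, slot = np.nonzero(top == e)  # tokens that selected expert e
        if tok.size == 0:
            continue
        h = gelu(x[tok] @ p["e_w1"][e] + p["e_b1"][e])
        y = h @ p["e_w2"][e] + p["e_b2"][e]
        out[tok] += w[tok, slot][:, None] * y
    return out, top

rng = np.random.default_rng(0)
params = init_moe(rng, dim=8, hidden=16)
tokens = rng.normal(size=(100, 8))
out, chosen = moe_ffn(tokens, params)
```

Each expert only sees the tokens routed to it, which is why the per-token cost depends on top_k rather than on the number of experts. Production MoE layers usually add a load-balancing loss and capacity limits; neither is described in the report, so both are omitted here.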

Where SVMoE helps

1. Expert specialization on heterogeneous flow regions

Internal-flow geometries contain qualitatively different flow regimes (cavity recirculation, shear layers, jet impingement). Top-2 routing lets the model allocate distinct experts per regime instead of averaging them into a single FFN.

2. Dramatic gains on the OOD showcase

On the OOD case (run 4), SVMoE reduces high-velocity vector rel-L2 by 23.8% and whole-volume vel-mag rel-L2 by 23.9%. The fraction of "high-error" points (|Δv| > 0.5) drops from 1.68% (Vanilla) to 0.93% (SVMoE), a 45% relative reduction in visibly red regions. ID (run 10) shows modest but consistent gains (5.7% and 3.9% on the same two metrics).

3. Identical training pipeline

Same data (seed 42, subsample 1.0, train runs {1-3, 5-6, 8-9, 11-43}), same optimizer (cosine LR 1e-4 → 1e-6, wd 5e-2, grad-clip 1.0), same 3000 epochs, no EMA. Only the volume FFN differs.
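For reference, the learning-rate endpoints quoted above (1e-4 → 1e-6 over 3000 epochs) correspond to the standard cosine-annealing formula sketched below. Whether the report's schedule is stepped per epoch or per iteration, and whether it includes a warmup phase, is not stated; this sketch assumes per-epoch stepping with no warmup, and the function name is illustrative.

```python
import math

def cosine_lr(epoch, total_epochs=3000, lr_max=1e-4, lr_min=1e-6):
    """Cosine annealing from lr_max (epoch 0) to lr_min (epoch total_epochs).

    Assumes per-epoch stepping and no warmup; epochs outside
    [0, total_epochs] are clamped to the endpoints.
    """
    t = min(max(epoch, 0), total_epochs) / total_epochs
    return lr_min + 0.5 * (lr_max - lr_min) * (1.0 + math.cos(math.pi * t))

lr_start = cosine_lr(0)
lr_mid = cosine_lr(1500)
lr_end = cosine_lr(3000)
```

The schedule spends most of its decay in the middle of training: the midpoint value is the average of the two endpoints.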

Quantitative Results

Relative L2 errors are evaluated on full-mesh inference (~3M volume points per run). The headline metric is the vector-L2 error in the high-velocity region (top 10% of points by |v_gt|), where flow dynamics are most challenging and SVMoE's expert specialization matters most. SVMoE consistently outperforms the baseline in this regime across all OOD test runs.
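The metrics used in this report can be sketched in a few lines of NumPy. The definitions below are assumptions, since the report does not spell them out: relative L2 is taken as ||pred − gt||₂ / ||gt||₂ over the selected points, the high-velocity region is selected with a quantile threshold on ground-truth speed, and |Δv| in the "high-error" fraction is taken as the absolute difference in velocity magnitude. All function names are illustrative.

```python
import numpy as np

def relative_l2(pred, gt):
    """||pred - gt||_2 / ||gt||_2 over all entries (assumed definition)."""
    return float(np.linalg.norm(pred - gt) / np.linalg.norm(gt))

def high_velocity_mask(v_gt, top_fraction=0.10):
    """Boolean mask of points whose ground-truth speed is in the top fraction."""
    speed = np.linalg.norm(v_gt, axis=1)
    threshold = np.quantile(speed, 1.0 - top_fraction)
    return speed >= threshold

def high_velocity_vector_l2(v_pred, v_gt, top_fraction=0.10):
    """Vector relative L2 restricted to the high-velocity region."""
    mask = high_velocity_mask(v_gt, top_fraction)
    return relative_l2(v_pred[mask], v_gt[mask])

def vel_mag_l2(v_pred, v_gt):
    """Relative L2 of the velocity magnitude over all points."""
    return relative_l2(np.linalg.norm(v_pred, axis=1), np.linalg.norm(v_gt, axis=1))

def high_error_fraction(v_pred, v_gt, threshold=0.5):
    """Fraction of points whose velocity-magnitude error exceeds threshold."""
    dv = np.abs(np.linalg.norm(v_pred, axis=1) - np.linalg.norm(v_gt, axis=1))
    return float(np.mean(dv > threshold))

# Synthetic check: a uniform 10% overshoot gives a relative L2 of 0.1 everywhere.
rng = np.random.default_rng(0)
v_gt = rng.normal(size=(1000, 3))
v_pred = 1.1 * v_gt
hv_err = high_velocity_vector_l2(v_pred, v_gt)
mag_err = vel_mag_l2(v_pred, v_gt)
n_high = int(high_velocity_mask(v_gt).sum())
```

Selecting the region from the ground truth rather than from the prediction keeps the evaluated point set identical for both models, so their errors are directly comparable.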

High-velocity region — vector L2

Run    | Split           | Vanilla AB-UPT | SVMoE AB-UPT | Δ
run 10 | ID (page label) | 19.87%         | 18.74%       | −5.7%
run 4  | OOD (showcase)  | 34.43%         | 26.24%       | −23.8%

Whole-volume per-run breakdown

Standard relative L2 over all volume points. On the OOD showcase (run 4), SVMoE achieves a 17.4% reduction in vector L2 and a 23.9% reduction in vel-mag L2; gains on ID (run 10) are modest (1.6% and 3.9%).

Run         | Vanilla vel-mag L2 | SVMoE vel-mag L2 | Vanilla vector L2 | SVMoE vector L2
run 10 (ID) | 19.05%             | 18.30% (−3.9%)   | 25.76%            | 25.34% (−1.6%)
run 4 (OOD) | 32.34%             | 24.62% (−23.9%)  | 41.99%            | 34.69% (−17.4%)
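The percentage changes quoted in the tables are relative changes with respect to the Vanilla baseline, (SVMoE − Vanilla) / Vanilla. The short check below recomputes them from the reported error values; the dictionary keys are labels chosen here for illustration.

```python
def relative_change_pct(baseline, new):
    """Relative change of new with respect to baseline, in percent."""
    return 100.0 * (new - baseline) / baseline

# (Vanilla, SVMoE) error values in percent, copied from the tables above.
reported = {
    "run 10 high-velocity vector L2": (19.87, 18.74),
    "run 4 high-velocity vector L2": (34.43, 26.24),
    "run 10 vel-mag L2": (19.05, 18.30),
    "run 10 vector L2": (25.76, 25.34),
    "run 4 vel-mag L2": (32.34, 24.62),
    "run 4 vector L2": (41.99, 34.69),
}

deltas = {name: round(relative_change_pct(b, n), 1) for name, (b, n) in reported.items()}
for name, delta in deltas.items():
    print(f"{name}: {delta:+.1f}%")
```

These are relative reductions of the error, not differences in percentage points: the run 4 vector L2 drops by 7.30 percentage points, which is 17.4% of the baseline value.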