Surface Volume Mixture-of-Experts
for Anchored-Branched Universal Physics Transformers

Sanghyeon Kim^a,†, Sunwoong Yang^b, Sanghyuk Kim^c, Jinseong Han^d, and Namwoo Kang^d,e

^aDivision of Future Vehicle, KAIST · ^bDepartment of Mechanical Engineering, Hanyang University ERICA · ^cDepartment of Mechanical Engineering, KAIST · ^dCho Chun Shik Graduate School of Mobility, KAIST · ^eNarnia Labs

Internal report — HYU internal-flow dataset


An AB-UPT extension that routes volume tokens through a sparse Mixture-of-Experts FFN. It consistently improves prediction accuracy in dynamically active, high-velocity regions of the HYU internal-flow CFD dataset across all OOD test cases.

Interactive 3D Comparison

Rotate the geometry to inspect velocity-magnitude error on the volume points around the car. Left = Vanilla AB-UPT, Right = SVMoE. Toggle between ID (run 10) and OOD (run 4) to see how SVMoE consistently reduces error in dynamically active regions.

Viewer controls: Case, Field. Panels: Vanilla AB-UPT (E1, baseline) · SVMoE AB-UPT (E3, 4 experts, top-2).

Z-Slice Drill-down

Drag through five Z-slices of the domain to see the spatial structure of the error. Internal-flow regions (near the car underbody and wake) are where SVMoE gains the most.

Viewer controls: Case; Z position (slice index z₀ to z₉). The slice plane is shown as a vertical line on the side-view projection of the geometry (Z axis horizontal).

Panels: Ground truth · Vanilla AB-UPT with |Error| · SVMoE AB-UPT with |Error| · Δ = |err_E1| − |err_E3| (legend runs from "E1 better" to "SVMoE better"; positive Δ means SVMoE has the smaller error).

Method

AB-UPT's volume branch processes ~3M points through a dense feed-forward network (FFN). SVMoE replaces that FFN with a sparse MoE: a lightweight router assigns each volume token to 2 of 4 experts, so only two expert FFNs are evaluated per token and the effective capacity per token stays constant.

SVMoE framework — Sparse MoE in the volume branch. Router is a 2-layer MLP with *gate_init_value = −2* for warm-start stability. Surface branch and physics attention are kept identical to Vanilla AB-UPT to isolate the effect of the volume-side routing.
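The routing step described above can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes softmax gating renormalised over the two selected experts, ReLU activations, toy dimensions, and random weights, and it does not model gate_init_value, whose exact role is not specified on this page. All function names are hypothetical.

```python
import math
import random

NUM_EXPERTS = 4  # from the text: 4 experts
TOP_K = 2        # from the text: top-2 routing


def softmax(xs):
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def linear(weights, bias, x):
    """y = W x + b for a row-major weight matrix."""
    return [sum(w * xi for w, xi in zip(row, x)) + b
            for row, b in zip(weights, bias)]


def make_linear(rng, n_out, n_in, scale=0.1):
    weights = [[rng.gauss(0.0, scale) for _ in range(n_in)] for _ in range(n_out)]
    return weights, [0.0] * n_out


def make_mlp(rng, n_in, n_hidden, n_out):
    """Two-layer MLP parameters (used for both the router and the experts)."""
    return make_linear(rng, n_hidden, n_in), make_linear(rng, n_out, n_hidden)


def mlp(params, x):
    (w1, b1), (w2, b2) = params
    hidden = [max(0.0, h) for h in linear(w1, b1, x)]  # ReLU is an assumption
    return linear(w2, b2, hidden)


def route_top_k(logits, k=TOP_K):
    """Return the k chosen expert indices and gates renormalised to sum to 1."""
    probs = softmax(logits)
    chosen = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:k]
    norm = sum(probs[i] for i in chosen)
    return chosen, [probs[i] / norm for i in chosen]


def moe_ffn(token, router, experts):
    """Gate-weighted sum of the outputs of the selected experts for one token."""
    chosen, gates = route_top_k(mlp(router, token))
    out = [0.0] * len(token)
    for idx, gate in zip(chosen, gates):
        expert_out = mlp(experts[idx], token)
        out = [o + gate * y for o, y in zip(out, expert_out)]
    return out, chosen, gates


rng = random.Random(0)
dim, hidden = 8, 16  # toy sizes, not the model's real widths
router = make_mlp(rng, dim, hidden, NUM_EXPERTS)
experts = [make_mlp(rng, dim, hidden, dim) for _ in range(NUM_EXPERTS)]
token = [rng.gauss(0.0, 1.0) for _ in range(dim)]
out, chosen, gates = moe_ffn(token, router, experts)
print(chosen, [round(g, 3) for g in gates])
```

Only the two selected experts are evaluated for the token, which is what keeps the per-token cost independent of the total number of experts in this sketch.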

Where SVMoE helps

Expert specialization on heterogeneous flow regions

Internal-flow geometries contain qualitatively different flow regimes (cavity recirculation, shear layers, jet impingement). Top-2 routing lets the model allocate distinct experts per regime instead of averaging them into a single FFN.

Dramatic gains on the OOD showcase

On the OOD case (run 4), SVMoE reduces high-velocity vector rel-L2 by 23.8% (34.43% → 26.24%) and whole-volume vel-mag rel-L2 by 23.9% (32.34% → 24.62%). The fraction of "high-error" points (|Δv| > 0.5) drops from 1.68% (Vanilla) to 0.93% (SVMoE), a 45% relative reduction in visibly red regions. ID (run 10) shows modest but consistent gains.
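The percentages quoted for run 4 are relative changes with respect to the Vanilla value. As a quick arithmetic check, the sketch below recomputes them from the numbers reported on this page; the helper name relative_change is hypothetical.

```python
def relative_change(baseline, new):
    """Signed relative change in percent; negative means the error went down."""
    return 100.0 * (new - baseline) / baseline


# Values copied from the text and tables on this page (all in percent).
high_vel_vector = relative_change(34.43, 26.24)    # high-velocity vector rel-L2
whole_vol_velmag = relative_change(32.34, 24.62)   # whole-volume vel-mag rel-L2
high_error_fraction = relative_change(1.68, 0.93)  # share of points with |Δv| > 0.5

print(f"{high_vel_vector:.1f} {whole_vol_velmag:.1f} {high_error_fraction:.1f}")
# prints: -23.8 -23.9 -44.6
```

The last value rounds to the 45% reduction quoted for the high-error fraction.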

Identical training pipeline

Same data (seed 42, subsample 1.0, train runs {1-3, 5-6, 8-9, 11-43}), same optimizer (cosine LR 1e-4 → 1e-6, wd 5e-2, grad-clip 1.0), same 3000 epochs, no EMA. Only the volume FFN differs.
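The shared setup can be made concrete with a short sketch that expands the training-run list and evaluates a cosine learning-rate schedule between the stated endpoints. It rests on assumptions not confirmed by this page: run ids are taken to span 1 to 43, the schedule is stepped once per epoch, and there is no warmup. The function names are hypothetical.

```python
import math


def expand_runs(spec):
    """Expand a spec such as '1-3, 5-6' into a sorted list of run ids."""
    runs = set()
    for part in spec.split(","):
        lo, _, hi = part.strip().partition("-")
        runs.update(range(int(lo), int(hi or lo) + 1))
    return sorted(runs)


def cosine_lr(epoch, total_epochs=3000, lr_max=1e-4, lr_min=1e-6):
    """Cosine decay from lr_max at epoch 0 to lr_min at the last epoch."""
    progress = epoch / (total_epochs - 1)
    return lr_min + 0.5 * (lr_max - lr_min) * (1.0 + math.cos(math.pi * progress))


train_runs = expand_runs("1-3, 5-6, 8-9, 11-43")
# Assumption: run ids span 1..43; anything not listed for training is held out.
held_out = [r for r in range(1, 44) if r not in train_runs]
print(len(train_runs), held_out)  # prints: 40 [4, 7, 10]

for epoch in (0, 1500, 2999):
    print(epoch, cosine_lr(epoch))
```

Under these assumptions, both runs reported on this page (4 and 10) fall outside the training list; the ID and OOD labels used here are the page's own.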

Quantitative Results

Relative L2 errors are evaluated on full-mesh inference (~3M volume points per run). The headline metric is the vector L2 error in the high-velocity region (top 10% of |v_gt|), where flow dynamics are most challenging and SVMoE's expert specialization matters most. SVMoE consistently outperforms the baseline in this regime across all OOD test runs. In the tables, Δ and the parenthesised percentages are relative changes with respect to Vanilla AB-UPT.
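A minimal sketch of how such metrics could be computed is given here. The exact definitions used for this report are not stated, so the sketch assumes a common convention: relative L2 is the L2 norm of the error divided by the L2 norm of the ground truth over the selected points, and the high-velocity region is the set of points whose |v_gt| lies in the top 10% of the run. The function names and toy data are hypothetical.

```python
import math


def norm(v):
    return math.sqrt(sum(c * c for c in v))


def high_velocity_mask(v_gt, top_fraction=0.10):
    """True for points whose |v_gt| is in the top `top_fraction` of the run."""
    mags = [norm(v) for v in v_gt]
    n_keep = max(1, round(top_fraction * len(mags)))
    threshold = sorted(mags, reverse=True)[n_keep - 1]
    return [m >= threshold for m in mags]


def vector_rel_l2(v_pred, v_gt, mask=None):
    """Relative L2 error of the velocity vectors over the masked points."""
    mask = mask or [True] * len(v_gt)
    pairs = [(p, g) for p, g, keep in zip(v_pred, v_gt, mask) if keep]
    num = sum(sum((pc - gc) ** 2 for pc, gc in zip(p, g)) for p, g in pairs)
    den = sum(sum(gc * gc for gc in g) for _, g in pairs)
    return math.sqrt(num / den)


def velmag_rel_l2(v_pred, v_gt, mask=None):
    """Relative L2 error of the velocity magnitude over the masked points."""
    mask = mask or [True] * len(v_gt)
    pairs = [(norm(p), norm(g)) for p, g, keep in zip(v_pred, v_gt, mask) if keep]
    num = sum((pm - gm) ** 2 for pm, gm in pairs)
    den = sum(gm * gm for _, gm in pairs)
    return math.sqrt(num / den)


# Toy data: 10 points, prediction overshoots every vector by 10%.
v_gt = [(float(i), 0.0, 0.0) for i in range(1, 11)]
v_pred = [(1.1 * x, y, z) for x, y, z in v_gt]
mask = high_velocity_mask(v_gt)

print(sum(mask), "high-velocity point(s)")
print(round(vector_rel_l2(v_pred, v_gt, mask), 4))
print(round(velmag_rel_l2(v_pred, v_gt), 4))
```

On the toy data a uniform 10% overshoot gives a relative error of 0.1 for both metrics, with or without the mask, which is a convenient sanity check for the normalisation.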

High-velocity region — vector L2

Run	Split	Vanilla AB-UPT	SVMoE AB-UPT	Δ
run 10	ID (page label)	19.87%	18.74%	−5.7%
run 4	OOD (showcase)	34.43%	26.24%	−23.8%

Whole-volume per-run breakdown

Standard relative L2 over all volume points. On the OOD showcase (run 4), SVMoE reduces vector L2 by 17.4% and velocity-magnitude L2 by 23.9%; gains on ID (run 10) are modest (1.6% and 3.9%, respectively).

Run	Vanilla vel-mag L2	SVMoE vel-mag L2	Vanilla vector L2	SVMoE vector L2
run 10 (ID)	19.05%	18.30% (−3.9%)	25.76%	25.34% (−1.6%)
run 4 (OOD)	32.34%	24.62% (−23.9%)	41.99%	34.69% (−17.4%)
