TideGS: Training Over 1 Billion 3D Gaussians on a Single 24GB GPU, ICML 2026 Spotlight

3D Gaussian Splatting (3DGS) has become one of the most prominent approaches to 3D reconstruction in recent years. However, it faces a critical bottleneck: VRAM.

Each Gaussian primitive carries a large attribute vector (position, scale, rotation, opacity, and spherical-harmonic color coefficients). Once the number of primitives reaches the tens of millions, the parameter table exceeds GPU capacity. Until now, systems running on a single consumer-grade GPU could handle at most a few tens of millions of Gaussians.
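
To make the bottleneck concrete, here is a rough back-of-envelope estimate. It assumes the attribute layout of the original 3DGS formulation (59 float32 values per Gaussian, with degree-3 spherical harmonics) and an Adam-style optimizer that keeps a gradient and two moment buffers per parameter. Neither assumption is taken from the TideGS paper, whose actual layout may differ, and the estimate ignores activations and rasterizer buffers.

```python
# Back-of-envelope memory estimate for 3DGS training.
# ASSUMPTION: standard 3DGS layout, not necessarily what TideGS stores.
FLOATS_PER_GAUSSIAN = 3 + 3 + 4 + 1 + 48  # position, scale, quaternion, opacity, SH color
BYTES_PER_FLOAT = 4  # float32


def param_bytes(num_gaussians: int) -> int:
    """Bytes needed for the raw parameter table only."""
    return num_gaussians * FLOATS_PER_GAUSSIAN * BYTES_PER_FLOAT


def training_bytes(num_gaussians: int) -> int:
    """Parameters + gradients + two Adam moment buffers (4 copies in total)."""
    return 4 * param_bytes(num_gaussians)


for n in (10_000_000, 100_000_000, 1_000_000_000):
    print(
        f"{n:>13,} Gaussians: "
        f"params {param_bytes(n) / 1e9:6.1f} GB, "
        f"training state {training_bytes(n) / 1e9:6.1f} GB"
    )
```

Under these assumptions, one billion Gaussians need about 236 GB for the parameters alone, roughly ten times the 24 GB available on the GPU, before any optimizer state is counted.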

The TideGS paper, accepted as a Spotlight at ICML 2026, breaks through this ceiling by training more than 1 billion Gaussians on a single 24GB GPU.

Core Insight: 3DGS Training is Inherently Sparse

The team's starting point is clever: 3DGS training is inherently sparse and trajectory-conditioned.

Each iteration activates only the Gaussians visible from the current camera batch. This means GPU VRAM doesn't need to serve as persistent parameter storage. Instead, it can act as a working-set cache that holds only the subset of Gaussians currently needed.
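
The idea of a per-batch working set can be illustrated with a toy sketch. The view-cone test below is a simplified stand-in for real frustum culling, and the function names (`visible_ids`, `working_set`) and parameters are hypothetical, not taken from the paper.

```python
import math


def visible_ids(positions, cam_pos, cam_dir, half_fov_deg=30.0, far=50.0):
    """Return IDs of Gaussians whose centers fall inside a simple view cone.

    Toy stand-in for frustum culling; cam_dir must be a unit vector.
    """
    cos_limit = math.cos(math.radians(half_fov_deg))
    ids = set()
    for gid, p in enumerate(positions):
        v = [p[i] - cam_pos[i] for i in range(3)]
        dist = math.sqrt(sum(c * c for c in v))
        if dist == 0.0 or dist > far:
            continue  # at the camera center or beyond the far limit
        cos_angle = sum(v[i] * cam_dir[i] for i in range(3)) / dist
        if cos_angle >= cos_limit:
            ids.add(gid)
    return ids


def working_set(positions, cameras):
    """Union of the Gaussians visible from every camera in the batch."""
    ws = set()
    for cam_pos, cam_dir in cameras:
        ws |= visible_ids(positions, cam_pos, cam_dir)
    return ws


# 1,000 Gaussians on the x axis, spaced one unit apart.
positions = [(float(x), 0.0, 0.0) for x in range(1000)]
# A single camera at x = 100 looking toward +x.
batch = [((100.0, 0.0, 0.0), (1.0, 0.0, 0.0))]
ws = working_set(positions, batch)
print(f"working set: {len(ws)} of {len(positions)} Gaussians")
```

In this toy scene only 50 of the 1,000 Gaussians are needed for the batch, so only those would have to be resident on the GPU.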

Three Synergistic Techniques

TideGS manages parameters through a three-tier SSD-CPU-GPU storage hierarchy:

1. Block-Virtualized Geometry

Aligns the scene's spatial locality with how data is laid out on the SSD. It partitions 3D space into blocks so that spatially adjacent Gaussians are also adjacent in storage, which reduces I/O fragmentation.
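
A minimal sketch of this kind of block-ordered layout, assuming a uniform grid with a hypothetical `BLOCK_SIZE` (the paper's actual partitioning scheme and on-disk format are not described here): Gaussians are bucketed by grid cell, and each cell is written as one contiguous range.

```python
from collections import defaultdict

BLOCK_SIZE = 10.0  # hypothetical block edge length in scene units


def block_key(pos, block_size=BLOCK_SIZE):
    """Integer grid cell that contains a 3D position."""
    return tuple(int(c // block_size) for c in pos)


def build_block_layout(positions):
    """Order Gaussian IDs so each block occupies one contiguous storage range.

    Returns (order, index): `order` is the storage sequence of Gaussian IDs,
    and `index` maps block key -> (offset, count) into that sequence.
    """
    buckets = defaultdict(list)
    for gid, pos in enumerate(positions):
        buckets[block_key(pos)].append(gid)

    order, index = [], {}
    for key in sorted(buckets):
        index[key] = (len(order), len(buckets[key]))
        order.extend(buckets[key])
    return order, index


positions = [(1.0, 2.0, 3.0), (25.0, 0.0, 0.0), (4.0, 4.0, 4.0), (21.0, 9.0, 9.0)]
order, index = build_block_layout(positions)
print(order)  # [0, 2, 1, 3]
print(index)  # {(0, 0, 0): (0, 2), (2, 0, 0): (2, 2)}
```

With such an index, loading a whole block becomes a single sequential read of `count` records starting at `offset`, rather than many scattered reads.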

2. Hierarchical Asynchronous Pipeline

Overlaps I/O and computation. While the GPU trains on the current batch, the pipeline prefetches the Gaussian data required by the next batch from the SSD, so that I/O latency is hidden behind computation instead of stalling it.
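
The overlap can be sketched with a single background I/O worker. This is a generic prefetching pattern, not the paper's pipeline: `load_batch` and `train_step` are hypothetical stand-ins that simulate SSD reads and GPU work with short sleeps.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def load_batch(step):
    """Stand-in for reading one batch of Gaussian data from the SSD."""
    time.sleep(0.01)  # simulated I/O latency
    return [step * 10 + i for i in range(3)]


def train_step(batch):
    """Stand-in for the GPU forward/backward pass."""
    time.sleep(0.01)  # simulated compute time
    return sum(batch)


def run(num_steps):
    results = []
    with ThreadPoolExecutor(max_workers=1) as io_pool:
        pending = io_pool.submit(load_batch, 0)  # prime the pipeline
        for step in range(num_steps):
            batch = pending.result()  # waits only if the read has not finished
            if step + 1 < num_steps:
                # Start loading the next batch before computing on this one.
                pending = io_pool.submit(load_batch, step + 1)
            results.append(train_step(batch))
    return results


print(run(4))  # [3, 33, 63, 93]
```

As long as each read finishes within the time of one training step, the loop never waits on I/O after the first batch.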

3. Trajectory-Adaptive Differential Streaming

Transfers only the working-set increments between iterations. Instead of fully reloading data every time, it determines which Gaussians enter or leave the working set and streams only those deltas.
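
In its simplest form, the delta between two consecutive working sets is a pair of set differences. The sketch below assumes working sets are represented as sets of Gaussian IDs; `plan_transfer` is a hypothetical helper, not an API from the paper.

```python
def plan_transfer(resident, needed):
    """Compute the delta between the GPU-resident set and the next working set.

    Returns (to_load, to_evict, kept): IDs to stream in, IDs to write back or
    drop, and IDs that stay resident and need no transfer at all.
    """
    to_load = needed - resident
    to_evict = resident - needed
    kept = resident & needed
    return to_load, to_evict, kept


resident = set(range(0, 100))  # working set of iteration t
needed = set(range(10, 110))   # working set of iteration t + 1 (camera moved slightly)
to_load, to_evict, kept = plan_transfer(resident, needed)
print(len(to_load), len(to_evict), len(kept))  # 10 10 90
```

Here 90 of the 100 Gaussians stay in place, so only 10 need to be streamed in. The more slowly the camera trajectory moves through the scene, the smaller the delta becomes.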

Performance & Scale

The comparison numbers speak for themselves:

Standard in-memory training: ~11 million Gaussians
Previous out-of-core baselines: ~100 million Gaussians
TideGS: Over 1 billion Gaussians

On large-scale scenes, TideGS also surpasses the evaluated single-GPU baselines in reconstruction quality.

Why It Matters

3D reconstruction is moving from labs to real-world applications: autonomous driving, digital twins, and AR/VR all require processing city-scale scenes. By enabling single-GPU training at the billion-Gaussian scale, TideGS significantly lowers the hardware barrier for large-scale 3D reconstruction.

Paper: arXiv:2605.20150
