This post is a walkthrough of 3D Gaussian Splatting (3DGS) and my Swift/Metal implementation, MetalSprocketsGaussianSplats. In this post, I cover what splats are, how they’re generated, the various on-disk file formats, and the “standard” 3DGS rendering pipeline. I assume you’re somewhat comfortable with 3D graphics but haven’t worked with Gaussian splats before.
What Are 3D Gaussian Splats?
I like to oversimplify 3D Gaussian Splatting (3DGS) as a “fancy point cloud.” It’s a computer graphics technique for representing and rendering 3D scenes, and unlike meshes or voxels, it can produce photorealistic results from real-world captures without ever defining a single triangle.
The basic idea: instead of describing a scene as a bunch of flat surfaces, you describe it as a cloud of tiny 3D blobs, each one a Gaussian (a soft, fuzzy ellipsoid). Every Gaussian carries a position, a shape (covariance), an opacity, and colour information encoded as spherical harmonics. When you render the scene, you “splat” each Gaussian onto the screen, project it into 2D, sort them, and alpha-blend them together. The result, when you have enough of these little blobs (we’re talking hundreds of thousands or more), is a surprisingly convincing image.
3DGS took off in 2023 after Kerbl et al. published 3D Gaussian Splatting for Real-Time Radiance Field Rendering. The big deal was rendering speed: you could get quality comparable to Neural Radiance Fields (NeRFs) while rendering at real-time framerates, with no neural network inference required. That paper started a land rush of follow-up research, new tooling, and competing file formats.
NeRFs
Before Gaussians, there were NeRFs, Neural Radiance Fields. You train a neural network to memorise a scene. You feed it a 3D position and a viewing direction, and it returns a colour and a density. To render a pixel, you shoot a ray into the scene and sample the network at a bunch of points along that ray, accumulating colour and opacity as you go (volume rendering).
NeRFs can produce gorgeous results. The problem is that rendering is slow: you’re evaluating the network hundreds of times per pixel, per frame. Real-time NeRF rendering is possible but requires serious tricks (baked representations, hash grids, distillation), and even then, it’s a fight.
Radiance Fields
NeRFs and 3DGS are both “radiance field” methods. A radiance field is a function that maps any point in space (and optionally a viewing direction) to a colour and density. The difference is in how you represent that function:
- NeRFs represent it as a neural network; this is compact but expensive to evaluate.
- 3DGS represents it explicitly as a set of Gaussian primitives; this is more memory-intensive but trivially parallelisable and much faster to render on a GPU.
The explicit representation is what enables 3DGS to work in real time. No network inference in the render loop, just projection, sorting, and blending. It maps naturally onto GPU rasterisation pipelines.
Generating Splats
Getting from “photos of a thing” to “a Gaussian splat you can render” is essentially a photogrammetry pipeline: you capture a bunch of images of a scene, reconstruct the camera positions, and then use that data to build a 3D representation. Traditional photogrammetry produces meshes with baked textures, and it works well for rigid, well-textured objects. But it struggles with anything transparent, reflective, thin, or fuzzy: glass, hair, foliage, water. These are hard to reconstruct as meshes because they don’t have simple, well-defined surfaces.
3DGS avoids a lot of these problems. Because splats are semi-transparent blobs rather than opaque triangles, they can naturally represent soft, volumetric, or view-dependent stuff that meshes can’t. They’re not perfect, and you’ll still get artefacts on mirrors and large transparent surfaces, or when the camera gets too close, but the failure modes are different and generally more forgiving.
The pipeline has two stages: first, you figure out where the cameras were, then you train the Gaussians to match what the cameras saw.
For this post, I’m using The Vendel I Helmet by The Swedish History Museum (CC-BY-4.0) as the example scene. Here it is in Blender:

To generate training data, I rendered the helmet from many different viewpoints, producing a set of overlapping images that cover the object from all angles. Blender is scriptable via Python and can run headless, so generating hundreds of renders from different camera positions is straightforward to automate.

Structure from Motion (SfM)
Before you can train any radiance field, you need to know the camera poses: where each photo was taken and which direction it was facing. That’s what Structure from Motion (SfM) gives you. You feed in a set of overlapping photos of a scene, and SfM estimates the camera intrinsics (focal length, distortion) and extrinsics (position and orientation) for each photo. It also produces a sparse 3D point cloud of the scene from the feature points it matched across images.
The most common tool here is COLMAP. Since I rendered my training images from Blender, I already knew the camera poses and could have exported them directly from the script. But COLMAP also produces the sparse point cloud that the training process uses as its initial set of Gaussian positions, and I wanted to show a more typical real-world workflow.
COLMAP is open source and available via Homebrew (brew install colmap). You can run it from the command line or use colmap gui for an interactive interface.

Training
Once you have camera poses and a sparse point cloud from SfM, you can train your Gaussians. The process starts by placing an initial set of Gaussians at the sparse point cloud positions, one per point. Then, through differentiable rendering, the system iteratively adjusts each Gaussian’s parameters (position, covariance, colour, and opacity) to minimise the difference between the Gaussians’ renderings and the actual photos.
During training, the system also splits and adds Gaussians in areas that need more detail and prunes Gaussians that are nearly transparent or redundant. Over thousands of iterations, the cloud converges from a sparse scattering of blobs into a dense, detailed representation of the scene.
The training loop is essentially: render the current Gaussians from one of the known camera positions, compare the rendered image to the actual photo, compute the loss, and backpropagate gradients to adjust every splat’s parameters.

For tooling, the original 3DGS implementation still works, but most people these days use gsplat or NerfStudio, which are faster and have more export options.
Training doesn’t always converge cleanly. Here’s what it looks like when things go wrong:

Training is almost exclusively done on NVIDIA GPUs using CUDA; the original implementation and most forks require it. There are efforts to support other backends (Metal, OpenCL, CPU), but for now, if you want to train your own splats, you’ll want access to an NVIDIA card or a cloud GPU instance (I’ve had no luck training on my Apple Silicon device).
That said, at WWDC 2025 Apple introduced spatial scenes for visionOS 26, which uses Gaussian splats trained from a single image on device. Apple also open-sourced SHARP, the model behind spatial scenes. SHARP takes a single photo and produces a full Gaussian splat scene in a second or two, and you can use it from Swift with my project. After seeing SHARP, I expect Apple to do more work with Gaussian splats in the future.
Coordinate Spaces
Trained splats often come out in an awkward coordinate space. The scene may be normalised to a -1 to 1 bounding box, and depending on the training tool and input data, Z may be up rather than Y. If your renderer assumes Y-up, everything will appear rotated by 90 degrees.
This means it’s not uncommon to have to transform splats after training: rotate the whole cloud to match your engine’s coordinate system, scale it to the correct world-space size, or both. You can bake the transform into the file with a tool like splat-transform, or apply it at render time via a model matrix.
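If you apply the transform at render time, the model matrix for the Z-up-to-Y-up case is just a rotation about X, optionally combined with a scale. A minimal Swift sketch, with an illustrative helper name and a uniform scale as assumptions:

```swift
import simd

// Sketch: a model matrix that converts a Z-up splat cloud to a Y-up engine,
// with a uniform scale baked in. The -90 degree rotation about X maps +Z to +Y.
func zUpToYUpModelMatrix(scale: Float) -> simd_float4x4 {
    let rotate = simd_float4x4(simd_quatf(angle: -.pi / 2, axis: [1, 0, 0]))
    let scaleMatrix = simd_float4x4(diagonal: [scale, scale, scale, 1])
    return rotate * scaleMatrix
}
```

Baking the same transform into the file with splat-transform avoids paying for the extra matrix multiply at load or render time.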
What is a splat?
A Gaussian splat is a 3D ellipsoid, a sphere that’s been stretched and rotated into an M&M candy shape.

Unlike a hard-edged mesh primitive (or a candy), a Gaussian has no surface. Its density follows a Gaussian (bell curve) distribution: strongest at the centre, falling off smoothly toward the edges. Think of it like a soft, fuzzy blob of colour floating in space.
The shape of the ellipsoid is defined by a 3D covariance matrix, but in practice that matrix is stored as a scale (how far the ellipsoid extends along each of its three local axes) and a rotation (how those axes are oriented in world space). A splat with equal scale on all axes is a sphere. Flatten one, and you get that M&M shape. Real scenes end up with a mix of shapes: thin splats for edges, flat splats for surfaces, etc.
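Concretely, the stored scale and rotation reconstruct the covariance as Σ = R·S·Sᵀ·Rᵀ, where S is a diagonal matrix of the three scales and R is the rotation matrix from the quaternion. A minimal Swift sketch, with illustrative names:

```swift
import simd

// Sketch: rebuild a splat's 3D covariance from its stored scale and rotation.
// Sigma = R * S * S^T * R^T. A splat with equal scales gives a spherical
// (diagonal) covariance; unequal scales flatten it into the M&M shape.
func covariance(scale: SIMD3<Float>, rotation: simd_quatf) -> simd_float3x3 {
    let R = simd_float3x3(rotation)
    let S = simd_float3x3(diagonal: scale)
    let RS = R * S
    return RS * RS.transpose   // R S S^T R^T
}
```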
Here’s what a single splat might look like as a theoretical Swift struct:
```swift
struct Splat {
    var position: SIMD3<Float>
    var scale: SIMD3<Float>
    var rotation: simd_quatf
    var color: SIMD3<Float> // See note.
    var opacity: Float
}
```
Each splat has a position in 3D space, a scale that controls how big the ellipsoid is along each local axis, a rotation quaternion that orients it, a colour, and an opacity.
Note that colour here is just a single RGB value, the “SH0” constant term; for view-dependent colour via higher-order spherical harmonics (SH1–SH3), you’d need additional coefficients per splat, which we’ll cover in the next section. Still, this is enough to place, shape, and shade one fuzzy blob. Add several hundred thousand or more, and you’ve got a scene.
Colour
Every splat carries two appearance attributes: colour and opacity.
Opacity is straightforward but critical to the algorithm. Each splat has an alpha value that, when combined with the Gaussian falloff, controls how much it contributes to the final pixel. Splats are semi-transparent: you’re looking through a cloud of fuzzy blobs, and the final colour at any pixel is the result of blending potentially hundreds (or more) of overlapping splats together. Alpha blending is order-dependent, and getting it wrong results in visible artefacts.
The simplest approach to colour is a single RGB value per splat, which corresponds to the DC (degree 0/SH0) spherical harmonic coefficient. This gives each splat a fixed colour regardless of viewing angle, and in many use cases, it looks perfectly fine. But real-world objects aren’t like that. A shiny car looks different from the front than from the side. A leaf catches light differently depending on where you’re standing. To capture that, 3DGS uses spherical harmonics.
Spherical Harmonics
Spherical harmonics (SH) are a set of basis functions defined on the surface of a sphere; a way of breaking a complex signal into a sum of simple component functions. Low-degree terms capture broad, smooth variation; higher degrees add finer detail. By storing a set of SH coefficients per splat, you can encode a colour that varies with viewing direction.
Spherical harmonics come in degrees (sometimes called orders), and each degree adds more directional detail:
- Degree 0 (SH0): 1 coefficient per colour channel (3 total).
- Degree 1 (SH1): 3 coefficients per channel (9 total).
- Degree 2 (SH2): 5 coefficients per channel (15 total).
- Degree 3 (SH3): 7 coefficients per channel (21 total); the highest degree used in the original paper.
Degree 0 gives you a single constant colour, no view dependence at all. Degree 1 adds a basic directional gradient, so a splat can be brighter on one side than the other. Beyond that, each degree adds more angular detail: enough to approximate soft highlights and subtle view-dependent colour shifts, though even at degree 3, spherical harmonics are still quite low-frequency.
That’s 16 (1 + 3 + 5 + 7) coefficients per colour channel, or 48 floats per splat at full degree 3. This is a big chunk of the per-splat memory budget and a major reason compressed formats like .splat drop everything above degree 0; you trade view-dependent effects for a much smaller file.
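The coefficient counts above follow (degree + 1)² per channel. As a sketch, here’s that formula plus a degree-1 evaluation for a single channel in Swift, using the standard real-SH basis constants; the function names are illustrative, and the full degree-3 evaluation adds many more terms:

```swift
import simd

// Coefficients per colour channel for SH up to `degree` is (degree + 1)^2:
// 1 + 3 + 5 + 7 = 16 at degree 3, matching the list above.
func shCoefficientCount(degree: Int) -> Int {
    (degree + 1) * (degree + 1)
}

// Minimal degree-1 evaluation for one channel: the DC term plus a linear
// directional term. `dir` is the normalised view direction; the constants
// are the standard real spherical-harmonic basis factors.
func evaluateSH1(coeffs: [Float], dir: SIMD3<Float>) -> Float {
    let c0: Float = 0.28209479  // Y_0^0
    let c1: Float = 0.48860251  // |Y_1^m|
    return c0 * coeffs[0]
         - c1 * dir.y * coeffs[1]
         + c1 * dir.z * coeffs[2]
         - c1 * dir.x * coeffs[3]
}
```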
From Splats to Scenes
So far, we’ve talked about individual splats. But a single splat doesn’t look like much. A scene is what happens when you combine hundreds of thousands (or millions) of them.
As discussed, each splat is semi-transparent, and when you look at a scene from any viewpoint, you’re looking through a dense cloud of overlapping Gaussians. The colour at every pixel is the result of alpha-blending all the splats that overlap that pixel, back-to-front.
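The back-to-front accumulation is standard “over” compositing. A CPU-side Swift sketch with illustrative types (in the real renderer this happens in the GPU’s blend unit):

```swift
import simd

// One splat's contribution at a particular pixel.
struct Contribution {
    var color: SIMD3<Float>   // splat colour at this pixel
    var alpha: Float          // opacity * Gaussian falloff at this pixel
}

// Back-to-front "over" blending: each nearer splat partially covers
// whatever has accumulated behind it.
func composite(_ sortedBackToFront: [Contribution]) -> SIMD3<Float> {
    var pixel = SIMD3<Float>(repeating: 0)
    for c in sortedBackToFront {
        pixel = c.color * c.alpha + pixel * (1 - c.alpha)
    }
    return pixel
}
```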
The key property this gives you is view independence. There’s not one “correct” viewpoint; you can render the scene from any angle, and the blending works. Even a scene using only SH0 (flat colour per splat) looks convincing from arbitrary viewpoints, because the 3D structure is encoded in the positions, shapes, and opacities of the splats themselves. Higher-order spherical harmonics add to this by allowing specular highlights to shift and surfaces to catch light differently as you move. Still, the fundamental ability to orbit freely comes from the representation, not the colour model.
On-Disk Representations
There’s no single standard file format for Gaussian splats. The ecosystem has settled on a handful of competing formats, each with different trade-offs in terms of size, fidelity, and feature support. Every format except PLY is lossy; they reduce file size by quantising, compressing, or outright dropping data.
PLY
PLY (Polygon File Format) is what you’ll get out of the training tools: the original 3DGS implementation, gsplat, and nerfstudio all output .ply by default. It’s an uncompressed format that stores every property at full precision: position, scale, rotation, opacity, and all the spherical harmonic coefficients up to degree 3. There are binary and ASCII variants of PLY.
The upside is that you get everything the training process produced, exactly as-is. The downside is size. A single splat with degree 3 SH carries 62 float properties (3 position + 3 unused normal + 3 scale + 4 rotation + 1 opacity + 3 DC colour + 45 higher-order SH coefficients), and at 4 bytes per float, that’s 248 bytes per splat. A large scene might have a million or more splats, totalling hundreds of megabytes for a single file.
.splat
The .splat format was created by antimatter15 as a lightweight format for web viewers. It strips the splat down to the bare essentials: position, scale, rotation, and a single RGBA colour, no spherical harmonics at all. Everything gets packed into a compact binary layout using half-precision floats and quantised colour.
The result is dramatically smaller files and fast loading, at the cost of losing all view-dependent colour. For many casual viewing scenarios, that’s a perfectly fine trade-off. .splat also became popular partly because it was early. It was simple, and when people were scrambling to build implementations in late 2023, it was one of the first compact formats to be defined.
.spz
Niantic’s .spz format takes a different approach: it actually compresses the data rather than just truncating it. It supports spherical harmonics, so you keep view-dependent colour, but uses quantisation and compression to bring file sizes way down compared to .ply.
.sog
SOG (Spatially Ordered Gaussians) is the current “best” format (in terms of size). It takes a fundamentally different approach to storing splats. Instead of a flat list of per-splat properties, it uses k-means clustering to group similar splats together. Then it represents the data as a zip file full of images, essentially encoding different splat properties into texture-like grids. This lets the data exploit spatial coherence within clusters for better compression and maps well to GPU texture sampling. The result is by far the best compression of any format, dramatically smaller than .spz while still supporting spherical harmonics and maintaining quality.
To give a concrete sense of the size differences, here’s the Vendel I Helmet scene (34,734 Gaussians, which is a pretty small scene) in two formats:
| Format | File | Size |
|---|---|---|
| .ply (original) | Helmet.ply | 8.2 MB |
| .sog | Helmet.sog | 1.5 MB |
Here’s the helmet as an interactive 3DGS viewer (exported as a self-contained HTML page by splat-transform). Drag to orbit.
For converting between formats and transforming splat clouds in programmable ways, splat-transform is a handy command-line tool.
How 3D Graphics Rendering Usually Works
Conventional 3D rendering, whether rasterisation (turning triangle meshes into pixels) or ray tracing (shooting rays and simulating light bounces), usually assumes your scene is composed of surfaces. Gaussian splatting throws that assumption out. There are no surfaces, no triangles, no rays. Just a cloud of semi-transparent blobs that you project and blend.
“Standard” 3DGS Rendering
The standard 3DGS rendering pipeline is conceptually simple. At a high level, for each frame you:
- Sort the Gaussians by depth.
- Render each Gaussian as a screen-aligned quad (a billboard) with a Gaussian falloff, blending them together back-to-front. The projection from a 3D ellipsoid to 2D screen-space ellipse happens in the vertex shader.
Sorting
Because splats are alpha-blended, they have to be rendered in depth order, back-to-front. That means every frame, you need to sort hundreds of thousands of splats by their distance from the camera. This is one of the biggest bottlenecks for real-time performance. The sort has to be fast and run every time the camera moves.
Rendering
Each splat gets rendered as a screen-aligned quad, a billboard. The GPU doesn’t draw ellipsoids directly; instead, for each splat, you emit a quad that’s big enough to contain the projected Gaussian, and a shader evaluates the Gaussian falloff per-pixel to produce the soft blob.
In practice, you don’t issue a draw call per splat. You use instanced rendering. A single draw call renders all splats: each instance is one splat, and the vertex shader indexes into the sorted splat buffer by instance ID to fetch that splat’s data. This keeps draw call overhead minimal even with millions of splats.
Vertex Shader
The vertex shader determines the position, size, and orientation of each billboard quad. The splat’s 3D covariance matrix describes an ellipsoid in world space, but you need a 2D ellipse on screen. The perspective projection is nonlinear, so you can’t just multiply the covariance by the projection matrix. Instead, you linearise the projection at the splat’s position using its Jacobian (the matrix of partial derivatives), then use that to transform the covariance: Σ’ = J · V · Σ · Vᵀ · Jᵀ, where V is the rotation part of the view transform and J is the Jacobian of the perspective projection. The resulting 2D covariance Σ’ gives you the axes of the screen-space ellipse, which tells you how to size and orient your billboard quad.
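A CPU-side Swift sketch of that projection, under assumed names (`fx`/`fy` are focal lengths in pixels, `t` is the splat centre in camera space; the real shader does the same math in Metal):

```swift
import simd

// Sketch: linearise the perspective projection at camera-space position `t`
// and push the 3D covariance into 2D: Sigma' = J * V * Sigma * V^T * J^T,
// where V is the 3x3 rotation part of the view matrix.
func projectCovariance(sigma: simd_float3x3,
                       viewRotation: simd_float3x3,
                       t: SIMD3<Float>,
                       fx: Float, fy: Float) -> simd_float2x2 {
    // Jacobian of (x, y) = (fx*tx/tz, fy*ty/tz), padded with a zero row.
    let J = simd_float3x3(rows: [
        SIMD3<Float>(fx / t.z, 0, -fx * t.x / (t.z * t.z)),
        SIMD3<Float>(0, fy / t.z, -fy * t.y / (t.z * t.z)),
        SIMD3<Float>(0, 0, 0),
    ])
    let T = J * viewRotation
    let full = T * sigma * T.transpose
    // Keep the top-left 2x2 block: the screen-space ellipse.
    return simd_float2x2(columns: (
        SIMD2<Float>(full[0][0], full[0][1]),
        SIMD2<Float>(full[1][0], full[1][1])
    ))
}
```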
This is also where spherical harmonic evaluation happens. The vertex shader computes the view direction from the camera to the splat and evaluates the SH coefficients to produce a final colour for that viewing angle. The result gets passed to the fragment shader as a per-vertex attribute.
The vertex shader is most of the GPU code.
Fragment Shader
The fragment shader is the simple part. For each pixel covered by the billboard quad, it evaluates the 2D Gaussian function using the projected covariance to determine how much this splat contributes, with brightness and opacity varying from bright and opaque near the centre to transparent at the edges. That value gets multiplied by the splat’s colour and opacity, and alpha-blended into the framebuffer. In practice, the fragment shader is just a few lines of code.
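The falloff evaluation itself is tiny. A CPU-side Swift sketch of the same math, working on the inverse of the projected covariance (often called the conic); names here are illustrative:

```swift
import simd

// Sketch of the per-pixel Gaussian evaluation: given the offset `d` from the
// splat centre in screen space and the inverse 2D covariance, the splat's
// contribution is opacity * exp(-0.5 * d^T * Sigma'^-1 * d): full strength at
// the centre, falling off smoothly toward the edges.
func splatAlpha(d: SIMD2<Float>, inverseCovariance: simd_float2x2, opacity: Float) -> Float {
    let power = -0.5 * simd_dot(d, inverseCovariance * d)
    return opacity * exp(power)
}
```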
Performance
The fragment shader is tiny (a few lines of math) but it runs a lot. Every splat’s billboard quad covers some number of pixels, and in dense regions of the scene, dozens or hundreds of splats overlap the same pixel. That’s overdraw, and it’s where most of the GPU time actually goes. A heatmap of per-pixel fragment invocations in a typical scene lights up hot wherever splats pile up. The shader is cheap per invocation, but sheer volume makes it the dominant cost on the rendering side.
MetalSprocketsGaussianSplats
MetalSprocketsGaussianSplats is my Gaussian splat library, built on top of MetalSprockets. It handles loading splat files, sorting, and rendering. It supports all the on-disk formats discussed above.
The library ships with three different 3DGS renderers: the traditional one (as described above, based on Spark.js) and two experimental alternatives I describe later in this post.
MetalSprockets
MetalSprocketsGaussianSplats is based on MetalSprockets, a declarative, composable layer for Metal in Swift. “Like SwiftUI, but for Metal.” Metal is powerful, but setting up even a simple render pass means a lot of descriptor/pipeline/encoder boilerplate. MetalSprockets borrows ideas from SwiftUI (result builders, composable trees, property wrappers) and applies them to Metal. You build GPU workloads as a tree of Elements, bind shader parameters by name, and mix render, compute, mesh, and object shaders in the same graph. It works in SwiftUI, ARKit, and visionOS immersive spaces.
I won’t go into further detail about MetalSprockets in this post, but I plan to write more about it in the future.
MetalSprocketsGaussianSplats Example
MetalSprocketsGaussianSplats includes SplatView, which lets you render a 3DGS splat cloud as simply as possible. In this minimal example, I show loading a .spz file and rendering it in a SwiftUI view:
```swift
struct SplatContentView: View {
    let url: URL
    @State private var cameraMatrix = simd_float4x4(translation: [0, 0, 3])
    @State private var splatCloud: GPUSplatCloud<SparkSplat>?

    var body: some View {
        SplatView(splatCloud: splatCloud, cameraMatrix: cameraMatrix)
            .task(id: url) {
                let device = MTLCreateSystemDefaultDevice()!
                let reader = try! SplatReader(url: url)
                var splats: [SparkSplat] = []
                splats.reserveCapacity(reader.splatCount)
                try! reader.read { _, extendedSplat in
                    splats.append(SparkSplat(extendedSplat.genericSplat))
                }
                splatCloud = try! GPUSplatCloud(device: device, splats: splats)
            }
    }
}
```
I’m hoping to streamline basic usage further in the future.
MetalSprocketsGaussianSplats provides a demo application that shows a simple Gaussian Splat scene and runs on macOS, iOS and visionOS (in immersive mode).
And here’s a more involved version showing the MetalSprockets render pipeline directly, with explicit sort management and a custom render pass. This is where you might start if you need to mix in other rendering techniques with 3DGS. Note that the splats are only sorted if the camera transform changes:
```swift
@State private var sortedIndices: SplatIndices?
@State private var sortManager: AsyncSortManager<SparkSplat>

var body: some View {
    RenderView { _, drawableSize in
        let projectionMatrix = projection.projectionMatrix(for: drawableSize)
        let size = SIMD2<Float>(Float(drawableSize.width), Float(drawableSize.height))
        if let sortedIndices {
            try RenderPass {
                try SparkSplatRenderPipeline(
                    splatCloud: splatCloud,
                    projectionMatrix: projectionMatrix,
                    modelMatrix: .identity,
                    cameraMatrix: cameraMatrix,
                    drawableSize: size,
                    sortedIndices: sortedIndices
                )
            }
        }
    }
    .task {
        for await indices in sortManager.sortedIndicesStream {
            if let old = sortedIndices { sortManager.release(old) }
            sortedIndices = indices
        }
    }
    .onChange(of: cameraMatrix, initial: true) {
        sortManager.requestSort(SortParameters(camera: cameraMatrix, model: .identity))
    }
}
```
MetalSprocketsGaussianSplats Traditional Renderer in detail
Sorting Splats
The first step in each frame is to measure the distance from each splat to the camera. You project each splat’s position and get a depth value. Then you sort. You’re not actually reordering the splat data itself; for performance, you’re sorting an array of indices and distances that refer back into the splat buffer.
The obvious approach is Swift.Array.sort(), which works but is painfully slow (especially on non-optimised builds). Swift’s standard sort is comparison-based and carries all the overhead of per-element comparisons and branching. When you’re sorting hundreds of thousands or millions of indices every frame, that overhead kills your framerate.
The fix is to avoid comparison sorts: convert the 32-bit floating-point distance to a 16-bit integer, then run a radix sort. A radix sort works by examining the key’s digits directly, so it’s O(n·k), where k is the number of digits in the key. With a 16-bit key, the whole sort is just two passes of a counting sort (one for each byte).
The 16-bit quantisation is lossy: you lose some depth precision. But in practice, 2^16 distinct depth buckets are more than enough. Splats that end up in the same bucket are close enough in depth that their rendering order doesn’t produce visible artefacts.
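Here’s a CPU-side Swift sketch of the quantise-then-radix-sort approach; the names are illustrative, and the real implementation is more heavily optimised:

```swift
// Quantise a camera distance into the 16-bit key range, given scene depth
// bounds. This is the lossy step: 65,536 depth buckets.
func quantiseDepth(_ distance: Float, near: Float, far: Float) -> UInt16 {
    let t = max(0, min(1, (distance - near) / (far - near)))
    return UInt16(t * 65535)
}

// Two-pass radix sort over 16-bit keys: a stable counting sort on the low
// byte, then another on the high byte. Returns indices into `keys`, sorted
// by ascending key.
func radixSortIndices(keys: [UInt16]) -> [UInt32] {
    var indices = Array(0..<UInt32(keys.count))
    var scratch = indices
    for shift in [0, 8] {
        var counts = [Int](repeating: 0, count: 256)
        for i in indices { counts[Int((keys[Int(i)] >> shift) & 0xFF)] += 1 }
        // Exclusive prefix sum: starting offset of each bucket.
        var offsets = [Int](repeating: 0, count: 256)
        for b in 1..<256 { offsets[b] = offsets[b - 1] + counts[b - 1] }
        for i in indices {
            let bucket = Int((keys[Int(i)] >> shift) & 0xFF)
            scratch[offsets[bucket]] = i
            offsets[bucket] += 1
        }
        swap(&indices, &scratch)
    }
    return indices
}
```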
Sorting on GPU (Failed Idea #1)
My initial assumption was that sorting on the GPU would be much faster…
I implemented a bitonic sorting network as a Metal compute shader. It worked, but my implementation was actually considerably slower than the CPU radix sort. On top of that, the CPU is essentially free while the GPU is busy rendering. The sort and the render don’t compete for the same resource. If your CPU sort finishes before the GPU needs the results for the next frame, you’re getting the sort for free in terms of frame time. Moving it to the GPU adds contention with the render workload for no real benefit.
I ended up going back to the CPU radix sort. That said, there’s a lot of promising work on GPU sorting that I plan on revisiting.
Incremental Sorting (Failed Idea #2)
This idea was appealing: the first sort of a scene does a full radix sort, but subsequent sorts, where the data is mostly sorted because the camera only moved a little, switch to an insertion sort instead. Insertion sort is O(n) on nearly-sorted data, so in theory, this should be a big win. And if insertion sort detected too many swaps (meaning the data wasn’t nearly sorted after all), it would bail out and fall back to the radix sort.
In practice, it didn’t work out. Insertion sort’s cache access pattern on millions of elements was worse than the radix sort’s predictable streaming passes. The radix sort is just so fast on this workload that “mostly sorted” wasn’t enough of an advantage to justify the complexity. I ended up ripping it out.
Don’t Sort Until You Need To
The other big sorting optimisation is just… not sorting. If the camera hasn’t moved, the sort order from the last frame remains valid; there’s no need to sort it. We skip the sort entirely when the camera is stationary, which means static views are essentially free on the CPU side. Even when the camera is moving slowly, you can get away with sorting less frequently than every frame without noticeable artefacts.
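A sketch of that guard in Swift, with illustrative names standing in for the real state and sort entry point:

```swift
import simd

// Sketch: only kick off a re-sort when the camera transform has actually
// changed since the last sort. `lastSortedCamera` and `requestSort` are
// hypothetical stand-ins for the renderer's real state and sort API.
var lastSortedCamera: simd_float4x4? = nil

func sortIfNeeded(camera: simd_float4x4, requestSort: (simd_float4x4) -> Void) {
    // Previous frame's order is still valid for an unchanged camera.
    if let last = lastSortedCamera, last == camera { return }
    lastSortedCamera = camera
    requestSort(camera)
}
```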
Alternative Renderers
MetalSprocketsGaussianSplats ships with two alternative renderers.
Stochastic Rendering
The stochastic renderer skips the sort stage entirely by replacing alpha blending with a random acceptance test. Each fragment computes its opacity from the Gaussian falloff, then is either accepted (as fully opaque) or discarded based on that opacity. Over multiple frames, a temporal accumulation pass (exponential moving average) averages out the noise. Blue-noise sampling with per-frame/per-splat offsets gives a clean-looking dither pattern.
The main downside is temporal noise; stochastic rendering causes the image to shimmer frame by frame. While the temporal accumulation helps, it never fully settles the way a properly sorted render does. That said, skipping the sort is a huge simplification, and when the camera is moving quickly, the noise is much less noticeable.
Tile-Based Rendering
The second alternative is an experimental tile-based renderer that leverages Apple Silicon’s tile-based GPU architecture. It’s still a work in progress; I’ll cover it in a future post.
visionOS
MetalSprocketsGaussianSplats runs on visionOS, in both a 2D Metal View and an immersive Metal space.
Radiance
Radiance is my macOS Gaussian splat viewer app, built on MetalSprocketsGaussianSplats. It can load multiple splat files and render them all in the same scene, shows per-file and aggregate statistics (splat count, file size, bytes per splat, spherical harmonic degree, bounding box), and includes a macOS QuickLook plugin so you can preview .ply, .splat, .spz, and .sog files directly in Finder. Radiance will be available via TestFlight for macOS soon, with iOS and visionOS coming later.
Radiance can also convert Apple spatial scenes to Gaussian splat files:
Conclusion
Gaussian splats are a bit weird. They’re not meshes, they’re not volumes, they’re not really point clouds. They’re a pile of fuzzy blobs that, against all intuition, produce photorealistic images when you throw enough of them at a scene and sort them properly.
If you want to play with this stuff, grab Radiance and drag a .ply or .sog file in. Or download the helmet files from this post and poke around. The code is all open source.
Gaussian splats are still young. The formats are still settling, the tooling is still rough in places, and there’s a lot of low-hanging fruit in rendering performance. I expect Apple and other companies to be adding more Gaussian Splat functionality to their products over the next few years.
Future Posts
There’s more I want to cover than fits in one post. Here’s what I’m hoping to get around to next:
- End-to-end splat training: walking through the full pipeline from photos to rendered splats, including setting up a cloud VM with an NVIDIA GPU.
- Tile-based rendering: a deep dive into MetalSprocketsGaussianSplats’ tile-based renderer and how it takes advantage of Apple Silicon’s GPU architecture.
- Real-world performance: benchmarks across devices (M1 Ultra, A17 Pro, etc.), profiling the sort/vertex/fragment breakdown, and using MetalFX upscaling to hit higher framerates.
Links
Other Apple/iOS Gaussian Splat Projects
- MetalSplatter by Sean Cier
- metal-splats by laanlabs
Splat Viewers and Editors
- SuperSplat by PlayCanvas
- WebGL Gaussian Splat Viewer by antimatter15
- Spark.js
Introductions and Overviews
- 3D Gaussian Splatting for Real-Time Radiance Field Rendering (original paper)
- Introduction to 3D Gaussian Splatting (Hugging Face)
- A Comprehensive Overview of Gaussian Splatting by Kate Yurkova
Compression
- Making Gaussian Splats smaller by Aras Pranckevičius
- Making Gaussian Splats more smaller by Aras Pranckevičius
- Open-sourcing .SPZ by Scaniverse/Niantic
GPU Sorting
Matthew Kieber-Emmons’ series on Metal Compute:
- Optimizing Parallel Reduction in Metal for Apple M1
- Efficient Parallel Prefix Sum in Metal for Apple M1
- Memory Bandwidth Optimised Parallel Radix Sort in Metal for Apple M1 and Beyond
Other GPU sorting resources:
- GPUSorting by b0nes164 (OneSweep, CUDA, D3D12, Unity)
- GPUPrefixSums by b0nes164
- WebGPU-Radix-Sort by kishimisu
- AppleNumericalComputing radix sort by ShoYamanishi
Tools
- splat-transform by PlayCanvas
- COLMAP
- gsplat
- NerfStudio
- Polycam