Creating Pixelated 3D Effect with Metal Shaders

50 min read

Lately, I’ve been dedicating a lot of time to studying computer graphics and working with 3D basics. This article serves as a way to consolidate and apply what I’ve learned. Chances are low that anyone would need to implement something like this in a real iOS app; there are many libraries that can do this and more (Spline, Rive, etc.). Still, I think it’s a great exercise for understanding the fundamentals.

This article is filled with interactive visualizations to make it easier to understand and see what’s happening. We will be focusing more on the 3D rendering and less on the MetalKit specifics (like command queues, buffers, etc.).

Project Structure

We are going to divide our journey into several steps.

Step 1: Basic 3D Rendering

Here we are going to do a basic 3D rendering of a flower model (or basically any 3D model).

Step 2: Offscreen Rendering

This step introduces the concept of offscreen rendering so that the project architecture becomes clearer and more scalable.

Step 3: Post-Processing Pipeline

This is where we will apply the pixelation effect to the offscreen texture. Each step produces a working solution that can be run and tested independently.

At the end we will have this working animation.

Step 1: Basic 3D Rendering

UI Integration

As usual we are going to use UIViewRepresentable to bridge between SwiftUI and Metal, providing a clean separation between the declarative UI layer and the imperative Metal rendering code.

Code
import MetalKit
import SwiftUI

struct MetalViewRepresentable: UIViewRepresentable {

  final class Coordinator: NSObject {
    let renderer = Renderer()
    weak var mtkView: MTKView?
  }

  func makeCoordinator() -> Coordinator {
    Coordinator()
  }

  func makeUIView(context: Context) -> MTKView {
    let device = context.coordinator.renderer.device
    let view = MTKView(frame: .zero, device: device)

    view.clearColor = MTLClearColorMake(0, 0, 0, 1)
    view.colorPixelFormat = .bgra8Unorm
    view.depthStencilPixelFormat = .depth32Float

    view.preferredFramesPerSecond = 60
    view.isPaused = false
    view.enableSetNeedsDisplay = false
    view.framebufferOnly = true

    view.delegate = context.coordinator.renderer
    context.coordinator.mtkView = view

    return view
  }

  func updateUIView(
    _ uiView: MTKView,
    context: Context
  ) {}
}
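
For completeness, here is a minimal sketch of how the representable could be embedded in a SwiftUI screen (the ContentView name is just an example, not part of the project):

Code
import SwiftUI

struct ContentView: View {
  var body: some View {
    MetalViewRepresentable()
      .ignoresSafeArea() // let the Metal view fill the whole screen
  }
}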

Building a Convenient Toolbelt

Before continuing with the Renderer implementation, let’s introduce a couple of conveniences that will help us with shader and asset management.

Shader Management with ShaderLibrary

ShaderLibrary is a wrapper around MTLLibrary that provides convenient shader function access using Swift’s @dynamicMemberLookup feature. This eliminates verbose function-lookup boilerplate at the call site.

Code
import MetalKit

/// A wrapper around MTLLibrary that provides convenient shader function access
/// using Swift's @dynamicMemberLookup feature.
@dynamicMemberLookup
public struct ShaderLibrary {
  /// The underlying Metal library
  let library: MTLLibrary

  public init(
    library: MTLLibrary
  ) {
    self.library = library
  }

  /// Retrieves a shader function by name
  /// - Parameter name: The name of the shader function
  /// - Returns: The Metal function
  /// - Throws: An error if the function cannot be found
  public func function(named name: String) throws -> MTLFunction {
    let function = try library.makeFunction(
      name: name,
      constantValues: .init()
    )

    return function
  }

  public subscript(
    dynamicMember member: String
  ) -> MTLFunction {
    get throws {
      try function(named: member)
    }
  }
}
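
As a usage sketch (the shaderLibrary and vertexFunction names are mine; modelVertex is the vertex function we define later in this article), a lookup becomes a simple property access:

Code
import MetalKit

let device = MTLCreateSystemDefaultDevice()!
let shaderLibrary = ShaderLibrary(
  library: try! device.makeDefaultLibrary(bundle: .main)
)

// Dynamic member lookup resolves this to function(named: "modelVertex")
let vertexFunction = try! shaderLibrary.modelVertex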

Asset Management with ObjectParser

ObjectParser is a helper type that loads an .obj file and its internal resources. It provides the mesh and submesh buffers (positions, normals, UVs), along with the vertex descriptor that matches the shaders. For the sake of simplicity we make some assumptions about the .obj file, including that it has a single mesh and a single texture.

An .obj file contains data about vertices, normals, and UVs, and builds faces (triangles) from them. If you’ve ever done the “Hello World” of Metal and displayed a triangle, then you are familiar with defining vertices and triangles.

Code
# vertices (v)
v 0.0 1.0 0.0
v 1.0 0.0 0.0
v -1.0 0.0 0.0
# normals (vn)
vn 0.0 0.0 1.0
vn 0.0 0.0 -1.0
# UV coords (vt)
vt 0.0 0.0
vt 1.0 1.0
# faces (f)
f 1/1/1 2/2/1 3/1/1
f 1/1/2 3/1/2 4/2/2

MDLVertexDescriptor describes how the vertex data read from the .obj file is laid out so that it can be used by Metal. In our case we expect this data to have 3 attributes: position, normal, and UV.

position: float3, offset 0, 12 bytes
normal: float3, offset 12, 12 bytes
UV coord: float2, offset 24, 8 bytes

MDLAsset is what we use to actually work with the .obj file. It provides a way to load the file and get the mesh and submesh buffers. The mesh buffer contains all the vertices defined in the file without any ordering. A submesh buffer takes the vertices and, using indices, places them in order. This means there might be multiple submeshes defining different parts of a model. To get more familiar with this concept I’d recommend watching this tutorial.

raw mesh (unordered): v1 (0, 1, 0), v2 (1, 0, 0), v3 (-1, 0, 0), v4 (0, -1, 0), v5 (0, 0, 1), v6 (0, 0, -1)
submesh (organized): v1, v2, v3; v2, v4, v3; v4, v5, v6

Here is the complete implementation of ObjectParser:

Code
import ModelIO
import MetalKit

public struct ObjectParser {
  // mesh contains all the vertices, unordered
  public let mesh: MTKMesh
  // submeshes take the vertices and, using indices, place them in order
  public let submeshes: [MTKSubmesh]
  public var textures: [MTLTexture]

  public let mdlVertexDescriptor: MDLVertexDescriptor = {
    // Position (0), Normal (1), Texcoord (2)
    let mdlVertexDescriptor = MDLVertexDescriptor()
    mdlVertexDescriptor.attributes[0] = MDLVertexAttribute(
      name: MDLVertexAttributePosition,
      format: .float3,
      offset: 0,
      bufferIndex: 0
    )
    mdlVertexDescriptor.attributes[1] = MDLVertexAttribute(
      name: MDLVertexAttributeNormal,
      format: .float3,
      offset: 12,
      bufferIndex: 0
    )
    mdlVertexDescriptor.attributes[2] = MDLVertexAttribute(
      name: MDLVertexAttributeTextureCoordinate,
      format: .float2,
      offset: 24,
      bufferIndex: 0
    )
    mdlVertexDescriptor.layouts[0] = MDLVertexBufferLayout(stride: 32)

    return mdlVertexDescriptor
  }()

  // Loads the mesh, builds the MTKMesh, and loads the base-color texture from the material
  public init(
    modelURL: URL,
    device: MTLDevice
  ) {
    let allocator = MTKMeshBufferAllocator(device: device)

    let asset = MDLAsset(
      url: modelURL,
      vertexDescriptor: mdlVertexDescriptor,
      bufferAllocator: allocator
    )

    // Grab the first mesh
    let mdlMesh = asset.childObjects(of: MDLMesh.self).first as! MDLMesh

    // Generate normals if the model doesn't include them
    if mdlMesh.vertexAttributeData(forAttributeNamed: MDLVertexAttributeNormal, as: .float3) == nil {
      mdlMesh.addNormals(withAttributeNamed: MDLVertexAttributeNormal, creaseThreshold: 0.0)
    }

    // Build MTKMesh
    let mesh = try! MTKMesh(mesh: mdlMesh, device: device)
    self.mesh = mesh
    self.submeshes = mesh.submeshes

    let keys: [MDLMaterialSemantic] = [.baseColor]
    let textureLoader = MTKTextureLoader(device: device)
    var _textures: [any MTLTexture] = []

    mdlMesh.submeshes?.forEach { submesh in
      if
        let mdlSubmesh = submesh as? MDLSubmesh,
        let material = mdlSubmesh.material
      {

        for key in keys {
          if let prop = material.property(with: key) {
            // If it’s a texture sampler, use its URL
            if
              prop.type == .string,
              let name = prop.stringValue
            {
              // Resolve relative to the OBJ’s folder
              let texURL = modelURL.deletingLastPathComponent().appendingPathComponent(name)
              if let tex = try? textureLoader.newTexture(URL: texURL, options: [
                .SRGB: false as NSNumber,
                .origin: MTKTextureLoader.Origin.bottomLeft
              ]) {
                _textures.append(tex)
                break
              }
            } else if
              prop.type == .URL,
              let url = prop.urlValue
            {
              // If MTL references a full URL
              if let tex = try? textureLoader.newTexture(URL: url, options: [
                .SRGB: false as NSNumber,
                .origin: MTKTextureLoader.Origin.bottomLeft
              ]) {
                _textures.append(tex)
                break
              }
            } else if
              prop.type == .texture,
              let mdlTex = prop.textureSamplerValue?.texture
            {
              // Embedded MDLTexture
              if let tex = try? textureLoader.newTexture(texture: mdlTex, options: [
                .SRGB: false as NSNumber,
                .origin: MTKTextureLoader.Origin.bottomLeft
              ]) {
                _textures.append(tex)
                break
              }
            }
          }
        }
      }
    }

    self.textures = _textures
  }
}
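
A quick usage sketch (the "flower" file name here is hypothetical; the actual model used in the article is referenced later):

Code
import MetalKit

let device = MTLCreateSystemDefaultDevice()!
// Hypothetical .obj file bundled with the app, purely for illustration
let modelURL = Bundle.main.url(forResource: "flower", withExtension: "obj")!

let object = ObjectParser(modelURL: modelURL, device: device)
// Inspect what was loaded: vertex count, submesh count, loaded textures
print(object.mesh.vertexCount, object.submeshes.count, object.textures.count)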

Affine Transformations with AffineTransform

This type encapsulates the essential 3D transformations: translation, rotation, and scaling. Here I’ve prepared a little visualization to play with the transformations interactively.


We handle rotation using quaternions. The API lets us set the axis and angle, and it handles the rest. Quaternions are a powerful tool widely used in 3D graphics. For more information I suggest this tutorial, which explains the concept and compares it with another approach, Euler angles. This video from 3Blue1Brown explaining quaternions is also a great resource.

The final matrix is computed as Translation * Rotation * Scale. This order ensures that scaling happens first, then rotation, then translation. Experiment with other orders to see how it affects the result.

Code
import simd

/// A struct that encapsulates the essential 3D transformations: translation, rotation, and scaling.
struct AffineTransform {
  /// Translation vector in 3D space
  var translation: SIMD3<Float> = .zero

  /// Rotation quaternion (angle: 0.0, axis: zero vector by default)
  var rotation: simd_quatf = simd_quatf(angle: 0.0, axis: SIMD3<Float>(0,0,0))

  /// Scale factors for each axis (uniform scale of 1.0 by default)
  var scale: SIMD3<Float> = SIMD3<Float>(repeating: 1)

  /// Computed 4x4 model matrix combining all transformations
  ///
  /// The matrix is built by multiplying translation, rotation, and scale matrices
  /// in the order: Translation * Rotation * Scale
  var modelMatrix: float4x4 {
    Self.makeTranslate(translation) *
    float4x4(rotation) *
    Self.makeScale(scale)
  }

  /// Creates a 4x4 translation matrix from a 3D vector
  /// - Parameter vector: The translation vector
  /// - Returns: A 4x4 translation matrix
  static func makeTranslate(_ vector: SIMD3<Float>) -> float4x4 {
    let baseX: SIMD4<Float> = [1, 0, 0, 0]
    let baseY: SIMD4<Float> = [0, 1, 0, 0]
    let baseZ: SIMD4<Float> = [0, 0, 1, 0]
    let baseW: SIMD4<Float> = [vector.x, vector.y, vector.z, 1]

    return float4x4(baseX, baseY, baseZ, baseW)
  }

  /// Creates a 4x4 scale matrix from a 3D vector
  /// - Parameter vector: The scale factors for each axis
  /// - Returns: A 4x4 scale matrix
  static func makeScale(_ vector: SIMD3<Float>) -> float4x4 {
    float4x4(diagonal: SIMD4<Float>(vector.x, vector.y, vector.z, 1.0))
  }
}
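
A quick usage sketch with made-up values, just to show how the pieces combine:

Code
import simd

var transform = AffineTransform()
transform.scale = SIMD3<Float>(repeating: 0.5)
transform.rotation = simd_quatf(angle: .pi / 2, axis: SIMD3<Float>(0, 1, 0))
transform.translation = SIMD3<Float>(0, 2, 0)

// Scale is applied first, then rotation, then translation
let matrix = transform.modelMatrix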

Renderer Implementation

Now we are ready to display the 3D object.

For the Renderer we will start from the basics. We need to initialize the device, the command queue, and the object parser. It might not be the best approach to keep everything inside the initializer, since each of these components has relatively heavy initialization logic of its own, but for the sake of simplicity we will keep it this way.

Notice that we’re using a predefined resource URL for the model. The specific model I used is available here.

Code
final class Renderer: NSObject {
  let library: ShaderLibrary

  let device: any MTLDevice = MTLCreateSystemDefaultDevice()!
  let commandQueue: any MTLCommandQueue

  let mdlObject: ObjectParser

  private var instanceTransforms: [AffineTransform] = [
    AffineTransform(
      translation: SIMD3<Float>(0.0, -10.0, 0.0),
      scale: SIMD3<Float>(repeating: 0.6)
    )
  ]
  private let instanceBuffer: MTLBuffer

  init(
    modelURL: URL = Bundle.main.url(
      forResource: "12973_anemone_flower_v1_l2",
      withExtension: "obj"
    )!
  ) {
    library = .init(
      library: try! device.makeDefaultLibrary(bundle: .main)
    )

    commandQueue = device.makeCommandQueue()!

    mdlObject = ObjectParser(
      modelURL: modelURL,
      device: device
    )

    instanceBuffer = device.makeBuffer(
      length: MemoryLayout<float4x4>.stride * instanceTransforms.count,
      options: []
    )!

    super.init()
  }
}

And add a basic MTKViewDelegate implementation. In this initial setup, we use the view’s render pass as a render target, so the model is rendered directly to the screen.

Code
extension Renderer: MTKViewDelegate {
  func draw(in view: MTKView) {
    guard
      let drawable = view.currentDrawable,
      let commandBuffer = commandQueue.makeCommandBuffer()
    else {
      return
    }

    if
      let sceneRenderPassDescriptor = view.currentRenderPassDescriptor,
      let renderEncoder = commandBuffer.makeRenderCommandEncoder(
        descriptor: sceneRenderPassDescriptor
      )
    {

      do {
        try drawModel(in: view, renderEncoder: renderEncoder)
      } catch {
        fatalError(error.localizedDescription)
      }

      renderEncoder.endEncoding()
    }

    commandBuffer.present(drawable)
    commandBuffer.commit()
  }

  func mtkView(
    _ view: MTKView,
    drawableSizeWillChange size: CGSize
  ) {}
}

To draw the model we need to define a pipeline state that holds information about the required shaders and the vertex data format. To describe the vertex data we reuse the MDLVertexDescriptor from the object parser.

Code
extension Renderer {
  func drawModel(
    in view: MTKView,
    renderEncoder: any MTLRenderCommandEncoder
  ) throws {
    let pipelineDescriptor = MTLRenderPipelineDescriptor()
    pipelineDescriptor.vertexDescriptor = MTKMetalVertexDescriptorFromModelIO(mdlObject.mdlVertexDescriptor)
    pipelineDescriptor.vertexFunction = try! library.modelVertex
    pipelineDescriptor.fragmentFunction = try! library.modelFragment
    pipelineDescriptor.colorAttachments[0].pixelFormat = view.colorPixelFormat
    pipelineDescriptor.depthAttachmentPixelFormat = .depth32Float

    let renderPipelineState = try device.makeRenderPipelineState(descriptor: pipelineDescriptor)

    renderEncoder.setRenderPipelineState(renderPipelineState)

    let depthStencilDescriptor = MTLDepthStencilDescriptor()
    depthStencilDescriptor.depthCompareFunction = .less
    depthStencilDescriptor.isDepthWriteEnabled = true

    if
      let depthStencilState = device.makeDepthStencilState(
        descriptor: depthStencilDescriptor
      )
    {
      renderEncoder.setDepthStencilState(depthStencilState)
    }

    renderEncoder.setVertexBuffer(
      mdlObject.mesh.vertexBuffers[0].buffer,
      offset: mdlObject.mesh.vertexBuffers[0].offset,
      index: 0
    )
  }
}

Next, we configure the matrix that projects 3D coordinates onto the 2D screen. This involves creating matrices that transform models through various coordinate spaces and ultimately clip them to fit the visible area. We’ll explore this process in detail, including an interactive demonstration, later in the article.

Code
func drawModel(
  in view: MTKView,
  renderEncoder: any MTLRenderCommandEncoder
) throws {
  ...

  let aspect = Float(view.drawableSize.width / max(1, view.drawableSize.height))
  let perspectiveMatrix = AffineTransform.perspective(
    fovyRadians: .pi / 4,
    aspect: aspect,
    near: 0.1,
    far: 1000
  )
  let viewMatrix = AffineTransform.lookAt(
    eye: SIMD3<Float>(0.0, 0.0, 40.0),
    center: SIMD3<Float>(0.0, 0.0, 0.0),
    up: SIMD3<Float>(0.0, 1.0, 0.0)
  )

  var uniforms: SceneUniforms = .init(
    projection: perspectiveMatrix * viewMatrix
  )

  renderEncoder.setVertexBytes(
    &uniforms,
    length: MemoryLayout<SceneUniforms>.stride,
    index: 1
  )

  let ptr = instanceBuffer
    .contents()
    .bindMemory(
      to: float4x4.self,
      capacity: instanceTransforms.count
    )

  for (i, t) in instanceTransforms.enumerated() {
    ptr[i] = t.modelMatrix
  }

  renderEncoder.setVertexBuffer(
    instanceBuffer,
    offset: 0,
    index: 2
  )
}

We also implement the 3D math primitives required for rendering.

Code
struct SceneUniforms {
  var projection: simd_float4x4
}

extension AffineTransform {
  static func perspective(
    fovyRadians: Float,
    aspect: Float,
    near: Float,
    far: Float
  ) -> float4x4 {
    let yScale = 1 / tan(fovyRadians * 0.5)
    let xScale = yScale / aspect
    let zRange = far - near
    let zScale = far / zRange
    let wz = -near * zScale

    return float4x4(
      SIMD4<Float>( xScale,   0,       0,   0 ),
      SIMD4<Float>(      0, yScale,    0,   0 ),
      SIMD4<Float>(      0,      0, zScale, 1 ),
      SIMD4<Float>(      0,      0,   wz,   0 )
    )
  }

  static func lookAt(
    eye: SIMD3<Float>,
    center: SIMD3<Float>,
    up: SIMD3<Float>
  ) -> float4x4 {
    let zAxis = normalize(center - eye)
    let xAxis = normalize(cross(up, zAxis))
    let yAxis = cross(zAxis, xAxis)
    let translation = SIMD3<Float>(
      -dot(xAxis, eye),
       -dot(yAxis, eye),
       -dot(zAxis, eye)
    )
    return float4x4(
      SIMD4<Float>(xAxis.x, yAxis.x, zAxis.x, 0),
      SIMD4<Float>(xAxis.y, yAxis.y, zAxis.y, 0),
      SIMD4<Float>(xAxis.z, yAxis.z, zAxis.z, 0),
      SIMD4<Float>(translation.x, translation.y, translation.z, 1)
    )
  }
}
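
To make the math a bit more tangible, here is a small sanity-check sketch (my own example; the aspect ratio and the point are made-up values) that pushes a world-space point through these matrices and applies the perspective divide:

Code
import simd

let projection = AffineTransform.perspective(
  fovyRadians: .pi / 4,
  aspect: 1.5,
  near: 0.1,
  far: 1000
)
let view = AffineTransform.lookAt(
  eye: SIMD3<Float>(0, 0, 40),
  center: .zero,
  up: SIMD3<Float>(0, 1, 0)
)

let worldPoint = SIMD4<Float>(0, 5, 0, 1)               // 5 units above the origin
let clip = projection * view * worldPoint               // clip-space position
let ndc = SIMD3<Float>(clip.x, clip.y, clip.z) / clip.w // perspective divide

// ndc.x and ndc.y land in [-1, 1] when the point is inside the frustum;
// ndc.z is the depth value used for depth testing.
print(ndc)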

One last step before we jump into the shader code. We need to set up texture sampling and configure the model drawing process.

Code
func drawModel(
  in view: MTKView,
  renderEncoder: any MTLRenderCommandEncoder
) throws {
  ...

  let sampDesc = MTLSamplerDescriptor()
  sampDesc.minFilter = .linear
  sampDesc.magFilter = .linear
  sampDesc.sAddressMode = .repeat
  sampDesc.tAddressMode = .repeat

  let sampler = device.makeSamplerState(descriptor: sampDesc)
  renderEncoder.setFragmentSamplerState(sampler, index: 0)

  for (submesh, texture) in zip(mdlObject.submeshes, mdlObject.textures) {
    renderEncoder.setFragmentTexture(texture, index: 0)

    renderEncoder.drawIndexedPrimitives(
      type: submesh.primitiveType,
      indexCount: submesh.indexCount,
      indexType: submesh.indexType,
      indexBuffer: submesh.indexBuffer.buffer,
      indexBufferOffset: submesh.indexBuffer.offset,
      instanceCount: instanceTransforms.count
    )
  }
}

In the shader code we need to mirror the structure of the vertex data and the uniforms. The fragment function just samples the texture and returns the color. The vertex function builds the final clip-space position by multiplying the projection matrix, the instance’s model matrix, and the vertex position.

Code
#include <metal_stdlib>
using namespace metal;

struct SceneUniforms {
  float4x4 projection;
};

struct VertexIn {
  float3 position [[attribute(0)]];
  float3 normal   [[attribute(1)]];
  float2 uv       [[attribute(2)]];
};

struct VertexOut {
  float4 position [[position]];
  float3 worldPosition;
  float3 normal;
  float2 uv;
};

vertex VertexOut modelVertex(
  VertexIn in                       [[stage_in]],
  constant SceneUniforms &u         [[buffer(1)]],
  constant float4x4 *instanceModels [[buffer(2)]],
  uint instanceID                   [[instance_id]]
) {
  // Get the model matrix for this instance
  float4x4 model = instanceModels[instanceID];

  VertexOut out;

  float4 worldPos = model * float4(in.position, 1.0);
  out.worldPosition = worldPos.xyz;

  // Transform vertex position from model space to clip space
  // Order: Model -> World -> View -> Projection
  out.position = u.projection * worldPos;

  // Transform the normal vector by the model matrix
  out.normal = (model * float4(in.normal, 0.0)).xyz;

  // Pass through UV coordinates unchanged
  out.uv = in.uv;

  return out;
}

fragment float4 modelFragment(
  VertexOut in [[stage_in]],
  texture2d<float> tex [[texture(0)]],
  sampler samp [[sampler(0)]]
) {
  const float2 uv = in.uv;
  return tex.sample(samp, uv);
}

Voila! We have a working renderer that displays the 3D object.

3D Flower Render

How the 3D math works

The Model-View-Projection (MVP) matrix is the classic approach in 3D math that transforms vertices through three coordinate spaces to get them from 3D model coordinates to 2D screen coordinates. There are other approaches, but I haven’t studied them yet, so we will stick with this one.

When we talk about a Space, we mean the coordinate system we’re working in.

1. Model Matrix

The Model matrix transforms vertices from Model Space (the object’s local space) to World Space (where everything exists in the scene). We’ve defined the values inside the instanceTransforms array so that each instance of the object can have a different transformation.

Code
AffineTransform(
  translation: SIMD3<Float>(0.0, -10.0, 0.0),
  scale: SIMD3<Float>(repeating: 0.6)
)

Using this matrix we can position, rotate, and scale the object. You can return to the affine transformations visualization to play with it.

Basically, by defining the model matrix we describe how we want to place the object in a world where other objects might also be placed. Conceptually, we’re not moving the object itself, but rather transforming its local space into world space.

Model Space (position (0, 0), scale (1, 1), rotation 0deg) → Model Matrix (T x R x S) → World Space (position (550, 120), scale (1.5, 1.5), rotation 30deg)

2. View Matrix

The View matrix transforms from World Space to Camera Space. It’s like moving the camera around the world.

Code
let viewMatrix = AffineTransform.lookAt(
  eye: SIMD3<Float>(0.0, 0.0, 40.0),    // Camera position
  center: SIMD3<Float>(0.0, 0.0, 0.0),  // What we're looking at
  up: SIMD3<Float>(0.0, 1.0, 0.0)       // Up direction
)

By analogy to the model matrix, the view matrix is used to position the camera (I also like to think of it as the observer) in the world. And since there’s only one observer, all objects will be seen (if they are within the field of view) from a single point.

World Space (camera position (300, 120), look at (0, 0), up vector (0, 1)) → View Matrix → Camera Space (camera at (0, 0), look direction -X, up vector +Y)

3. Projection Matrix

The Projection matrix transforms from Camera Space to Clip Space and maps the 3D scene to the 2D screen. This creates the perspective effect where distant objects appear smaller.

Code
let perspectiveMatrix = AffineTransform.perspective(
  fovyRadians: .pi / 4,  // 45° field of view
  aspect: aspect,        // Screen aspect ratio
  near: 0.1,             // Near clipping plane
  far: 1000              // Far clipping plane
)

This creates the “frustum”, a truncated-pyramid-shaped viewing volume. Everything inside it gets rendered; everything outside gets clipped away.

Camera Space (3D frustum: field of view 45deg, near plane z = 0.1, far plane z = 1000) → Perspective Matrix → Clip Space (normalized: X and Y from -1 to +1, Z from 0 to 1 in Metal)

Why This Works

Each transformation serves a specific purpose in the 3D pipeline:

  • Model: “Where is this object in the world?”
  • View: “Where is the camera looking?”
  • Projection: “How should perspective work?”

By combining them into a single matrix, we can transform any vertex with just one matrix multiplication in the shader:

Code
out.position = u.projection * model * float4(in.position, 1.0);
Code
var uniforms: SceneUniforms = .init(
  projection: perspectiveMatrix * viewMatrix
)

Same as with affine transformations, the order is crucial.

  1. First: modelMatrix (model space → world space)
  2. Then: viewMatrix (world space → camera space)
  3. Finally: perspectiveMatrix (camera space → clip space)

This is exactly what happens in our modelVertex shader. The vertex starts in model space, gets transformed through world space, camera space, and finally clip space.

Here is a visualization to help better understand the camera space and perspective transformation.


For a deeper dive into the mathematical principles, check out the OpenGL Tutorial on Matrices, which explains these concepts in detail.

Step 1.1: Lighting & Diffuse

Now we are going to add support for a lighting model to make the image look more lively and natural. We’ll use a point light as the basis for our implementation. A point light is a light source that emits light in all directions from a single point in space. We can control its position, color, intensity, and attenuation (how quickly the light fades with distance).

First of all we need to define how the surface reflects light. There are many different models, but we’ll use the Lambertian model for this example. Lambertian reflection is a type of diffuse reflection where the amount of reflected light is proportional to the cosine of the angle between the surface normal and the light direction. Small angles between the surface normal and the light direction result in a brighter reflection, and vice versa.

(Diagram: surface normals n1, n2, n3; light directions l1, l2, l3; and the angles a1, a2, a3 between them.)
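
As a minimal CPU-side sketch of the term we’ll implement in the fragment shader later (the lambertianDiffuse helper is my own, and it assumes normal and lightDir are already normalized):

Code
import simd

func lambertianDiffuse(
  normal: SIMD3<Float>,
  lightDir: SIMD3<Float>,
  lightColor: SIMD3<Float>,
  intensity: Float
) -> SIMD3<Float> {
  // Cosine of the angle between the surface normal and the light direction,
  // clamped to zero so surfaces facing away from the light get no diffuse term
  let nDotL = max(dot(normal, lightDir), 0)
  return lightColor * intensity * nDotL
}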

Light direction depends on the lighting model. We chose the point light model, so the light direction is computed per surface point: the vector from that point toward the light source.

Try moving the light around to see how the diffuse lighting changes. Notice how the sphere appears brighter when the light is closer and dimmer when it’s farther away.


Create a new struct to store the point light properties.

Code
struct PointLight {
  var position: SIMD3<Float>
  var color: SIMD3<Float>
  var intensity: Float
  var attenuation: Float
}

Instantiate the point light with some initial values. This instance will be sent to the fragment shader as a uniform.

Code
func drawModel(
  in view: MTKView,
  renderEncoder: any MTLRenderCommandEncoder
) throws {
  ...

  var pointLight = PointLight(
    position: SIMD3<Float>(0.0, 0.0, 10.0),
    color: SIMD3<Float>(1.0, 1.0, 1.0),
    intensity: 16.0,
    attenuation: 0.1
  )

  let sampDesc = MTLSamplerDescriptor()
  sampDesc.minFilter = .linear
  sampDesc.magFilter = .linear
  sampDesc.sAddressMode = .repeat
  sampDesc.tAddressMode = .repeat

  let sampler = device.makeSamplerState(descriptor: sampDesc)
  renderEncoder.setFragmentSamplerState(sampler, index: 0)

  renderEncoder.setFragmentBytes(
    &pointLight,
    length: MemoryLayout<PointLight>.stride,
    index: 1
  )

  for (submesh, texture) in zip(mdlObject.submeshes, mdlObject.textures) {
    renderEncoder.setFragmentTexture(texture, index: 0)

    renderEncoder.drawIndexedPrimitives(
      type: submesh.primitiveType,
      indexCount: submesh.indexCount,
      indexType: submesh.indexType,
      indexBuffer: submesh.indexBuffer.buffer,
      indexBufferOffset: submesh.indexBuffer.offset,
      instanceCount: instanceTransforms.count
    )
  }
}

As described above, we need to calculate the diffuse lighting in the fragment shader. Here we find the light direction and calculate the diffuse lighting based on it. Note that the light vector points towards the light source, not the other way around. The attenuation parameter controls how fast the light intensity decreases with distance; in physically-based lighting it’s calculated using the inverse square law. For example, with attenuation = 0.1 and a distance of 10 units, the light is scaled by 1 / (1 + 0.1 * 10²) ≈ 0.09. You can read more about it here and here.

Code
struct PointLight {
  float3 position;
  float3 color;
  float intensity;
  float attenuation;
};

fragment float4 modelFragment(
  VertexOut in [[stage_in]],
  constant PointLight &light [[buffer(1)]],
  texture2d<float> tex [[texture(0)]],
  sampler samp [[sampler(0)]]
) {
  // Sample the base texture color
  float3 color = tex.sample(samp, in.uv).rgb;

  // Calculate lighting
  float3 normal = normalize(in.normal);
  float3 lightDir = light.position - in.worldPosition;
  float distance = length(lightDir);
  lightDir = normalize(lightDir);

  // Calculate attenuation (inverse square law with minimum distance)
  float attenuation = 1.0 / (1.0 + light.attenuation * distance * distance);

  // Calculate diffuse lighting (Lambertian)
  float NdotL = max(dot(normal, lightDir), 0.0);
  float3 diffuse = light.color * light.intensity * NdotL * attenuation;

  // Combine albedo with lighting
  float3 finalColor = color * diffuse;

  return float4(finalColor, 1.0);
}

And here is the result. If it looks a bit too dark, try changing the light position and intensity.

3D Flower with Lighting

Step 2: Offscreen Rendering

Offscreen rendering means that the rendered scene is not immediately displayed directly on the screen, but is stored in an intermediate texture. This way, it can be reused in several different stages of processing at once, and at the end, it can all be combined into a complete scene.

In our case it might be a bit of an overkill since we only have one effect to apply, but I still think it’s a good idea to know about: the experiment we build here can later be expanded to include more effects and features.

scene → render → texture → apply effects (pixelation, blur, color grading) → display

Creating an Offscreen Texture

First, we need to instantiate a texture that will serve as our render target for the 3D scene. Here we also need a separate depth texture to store depth values. Having a dedicated depth texture is essential for proper depth testing during offscreen rendering, since our drawing operations won’t go directly to the screen.

Because we’re rendering offscreen instead of directly to the screen (i.e. we don’t use currentRenderPassDescriptor from MTKView), we need to create a separate render pass that uses the texture we just created as the render target.

Code
func draw(in view: MTKView) {
  guard
    let drawable = view.currentDrawable,
    let commandBuffer = commandQueue.makeCommandBuffer()
  else {
    return
  }

  let width = max(1, Int(view.drawableSize.width))
  let height = max(1, Int(view.drawableSize.height))

  // ---- START: MODEL ----

  let modelTextureDescriptor = MTLTextureDescriptor.texture2DDescriptor(
    pixelFormat: view.colorPixelFormat,
    width: width,
    height: height,
    mipmapped: false
  )
  modelTextureDescriptor.usage = [.renderTarget, .shaderRead]
  modelTextureDescriptor.storageMode = .private
  modelTextureDescriptor.textureType = .type2D

  let modelTexture = device.makeTexture(descriptor: modelTextureDescriptor)

  let modelDepthDescriptor = MTLTextureDescriptor.texture2DDescriptor(
    pixelFormat: .depth32Float,
    width: width,
    height: height,
    mipmapped: false
  )
  modelDepthDescriptor.usage = [.renderTarget]
  modelDepthDescriptor.storageMode = .private

  let modelDepthTexture = device.makeTexture(descriptor: modelDepthDescriptor)

  let offscreenPassDescriptor = MTLRenderPassDescriptor()
  offscreenPassDescriptor.colorAttachments[0].texture = modelTexture
  offscreenPassDescriptor.colorAttachments[0].loadAction = .clear
  offscreenPassDescriptor.colorAttachments[0].storeAction = .store
  offscreenPassDescriptor.colorAttachments[0].clearColor = MTLClearColorMake(0, 0, 0, 1)
  offscreenPassDescriptor.depthAttachment.texture = modelDepthTexture
  offscreenPassDescriptor.depthAttachment.loadAction = .clear
  offscreenPassDescriptor.depthAttachment.storeAction = .dontCare
  offscreenPassDescriptor.depthAttachment.clearDepth = 1.0

  if
    let renderEncoder = commandBuffer.makeRenderCommandEncoder(
      descriptor: offscreenPassDescriptor
    )
  {

    do {
      try drawModel(in: view, renderEncoder: renderEncoder)
    } catch {
      fatalError(error.localizedDescription)
    }

    renderEncoder.endEncoding()
  }

  // ---- END: MODEL ----
}

Displaying the Result

Next, we need to add a new render pass to display the result on the screen. Here we use the modelTexture we created earlier as the source texture, and currentRenderPassDescriptor to display the result on the screen.

This part is called a “blit” because all it effectively does is take the texture and draw it to the screen. If there were other textures to display, we would need more complex logic to handle and compose them.

Code
func draw(in view: MTKView) {
  ...

  // ---- END: MODEL ----
  // ---- START: SCENE BLIT ----

  if
    let sceneRenderPassDescriptor = view.currentRenderPassDescriptor,
    let renderEncoder = commandBuffer.makeRenderCommandEncoder(descriptor: sceneRenderPassDescriptor),
    let texture = modelTexture
  {

    do {
      try drawBlit(
        in: view,
        renderEncoder: renderEncoder,
        sceneTexture: texture
      )
    } catch {
      fatalError(error.localizedDescription)
    }

    renderEncoder.endEncoding()
  }

  // ---- END: SCENE BLIT ----

  commandBuffer.present(drawable)
  commandBuffer.commit()
}

For this step, we use a dedicated pair of shader functions. We don’t define a vertex descriptor here because the vertex shader generates all the necessary data itself.

Code
func drawBlit(
  in view: MTKView,
  renderEncoder: any MTLRenderCommandEncoder,
  sceneTexture: any MTLTexture
) throws {
  let pipelineDescriptor = MTLRenderPipelineDescriptor()
  pipelineDescriptor.vertexFunction = try! library.blitVertex
  pipelineDescriptor.fragmentFunction = try! library.blitFragment
  pipelineDescriptor.colorAttachments[0].pixelFormat = view.colorPixelFormat
  pipelineDescriptor.depthAttachmentPixelFormat = view.depthStencilPixelFormat

  let pipelineState = try device.makeRenderPipelineState(descriptor: pipelineDescriptor)

  renderEncoder.setRenderPipelineState(pipelineState)

  let samplerDescriptor = MTLSamplerDescriptor()
  samplerDescriptor.minFilter = .linear
  samplerDescriptor.magFilter = .linear
  samplerDescriptor.sAddressMode = .clampToEdge
  samplerDescriptor.tAddressMode = .clampToEdge

  let sampler = device.makeSamplerState(descriptor: samplerDescriptor)!

  renderEncoder.setFragmentTexture(sceneTexture, index: 0)
  renderEncoder.setFragmentSamplerState(sampler, index: 0)
  renderEncoder.drawPrimitives(type: .triangle, vertexStart: 0, vertexCount: 3)
}

To display the texture on the screen, we use the fullscreen triangle technique. The vertex positions and UVs are constant, which is why we don’t need to pass them in as buffers; the vertex shader just outputs them directly.

Screen: triangle vertices (-1, -1), (3, -1), (-1, 3) cover the visible area from (-1, -1) to (1, 1). UV coordinate mapping: triangle UVs (0, 0), (2, 0), (0, 2) cover the visible UV range from (0, 0) to (1, 1).

This way we can avoid the overhead of managing a vertex buffer and the complexity of drawing a quad. You can read more about it, including a comparison with the fullscreen quad technique, in this article.

Code
struct BlitVertexOut {
  float4 position [[position]];
  float2 uv;
};

vertex BlitVertexOut blitVertex(
  uint vid [[vertex_id]]
) {
  float2 pos[3] = { float2(-1, -1), float2(3, -1), float2(-1, 3) };
  float2 uv[3]  = { float2(0, 0),   float2(2, 0),   float2(0, 2)  };

  BlitVertexOut out;
  out.position = float4(pos[vid], 0, 1);
  out.uv = uv[vid];
  return out;
}

fragment float4 blitFragment(
  BlitVertexOut in [[stage_in]],
  texture2d<float> src [[texture(0)]],
  sampler samp [[sampler(0)]]
) {
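  // Flip Y: Metal texture coordinates have their origin at the top-left,
  // while our fullscreen-triangle UVs put (0, 0) at the bottom-left of the screen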
  float2 uv = float2(in.uv.x, 1.0 - in.uv.y);
  return float4(src.sample(samp, uv).rgb, 1.0);
}

Here you should see no difference between the two approaches.

Step 3: Post-Processing Pipeline

Now comes the fun part! We’re going to take our rendered 3D scene and apply a pixelation effect to it.

The idea is to create a third rendering pass that sits between our 3D scene and the final display. This pass takes our rendered texture as input, applies the pixelation effect, and outputs a new texture.

Setting Up the Post-Processing Pass

We need to create another offscreen texture to hold our post-processed result:

Code
func draw(in view: MTKView) {
  ...

  // ---- END: MODEL ----
  // ---- START: POST-PROCESS ----

  let postProcessTextureDescriptor = MTLTextureDescriptor.texture2DDescriptor(
    pixelFormat: view.colorPixelFormat,
    width: width,
    height: height,
    mipmapped: false
  )
  postProcessTextureDescriptor.usage = [.renderTarget, .shaderRead]
  postProcessTextureDescriptor.storageMode = .private

  let postProcessTexture = device.makeTexture(descriptor: postProcessTextureDescriptor)

  let postProcessPassDescriptor = MTLRenderPassDescriptor()
  postProcessPassDescriptor.colorAttachments[0].texture = postProcessTexture
  postProcessPassDescriptor.colorAttachments[0].loadAction = .clear
  postProcessPassDescriptor.colorAttachments[0].storeAction = .store
  postProcessPassDescriptor.colorAttachments[0].clearColor = MTLClearColorMake(0, 0, 0, 1)

  if
    let renderEncoder = commandBuffer.makeRenderCommandEncoder(
      descriptor: postProcessPassDescriptor
    ),
    let texture = modelTexture
  {

    do {
      try drawQuants(
        in: view,
        renderEncoder: renderEncoder,
        texture: texture
      )
    } catch {
      fatalError(error.localizedDescription)
    }

    renderEncoder.endEncoding()
  }

  // ---- END: POST-PROCESS ----
  // ---- START: SCENE BLIT ----

  if
    let sceneRenderPassDescriptor = view.currentRenderPassDescriptor,
    let renderEncoder = commandBuffer.makeRenderCommandEncoder(descriptor: sceneRenderPassDescriptor),
    let texture = postProcessTexture // replace modelTexture with postProcessTexture
  { ... }

  ...
}

Notice how we’re now using postProcessTexture instead of modelTexture in the final blit operation. This means we’re displaying the post-processed result, not the original 3D scene.

The Pixelation Effect

The drawQuants function is where the pixelation magic happens. It takes our rendered 3D scene and applies a quantization effect to create those chunky, blocky pixels:

Code
extension Renderer {
  ...

  func drawQuants(
    in view: MTKView,
    renderEncoder: any MTLRenderCommandEncoder,
    texture: any MTLTexture
  ) throws {
    struct PPUniforms {
      var viewportSize: SIMD2<Float>
      var pixelSize: Float
      var lineThickness: Float
      var gridColor: SIMD3<Float>
      var gridAlpha: Float
    }

    let samplerDescriptor = MTLSamplerDescriptor()
    samplerDescriptor.minFilter = .nearest
    samplerDescriptor.magFilter = .nearest
    samplerDescriptor.sAddressMode = .clampToEdge
    samplerDescriptor.tAddressMode = .clampToEdge

    let sampler = device.makeSamplerState(descriptor: samplerDescriptor)

    let pipelineDescriptor = MTLRenderPipelineDescriptor()
    pipelineDescriptor.vertexFunction = try! library.quantVertex
    pipelineDescriptor.fragmentFunction = try! library.quantFragment
    pipelineDescriptor.colorAttachments[0].pixelFormat = view.colorPixelFormat

    let renderPipelineState = try device.makeRenderPipelineState(descriptor: pipelineDescriptor)

    let width = max(1, Int(view.drawableSize.width))
    let height = max(1, Int(view.drawableSize.height))

    var ppUniforms = PPUniforms(
      viewportSize: SIMD2<Float>(Float(width), Float(height)),
      pixelSize: 12.0,          // size of each pixel block in screen pixels
      lineThickness: 1.0,       // grid line thickness in pixels
      gridColor: SIMD3<Float>(0.1, 0.1, 0.1), // dark grid
      gridAlpha: 0.35           // grid opacity
    )

    renderEncoder.setRenderPipelineState(renderPipelineState)
    renderEncoder.setFragmentTexture(texture, index: 0)
    renderEncoder.setFragmentSamplerState(sampler, index: 0)
    renderEncoder.setFragmentBytes(&ppUniforms, length: MemoryLayout<PPUniforms>.stride, index: 0)

    renderEncoder.drawPrimitives(type: .triangle, vertexStart: 0, vertexCount: 3)
  }
}

Key Parameters

The PPUniforms struct controls the pixelation effect:

  • pixelSize: How big each pixel block should be (in screen pixels)
  • lineThickness: How thick the grid lines should be
  • gridColor: The color of the grid lines (dark gray in this case)
  • gridAlpha: How opaque the grid lines should be

The Shader Code

The pixelation effect is implemented in the fragment shader. Here’s how it works:

Code
struct PostProcessUniforms {
  float2 viewportSize;   // in pixels
  float  pixelSize;      // pixelation block size in pixels
  float  lineThickness;  // grid line thickness in pixels
  float3 gridColor;      // RGB for grid
  float  gridAlpha;      // alpha for grid overlay
};

struct PostProcessVertexOut {
  float4 position [[position]];
  float2 uv;
};

// Fullscreen triangle vertex shader
vertex PostProcessVertexOut quantVertex(
  uint vid [[vertex_id]]
) {
  PostProcessVertexOut out;

  float2 pos[3] = {
    float2(-1.0, -1.0),
    float2( 3.0, -1.0),
    float2(-1.0,  3.0)
  };
  float2 uv[3] = {
    float2(0.0, 0.0),
    float2(2.0, 0.0),
    float2(0.0, 2.0)
  };

  out.position = float4(pos[vid], 0.0, 1.0);
  out.uv = uv[vid];

  return out;
}

fragment float4 quantFragment(
  PostProcessVertexOut in [[stage_in]],
  constant PostProcessUniforms &u [[buffer(0)]],
  texture2d<float> colorTex [[texture(0)]],
  sampler samp [[sampler(0)]]
) {
  // Original Texture
  float2 texSize = u.viewportSize;
  float2 uv = float2(in.uv.x, 1.0 - in.uv.y);

  // Convert UV to pixel coordinates
  float2 px = uv * texSize;

  // Snap to grid
  float2 block = floor(px / u.pixelSize) * u.pixelSize + 0.5 * u.pixelSize;

  // Back to normalized UV coordinates
  float2 qUV = block / texSize;

  // Sample the scene color at the block center
  float3 base = colorTex.sample(samp, qUV).rgb;

  // Grid overlay: draw lines where we are close to the block edges
  float2 modv = fmod(px, u.pixelSize);
  float2 distToEdge = min(modv, u.pixelSize - modv);
  float edgeDist = min(distToEdge.x, distToEdge.y);

  float lineMask = smoothstep(u.lineThickness + 0.6, u.lineThickness, edgeDist);

  float3 gridRGB = u.gridColor;

  float3 colorWithGrid = mix(base, gridRGB, lineMask * u.gridAlpha);

  return float4(colorWithGrid, 1.0);
}

Yay! Now we have a Minecraft-style pixelated 3D flower. Drag the slider to compare the original and the pixelated version.

Before: original render
After: pixelated render

How the Pixelation Algorithm Works

The magic happens in the fragment shader. Here’s the step-by-step process:

  1. Convert UV to pixels: float2 px = uv * texSize - Convert normalized UV coordinates to actual pixel coordinates

  2. Snap to grid: float2 block = floor(px / u.pixelSize) * u.pixelSize + 0.5 * u.pixelSize - This creates a grid where each block is pixelSize pixels wide

  3. Sample at block center: float2 qUV = block / texSize - Convert back to UV coordinates and sample the texture at the center of each block

  4. Add grid lines: The grid overlay code calculates how close each pixel is to the edge of its block and blends in grid lines using smoothstep for smooth edges

Original Texture → UV to Pixels → Snap to Grid → Pixelated Result
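
To make the snapping step concrete, here is a small worked example with made-up numbers (pixelSize = 12, a fragment at pixel (103, 46)):

Code
let pixelSize: Float = 12
let px = SIMD2<Float>(103, 46)

// floor(px / pixelSize) picks block (8, 3); scaling back up and adding
// half a block lands on that block's center
let block = SIMD2<Float>(
  (px.x / pixelSize).rounded(.down) * pixelSize + 0.5 * pixelSize,
  (px.y / pixelSize).rounded(.down) * pixelSize + 0.5 * pixelSize
)
// block == (102.0, 42.0); dividing by the texture size gives the UV to sample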

To build the grid line mask we first take the pixel coordinates modulo the pixel size to find:

  1. Relevant pixelated block coordinates (the one that contains the pixel we are processing)
  2. Coordinates of that pixel in the local space of the pixelated block.

Taking the modulo means that we take the remainder of dividing the pixel coordinates by the pixel size. This gives us the coordinates of the pixel in the local space of its pixelated block.

Next we need to find the distance from that pixel to the closest edge of the block. We rely on the observation that:

  1. Either modv.x or modv.y is the distance to the nearest left/bottom edge of the block
  2. Either pixelSize - modv.x or pixelSize - modv.y is the distance to the nearest right/top edge

With this in mind, all we need to do is find the minimum of these four values: this is what edgeDist is.

And the last step is to blend the grid lines with the base color using the smoothstep function. If edgeDist is less than lineThickness we blend in the grid color; otherwise we keep the base color. smoothstep makes this blending look smooth and nice, but basically any blending function would work.

Pixel Block → Modulo Calc (block-local coordinates modv.x, modv.y) → Smallest Distance to Edge (dist.x, dist.y vs. thickness) → Grid Line Mask (mask = 1 on a line, mask = 0 elsewhere)
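
Here is a scalar Swift sketch of that mask logic, using a hard threshold instead of smoothstep and my own example values (pixelSize = 12, lineThickness = 1):

Code
let pixelSize: Float = 12
let lineThickness: Float = 1

func gridMask(px: SIMD2<Float>) -> Float {
  // Block-local coordinates of the pixel
  let modX = px.x.truncatingRemainder(dividingBy: pixelSize)
  let modY = px.y.truncatingRemainder(dividingBy: pixelSize)
  // Distance to the nearest edge along each axis, then overall
  let distX = min(modX, pixelSize - modX)
  let distY = min(modY, pixelSize - modY)
  let edgeDist = min(distX, distY)
  return edgeDist <= lineThickness ? 1 : 0
}

print(gridMask(px: SIMD2<Float>(96.5, 40))) // near a block edge -> 1.0
print(gridMask(px: SIMD2<Float>(102, 42)))  // block center      -> 0.0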

Let’s add the final touches and make the model rotate. Define animationTime and rotationSpeed properties that will store the animation progress.

Code
final class Renderer: NSObject {
  private var animationTime: Float = 0.0
  private var rotationSpeed: Float = 1.0

  ...
}

Animation time is updated with each draw call.

Code
func draw(in view: MTKView) {
  guard
    let drawable = view.currentDrawable,
    let commandBuffer = commandQueue.makeCommandBuffer()
  else {
    return
  }

  animationTime += 0.016

  ...
}
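
The fixed 0.016 increment assumes a steady 60 fps. If you want the rotation speed to stay constant regardless of the actual frame rate, here is a minimal sketch of one way to do it (my own addition; FrameClock is a hypothetical helper built on QuartzCore’s CACurrentMediaTime):

Code
import QuartzCore

final class FrameClock {
  private var lastTime: CFTimeInterval?

  /// Seconds elapsed since the previous call (0 on the first call)
  func delta() -> Float {
    let now = CACurrentMediaTime()
    defer { lastTime = now }
    return Float(now - (lastTime ?? now))
  }
}

// Inside draw(in:) you would then write: animationTime += frameClock.delta()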

Finally, we need to update the rotation component of the model matrix.

Code
func drawModel(
  in view: MTKView,
  renderEncoder: any MTLRenderCommandEncoder
) throws {
  ...

  for (i, t) in instanceTransforms.enumerated() {
    var _t = t
    _t.rotation = simd_quatf(
      angle: rotationSpeed * animationTime,
      axis: SIMD3<Float>(0, 1, 0)
    )
    ptr[i] = _t.modelMatrix
  }

  ...
}

And done! Here is the final result. I imagine it could be used as a background for an onboarding flow or something like that.

Conclusion

I hope you enjoyed this journey, because I sure did.

Of course, these are just the most basic things you can do in 3D. We also used a large number of MetalKit settings and components as-is, without going into too much detail. And overall, the solution we built can be optimized a lot more (by reusing textures, for example).

I will continue to explore this topic in the next articles. I’m really curious to see where I can go with all of this.

Here you can find the source code.

See you 🦄