Figma's FigCache: Next-Generation Data Caching at Scale

Source: Figma Blog
Author: Figma Engineering


TL;DR

As Redis became critical-path infrastructure at Figma, it hit scalability and reliability limits — connection bottlenecks, thundering herds, observability gaps, and data pollution risks. Figma built FigCache: a stateless, RESP-wire-protocol proxy acting as a unified Redis data plane. The architecture features a decoupled frontend/backend design, configuration-driven routing via Starlark, and custom extensions to the Redis protocol. Since its rollout to Figma's main API service in H2 2025, the caching layer has achieved six nines of uptime.


The Problem

As Figma grew, Redis evolved from a non-critical dependency to a critical-path component with structural challenges:

Issue               | Impact
Connection limits   | Clusters approaching hard limits
Thundering herds    | Rapid scale-ups caused massive connection bottlenecks
Data pollution      | No centralised traffic management; apps could corrupt data across clusters
Observability gaps  | Fragmented client libraries; slow incident diagnosis
Failover correctness | No fleet-wide guarantees about state consistency

Initial mitigations (removing Redis from some API subsystems, localised client-side connection pooling) bought time but were not a strategic solution.


The Solution: FigCache Architecture

Why Build vs. Buy

Existing open-source proxies had critical shortcomings:

  • Could not extract full annotated arguments from arbitrary Redis commands (needed for semantic guardrails)
  • Could not extend the Redis protocol with custom commands
  • Required maintaining brittle source code forks to add business logic

Decoupled Proxy Architecture

FigCache separates two concerns:

Layer    | Responsibility
Frontend | Client interaction, RESP-based RPC (ResPC), network I/O, connection management, structured command parsing
Backend  | Command processing, connection multiplexing to storage backends, physical execution
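To make "connection multiplexing" concrete, here is a toy model of the idea: many client requests share a small, fixed number of backend connections, so client count no longer dictates how many connections Redis sees. The class and function names (`MultiplexedBackend`, `simulate`) are hypothetical, and a semaphore stands in for a real connection pool; this is an illustration under those assumptions, not a description of FigCache's implementation.

```python
import asyncio


class MultiplexedBackend:
    """Hypothetical backend that caps concurrent upstream connections."""

    def __init__(self, max_connections: int) -> None:
        self._slots = asyncio.Semaphore(max_connections)
        self.in_use = 0
        self.peak = 0  # highest number of connections in use at once

    async def execute(self, command: str) -> str:
        # Each request borrows one of the limited backend connections.
        async with self._slots:
            self.in_use += 1
            self.peak = max(self.peak, self.in_use)
            try:
                await asyncio.sleep(0)  # stand-in for a Redis round trip
                return f"OK {command}"
            finally:
                self.in_use -= 1


async def simulate(clients: int, max_connections: int) -> int:
    """Send one request per client and report peak backend connections."""
    backend = MultiplexedBackend(max_connections)
    await asyncio.gather(
        *(backend.execute(f"GET key:{i}") for i in range(clients))
    )
    return backend.peak
```

Calling `asyncio.run(simulate(1000, 8))` drives a thousand simulated client requests through at most eight backend connections; the excess requests wait for a free slot instead of opening new connections.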

ResPC (a portmanteau of RESP and RPC) is a Go library that provides an RPC framework over the Redis wire protocol.
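For readers unfamiliar with the wire format, the sketch below shows how a Redis command travels as a RESP array of bulk strings, and how a proxy could recover the individual arguments from the bytes. It is written in Python for brevity (ResPC itself is a Go library), and `encode_command` and `parse_command` are hypothetical helper names, not part of ResPC. It handles only a single, complete command made of bulk strings.

```python
from typing import List


def encode_command(*args: str) -> bytes:
    """Encode a command as a RESP array of bulk strings."""
    out = [b"*%d\r\n" % len(args)]
    for arg in args:
        data = arg.encode("utf-8")
        out.append(b"$%d\r\n%s\r\n" % (len(data), data))
    return b"".join(out)


def parse_command(buf: bytes) -> List[str]:
    """Parse one complete RESP array of bulk strings into its arguments."""
    if not buf.startswith(b"*"):
        raise ValueError("expected a RESP array")
    header, _, rest = buf.partition(b"\r\n")
    count = int(header[1:])
    args = []
    for _ in range(count):
        length_line, _, rest = rest.partition(b"\r\n")
        if not length_line.startswith(b"$"):
            raise ValueError("expected a bulk string")
        length = int(length_line[1:])
        args.append(rest[:length].decode("utf-8"))
        rest = rest[length + 2:]  # skip the payload and its trailing CRLF
    return args
```

Once a command is parsed into a structured argument list like this, a proxy can inspect the command name and keys before deciding how to route it.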

Configuration-Driven Engine Tree

The backend layer is a dynamically-assembled tree of engine nodes:

  • Leaf nodes (Data Engines): Execute commands against Redis
  • Intermediate nodes (Filter Engines): Route, block, or modify commands before execution
  • Configuration: Expressed as Starlark programs that are evaluated at runtime and render a Protobuf-structured config

This enables complex behaviours (command-type splitting, key-prefix routing) without deploying new server binaries.
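The engine tree can be pictured with a small sketch. Everything here is an assumption made for illustration: the class names (`DataEngine`, `BlockEngine`, `PrefixRouter`), the config keys, and the cluster names are hypothetical, and a plain Python dict stands in for the Protobuf config that the real system renders from Starlark.

```python
from typing import Dict, List, Set


class DataEngine:
    """Leaf node: stands in for executing a command against one Redis backend."""

    def __init__(self, name: str) -> None:
        self.name = name

    def execute(self, command: List[str]) -> str:
        # A real data engine would talk to Redis; this one only reports the route.
        return f"{self.name} <- {' '.join(command)}"


class BlockEngine:
    """Filter node: rejects listed commands, passes the rest to its child."""

    def __init__(self, blocked: Set[str], child) -> None:
        self.blocked = {name.upper() for name in blocked}
        self.child = child

    def execute(self, command: List[str]) -> str:
        if command[0].upper() in self.blocked:
            return f"ERR command '{command[0]}' is blocked"
        return self.child.execute(command)


class PrefixRouter:
    """Filter node: picks a child engine by the prefix of the first key."""

    def __init__(self, routes: Dict[str, object], default) -> None:
        self.routes = routes
        self.default = default

    def execute(self, command: List[str]) -> str:
        key = command[1] if len(command) > 1 else ""
        for prefix, engine in self.routes.items():
            if key.startswith(prefix):
                return engine.execute(command)
        return self.default.execute(command)


def build(node: dict):
    """Assemble an engine tree from a nested config dict."""
    kind = node["kind"]
    if kind == "data":
        return DataEngine(node["name"])
    if kind == "block":
        return BlockEngine(set(node["commands"]), build(node["child"]))
    if kind == "prefix_router":
        routes = {p: build(c) for p, c in node["routes"].items()}
        return PrefixRouter(routes, build(node["default"]))
    raise ValueError(f"unknown engine kind: {kind}")


# Hypothetical config; changing this data changes routing without new code.
CONFIG = {
    "kind": "block",
    "commands": ["FLUSHALL", "KEYS"],
    "child": {
        "kind": "prefix_router",
        "routes": {
            "session:": {"kind": "data", "name": "sessions-cluster"},
            "ratelimit:": {"kind": "data", "name": "ratelimit-cluster"},
        },
        "default": {"kind": "data", "name": "general-cluster"},
    },
}

root = build(CONFIG)
```

With this tree, `root.execute(["GET", "session:42"])` is routed to the `sessions-cluster` leaf, while `root.execute(["FLUSHALL"])` is rejected by the filter before reaching any backend. The point of the design is that the tree's shape lives in data rather than in compiled code.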

Key Features

Feature           | Description
Fanout Engine     | Transparently parallelises multi-shard pipelines, resolving CROSSSLOT errors
Custom Commands   | Language-agnostic distributed locking (over Redlock); protocol-native graceful draining
Fake Cluster Mode | Emulation layer handling fragmented client configs for easier migration
Starlark Config   | Runtime-programmable routing without server rebuilds
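As background for the Fanout Engine row: Redis Cluster rejects a multi-key command with a CROSSSLOT error when its keys hash to different slots. A slot is CRC16 of the key (or of its hash tag, the text between the first `{` and the next `}`) modulo 16384. The sketch below splits a multi-key read by slot and stitches the results back into request order. The helper names (`key_slot`, `split_by_slot`, `fanout_mget`) and the `fetch` callback are hypothetical, sub-requests run sequentially rather than in parallel, and grouping by slot rather than by owning shard is a simplification; it is not FigCache's actual algorithm.

```python
import binascii
from collections import defaultdict
from typing import Callable, Dict, List

SLOTS = 16384  # number of hash slots in Redis Cluster


def key_slot(key: str) -> int:
    """Compute the Redis Cluster hash slot for a key, honouring hash tags."""
    data = key.encode("utf-8")
    start = data.find(b"{")
    if start != -1:
        end = data.find(b"}", start + 1)
        if end > start + 1:  # only a non-empty tag replaces the key
            data = data[start + 1:end]
    # crc_hqx with an initial value of 0 is the CRC16 variant Redis uses.
    return binascii.crc_hqx(data, 0) % SLOTS


def split_by_slot(keys: List[str]) -> Dict[int, List[str]]:
    """Group keys so that each group is safe to send as one command."""
    groups = defaultdict(list)
    for key in keys:
        groups[key_slot(key)].append(key)
    return dict(groups)


def fanout_mget(
    keys: List[str],
    fetch: Callable[[int, List[str]], List[str]],
) -> List[str]:
    """Run one sub-request per slot and return values in request order."""
    results = {}
    for slot, group in split_by_slot(keys).items():
        # A real proxy would issue these sub-requests concurrently.
        for key, value in zip(group, fetch(slot, group)):
            results[key] = value
    return [results[key] for key in keys]
```

Here `fetch(slot, keys)` stands in for sending a single-slot MGET to whichever backend owns that slot. The client sees one ordered reply and never learns that the request was split.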

Results

  • Six nines (99.9999%) uptime since rollout to Figma's main API service (H2 2025)
  • Centralised observability and traffic management
  • Dramatically reduced connection pressure on upstream Redis clusters
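To put the six-nines figure in perspective, the arithmetic below converts an availability percentage into a downtime budget. The source does not say how uptime is measured or over what window, so the one-year window and the `downtime_budget_seconds` helper are assumptions made for illustration.

```python
SECONDS_PER_YEAR = 365 * 24 * 60 * 60  # 31,536,000


def downtime_budget_seconds(
    availability: float,
    window_seconds: float = SECONDS_PER_YEAR,
) -> float:
    """Return the downtime allowed in a window at a given availability."""
    return (1.0 - availability) * window_seconds


six_nines = downtime_budget_seconds(0.999999)   # about 31.5 seconds per year
three_nines = downtime_budget_seconds(0.999)    # about 8.76 hours per year
```

Measured over a full year, six nines leaves roughly half a minute of total unavailability, compared with almost nine hours at three nines.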

Key Takeaways

  1. When Redis becomes critical-path infrastructure, connection management and traffic isolation become existential problems
  2. A stateless proxy layer decouples client scaling from backend capacity, solving thundering herd and connection limit issues
  3. Configuration-driven (Starlark) architecture enables complex routing logic without deploying new binaries
  4. Custom protocol extensions (locking, draining) were only possible by building an in-house proxy — no open-source solution provided the needed extensibility
  5. Six nines is an exceptional reliability result for a caching infrastructure layer