From 6 GB to 2 GB: How We Tamed a Memory Beast in Node.js

A deep dive into how a single algorithmic decision silently inflated our app's memory — and the three-phase journey that brought it back under control.

Passionate learner and problem solver, sharing insights and lessons from the projects and challenges I tackle throughout my career 🤓

A Number That Made Us Stop

During a routine load test, we watched our Node.js application climb to 6 GB of RAM at peak. No memory leak in the traditional sense. No runaway loop we forgot to close. Just... the app doing exactly what we built it to do — and eating memory for breakfast because of how we built it.

By the end of this story, peak memory sat at 2 GB. Bootstrap time dropped from **4 minutes to ~40 seconds**. CPU load dropped noticeably. GC pauses shortened. All without rewriting the business logic.

Here's every decision that got us there.


First, Let Me Explain the App's Anatomy

Before we get to the fix, you need to understand the problem space — because without that context, the solution looks like magic.

Our application manages a complex configuration system. Think of it as a mirror of the database: every table in our DB has a corresponding representation in the app. On top of that representation, we layer customization, filtering, access rules, and relationship logic.

These configurations don't live in isolation. They form a graph — nodes connected to other nodes, representing how database tables relate to each other. One configuration node can depend on three others. Those three can each depend on five more. And so on. It's a deep, wide, interconnected web.

We support 15+ data sources, each with many tables. Total nodes in the graph: 100+. And these data sources also reference each other, so the graph isn't just deep within one source — it spans across all of them.

This design is actually a win for the team: adding a new data schema or table is just a matter of writing a JSON config. No new code. The graph handles the rest. It's expressive, fast to iterate on, and easy for engineers to reason about in isolation.

The problem is what happens when you boot the whole thing up at once.


The Bootstrap Phase: Where It All Starts

When the app starts, it needs to do a significant amount of CPU-intensive work to make the graph usable:

  • Link all nodes together across data sources

  • Resolve dependency order — if Table A depends on Table B, B must be ready before A

  • Detect and handle circular dependencies — if A depends on B and B depends on A, we need a strategy, not an infinite loop

  • Validate existence — make sure every referenced table actually exists in the schema

  • Build traversal paths — so runtime queries can walk the graph efficiently

This is unavoidable work. We need to do it. The question is how we do it.
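To make the bootstrap steps above concrete, here is a minimal sketch of dependency ordering with cycle and existence checks. It assumes a hypothetical config shape (a plain object mapping each table name to the names it depends on) and a hypothetical function name, `resolveBootOrder`. Our real resolver works on richer node objects, so treat this as an illustration of the idea rather than our implementation.

```javascript
// Hypothetical input shape: { tableName: [names of tables it depends on] }
function resolveBootOrder(dependencies) {
  const IN_PROGRESS = 1;
  const DONE = 2;
  const state = new Map(); // table name -> IN_PROGRESS | DONE
  const order = [];        // dependencies always appear before their dependents
  const cycles = [];       // each entry is the path that closes a cycle
  const missing = [];      // referenced tables that are not defined anywhere

  function visit(name, path) {
    if (!Object.hasOwn(dependencies, name)) {
      missing.push(name); // validate existence
      return;
    }
    const current = state.get(name);
    if (current === DONE) return; // already resolved through another parent
    if (current === IN_PROGRESS) {
      // We walked back into a node that is still being resolved: a real cycle.
      cycles.push([...path.slice(path.indexOf(name)), name]);
      return;
    }
    state.set(name, IN_PROGRESS);
    for (const dep of dependencies[name]) visit(dep, [...path, name]);
    state.set(name, DONE);
    order.push(name);
  }

  for (const name of Object.keys(dependencies)) visit(name, []);
  return { order, cycles, missing };
}

const result = resolveBootOrder({
  orders: ["customers", "products"],
  customers: [],
  products: ["suppliers"],
  suppliers: ["products"], // circular: products <-> suppliers
});

console.log(result.order);  // [ 'customers', 'suppliers', 'products', 'orders' ]
console.log(result.cycles); // [ [ 'products', 'suppliers', 'products' ] ]
```

Tracking an "in progress" state separately from "done" is what lets the sketch tell a genuine cycle apart from a node that is simply shared by two parents.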


The Root Cause: A DFS Algorithm With an Expensive Habit

To resolve dependencies and build the full graph, we used a Depth-First Search (DFS) algorithm. DFS is a natural fit here — you start at a root node, explore all its children, then their children, then backtrack. It handles circular dependency detection cleanly with a "visited" set.

The algorithm roughly worked like this:

resolve(node):
  if node is in visitedSet → skip (already resolved, or a circular dep)
  mark node as visited
  
  for each child of node:
    resolvedChild = cloneDeep(child)   ← 🚨 here's the problem
    attach resolvedChild to node
    resolve(resolvedChild)

See that cloneDeep? That was the catastrophic habit.

Why We Were Using cloneDeep

The intent was reasonable. Each node in the graph can carry a different object shape depending on its relationship with its parent. Node B as a child of A might have different metadata attached than B as a child of D. If we just passed around the same reference, mutating the object for one parent would silently corrupt it for another.

So we reached for cloneDeep from Lodash to create a full independent copy of each node before attaching it.

What cloneDeep Actually Does Under the Hood

A deep clone is not a simple copy. It's a full recursive traversal of an object graph. Here's the mental model:

cloneDeep(obj):
  result = {}
  for each key in obj:
    if obj[key] is a primitive → result[key] = obj[key]
    if obj[key] is an object  → result[key] = cloneDeep(obj[key])  ← recursion
  return result

It's literally another DFS — inside each node — to recreate every nested value. For a small object, this is trivial. For a node in our graph that might carry nested configuration objects, metadata arrays, nested relation descriptors, and more? It becomes very expensive, very fast.
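For readers who want to run the mental model, here is a small runnable version. It is a simplified stand-in and not Lodash's actual implementation: it only handles primitives, plain objects, and arrays, and the `seen` map (our own addition for this sketch) is there so that cyclic references terminate.

```javascript
// Simplified deep clone: visits every nested value, exactly like a DFS.
function deepClone(value, seen = new WeakMap()) {
  // Primitives (and null) are copied by value.
  if (value === null || typeof value !== "object") return value;
  // If we already cloned this object, reuse that clone (handles cycles).
  if (seen.has(value)) return seen.get(value);

  const copy = Array.isArray(value) ? [] : {};
  seen.set(value, copy);
  for (const key of Object.keys(value)) {
    copy[key] = deepClone(value[key], seen); // recursion into every nested value
  }
  return copy;
}

const original = { name: "tableA", config: { columns: ["id", "name"] } };
const copy = deepClone(original);

console.log(copy.config === original.config);                 // false: nested object was duplicated
console.log(copy.config.columns === original.config.columns); // false: nested array was duplicated too
```

Every nested object and array becomes a brand-new allocation, which is exactly the work we were paying for on every node.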

The Compounding Problem

Now consider that we were calling cloneDeep inside a DFS that traverses 100+ nodes, each with multiple children, which each have multiple children. Every clone at one level copied the entire subtree below it, and then the recursion cloned that same subtree again one level further down.

The result:

  • Memory exploded because we were holding thousands of redundant object copies simultaneously

  • The GC (garbage collector) had to work constantly to clean up intermediate clones that were no longer needed

  • Bootstrap took ~4 minutes because between the DFS traversal and the recursive cloning, the CPU was saturated and the event loop was effectively blocked

  • The larger our configs grew, the worse it got — this was a scaling time bomb
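The compounding effect is easy to reproduce on a toy graph. The sketch below is an assumption-laden model, not our production resolver: nodes are plain `{ name, children }` objects, the graph is a simple chain, and `deepCloneNode` is a hand-written stand-in for cloneDeep that counts how many node objects it copies.

```javascript
let copies = 0;

// Stand-in for cloneDeep on { name, children } nodes; counts every node it copies.
function deepCloneNode(node) {
  copies += 1;
  return { name: node.name, children: node.children.map(deepCloneNode) };
}

// Same shape as the resolver above: clone each child, attach it, recurse into the clone.
function resolveWithDeepClone(node) {
  node.children = node.children.map((child) => {
    const resolvedChild = deepCloneNode(child); // copies the child's whole subtree
    resolveWithDeepClone(resolvedChild);        // then clones that subtree again, one level down
    return resolvedChild;
  });
  return node;
}

// Builds a chain n0 -> n1 -> ... -> n(depth).
function buildChain(depth) {
  let node = { name: `n${depth}`, children: [] };
  for (let i = depth - 1; i >= 0; i -= 1) {
    node = { name: `n${i}`, children: [node] };
  }
  return node;
}

function countCopies(depth) {
  copies = 0;
  resolveWithDeepClone(buildChain(depth));
  return copies;
}

console.log(countCopies(10));  // 55
console.log(countCopies(100)); // 5050
```

Making the chain ten times deeper produces roughly a hundred times more copies. Real graphs branch instead of forming a chain, but the pattern of re-cloning the same subtree at every level is the same.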


Fix #1 — Shallow Copy + Targeted Key Mutation

The key insight was: we don't need to clone the entire node. We just need to change a few specific keys on it before attaching it to a parent.

If we know exactly which keys differ per parent-child relationship, we can:

  1. Create a shallow copy of the node (cost proportional to its number of top-level keys, because it only copies top-level references)

  2. Override only the specific keys that need to differ

// Before: deep clone; cost grows with the size of the node's entire nested object graph
const resolvedChild = cloneDeep(child);

// After: shallow copy (cost grows only with the number of top-level keys) + targeted overrides
const resolvedChild = {
  ...child,               // reuse all references
  parentRef: currentNode, // only override what needs to differ
  relationMeta: computedMeta,
};

What Is a Shallow Copy?

A shallow copy creates a new top-level container but reuses references for all nested objects. In JavaScript, the spread operator { ...obj } does exactly this.

Original object:
  { name: "tableA", config: { → points to Config Object X } }

Shallow copy:
  { name: "tableA", config: { → still points to Config Object X } }
  ↑ new object, but inner config is shared, not duplicated

For our use case, this was completely safe. We weren't mutating nested objects — we were only replacing top-level keys. The shared references were fine to keep.
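Here is a small sketch of that behavior. The node shape and the `attachToParent` helper are hypothetical, chosen only to show what is shared and what is not.

```javascript
// Hypothetical node shape for illustration.
const child = {
  name: "tableB",
  config: { columns: ["id", "name"] },
  relationMeta: null,
};

// Shallow copy plus targeted top-level overrides, one result per parent.
function attachToParent(node, parentName, relationMeta) {
  return { ...node, parentName, relationMeta };
}

const underA = attachToParent(child, "tableA", { joinKey: "a_id" });
const underD = attachToParent(child, "tableD", { joinKey: "d_id" });

console.log(underA === underD);               // false: two independent top-level objects
console.log(underA.config === underD.config); // true: nested config is shared, not duplicated
console.log(underA.relationMeta.joinKey);     // "a_id"
console.log(underD.relationMeta.joinKey);     // "d_id"
console.log(child.relationMeta);              // null: the original node is untouched
```

The flip side is worth stating plainly: writing to `underA.config.columns` would also change what `underD` and `child` see. The approach is only safe as long as nested objects are treated as read-only.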

The impact was dramatic. What was previously a deep recursive clone for every node in a 100+ node graph became a single object spread per node. The memory footprint of the bootstrap phase dropped significantly, and — because we were no longer allocating thousands of temporary deep-clone objects — the GC had far less to clean up.

After this change: ~3.43 GB peak. Bootstrap: ~40 seconds.

Still room to improve. But already a massive win.


A Side Note on GC: Why Fewer Objects Matters

Node.js uses V8's garbage collector, which runs in two modes:

  • Minor GC (Scavenge): Fast, frequent. Cleans up short-lived objects in the "young generation" heap space. Runs in milliseconds.

  • Major GC (Mark-and-Sweep): Slow, occasional. Traverses the entire live object graph to find and free unreachable memory. Can pause the event loop for tens of milliseconds.

When you allocate thousands of deep-cloned objects that are only used briefly and then discarded, you're putting pressure on both GC modes. Minor GC has to scavenge constantly. Some objects "survive" long enough to get promoted to the old generation, and Major GC then has to walk a much larger graph during mark-and-sweep.

By switching to shallow copies, we drastically reduced allocation volume. The GC now runs less frequently, less intensely, and with a smaller graph to traverse. That means more consistent latency, more headroom for the event loop to do real work, and a noticeably more stable CPU profile.
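If you want to see the allocation difference on your own machine, a rough measurement is enough. The sketch below assumes a hypothetical node shape and uses the built-in `structuredClone` as a stand-in for cloneDeep. The numbers it prints are noisy, because a GC cycle can run in the middle of a measurement, so read them as an order of magnitude and nothing more.

```javascript
// Hypothetical node shape, used only for this measurement.
function makeNode(i) {
  return {
    name: `table${i}`,
    config: {
      columns: ["id", "name", "createdAt"],
      rules: { read: true, write: false },
    },
  };
}

// Runs `build`, keeps its result alive during the measurement,
// and reports how much heapUsed grew.
function measureHeapGrowth(build) {
  const before = process.memoryUsage().heapUsed;
  const retained = build();
  const after = process.memoryUsage().heapUsed;
  return { growthBytes: after - before, count: retained.length };
}

const source = makeNode(0);
const COPIES = 20000;

const deep = measureHeapGrowth(() =>
  Array.from({ length: COPIES }, () => structuredClone(source))
);
const shallow = measureHeapGrowth(() =>
  Array.from({ length: COPIES }, () => ({ ...source }))
);

const toMiB = (bytes) => (bytes / 1024 / 1024).toFixed(1);
console.log(`deep clones:    ${toMiB(deep.growthBytes)} MiB`);
console.log(`shallow copies: ${toMiB(shallow.growthBytes)} MiB`);
```

For a closer look at what the collector is doing, running Node.js with `--trace-gc` prints a line for each scavenge and mark-sweep cycle.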


Fix #2 — WeakMap Instead of Map for Intermediate Caches

During the DFS traversal, we maintained a cache of intermediate resolution results — essentially memoizing nodes we'd already resolved so we wouldn't process them twice.

We were using a regular Map for this. That seems reasonable at first glance. But there's a subtle problem.

How Map Can Cause Memory Leaks

A regular Map holds strong references to its keys. As long as the Map exists, every key it holds is considered "reachable" by the GC — even if nothing else in the application references that key anymore.

In our case, the intermediate cache was a module-level variable that persisted for the life of the bootstrap process. Nodes that had been resolved and were no longer needed by any active code path were still being kept alive by the Map. The GC couldn't collect them.

// Strong reference — node is kept alive by the Map even if nothing else needs it
const cache = new Map();
cache.set(nodeObject, resolvedResult);
// nodeObject cannot be GC'd as long as cache exists

WeakMap: Hold Without Blocking Collection

A WeakMap holds weak references to its keys. If the only reference to a key object is the WeakMap itself, the GC is free to collect it. The entry disappears from the WeakMap automatically.

// Weak reference — node can be GC'd if nothing else references it
const cache = new WeakMap();
cache.set(nodeObject, resolvedResult);
// nodeObject CAN be GC'd when no other code holds a reference

For an intermediate processing cache — where objects are only needed during traversal and can be released once their subtree is resolved — WeakMap is the semantically correct choice. It lets the GC reclaim intermediate nodes progressively during the bootstrap, rather than holding onto everything until the entire process completes.

Important caveat: WeakMap only accepts objects as keys (newer engines also allow non-registered symbols, but never strings or numbers), and you can't iterate over it or check its size. For a lookup cache keyed on node objects, it's a perfect fit. For anything else, evaluate carefully.
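A memoizing resolver built on WeakMap can look like the sketch below. The `resolveNode` function and the shape of its result are hypothetical. The point is the cache pattern: keyed on the node object itself, with no way to keep that node alive on its own.

```javascript
const resolutionCache = new WeakMap();
let resolveCalls = 0;

// Hypothetical resolver: the "work" here is trivial on purpose.
function resolveNode(node) {
  if (resolutionCache.has(node)) return resolutionCache.get(node);

  resolveCalls += 1;
  const resolved = { name: node.name, dependencyCount: node.children.length };
  resolutionCache.set(node, resolved);
  return resolved;
}

const tableA = { name: "tableA", children: [{ name: "tableB", children: [] }] };

const first = resolveNode(tableA);
const second = resolveNode(tableA);

console.log(first === second); // true: the second call was served from the cache
console.log(resolveCalls);     // 1

// Primitive keys are rejected outright.
let primitiveKeyError = null;
try {
  resolutionCache.set("tableA", {});
} catch (err) {
  primitiveKeyError = err;
}
console.log(primitiveKeyError instanceof TypeError); // true
```

Once nothing else references `tableA`, the engine is free to drop both the key and its cached value. When that happens is up to the garbage collector, so the sketch does not try to demonstrate it.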


Fix #3 — node-caged: Pointer Compression at the Runtime Level

After the algorithmic fixes, we had one more tool to reach for: node-caged.

This is not a library. It's not a code change. It's Node.js itself, compiled from source with a single extra build-time configure flag:

--experimental-enable-pointer-compression

You swap your Dockerfile base image, and that's it.

Why Node.js Uses So Much Memory for "Small" Objects

To understand why this helps, you need a quick detour into how modern computers address memory.

Modern servers use a 64-bit architecture. This means memory addresses are 64 bits long, so every pointer takes 8 bytes. Every object in JavaScript (inside V8) is built largely out of pointer-sized fields:

  • A pointer to its property map (also called a "shape" or "hidden class"), which in turn holds the pointer to its prototype

  • Pointers to its out-of-line property storage and its array-element storage

  • A slot for each property, which holds a pointer whenever the value is an object, string, or other heap value

  • The same kind of header fields again on every nested object, array, and string it references

A small JavaScript object with a few properties, counted together with the nested values it references, can easily account for 10–20 pointer-sized fields. At 8 bytes each, that's 80–160 bytes of overhead just for the pointers, before you've stored a single byte of your actual data.

Scale that across hundreds of thousands of live objects in a complex graph, and pointer overhead becomes a meaningful fraction of total heap size.

What Pointer Compression Does

V8's pointer compression (used in Chrome, now optionally available in Node.js via node-caged) works by defining a fixed "memory cage" — a contiguous block of heap memory with a known base address.

Instead of storing full 64-bit addresses, V8 stores 32-bit offsets from the base of the cage:

Without compression:
  prototype pointer: 0x00007FFF1234ABCD   ← 8 bytes

With compression (base = 0x00007FFF00000000):
  prototype pointer: 0x1234ABCD           ← 4 bytes (just the offset)

Half the bytes per pointer. Since pointers make up a large share of V8 heap usage, the real-world result is typically a 35–50% reduction in heap size — without touching a single line of your application code.

The tradeoff: the memory cage has a maximum size of 4 GB (because a 32-bit offset can only address 4 GB of range). For most services — especially microservices — this is plenty. For monolithic applications or services that genuinely need more than 4 GB of JS heap, this won't work. Know your limits before adopting it.
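The arithmetic behind both the saving and the limit fits in a few lines. This is a back-of-the-envelope estimate under assumed numbers (500,000 live objects, 10 pointer-sized slots each). Real heaps also hold non-pointer data such as string contents and floating-point numbers, so the overall reduction is smaller than a clean 50%.

```javascript
// Rough estimate of the bytes spent on pointer-sized slots alone.
function pointerBytes(objectCount, slotsPerObject, bytesPerSlot) {
  return objectCount * slotsPerObject * bytesPerSlot;
}

const objectCount = 500_000; // assumed number of live objects
const slotsPerObject = 10;   // assumed pointer-sized slots per object

const uncompressed = pointerBytes(objectCount, slotsPerObject, 8); // 64-bit pointers
const compressed = pointerBytes(objectCount, slotsPerObject, 4);   // 32-bit offsets
const savedMiB = (uncompressed - compressed) / (1024 * 1024);

console.log(uncompressed);        // 40000000
console.log(compressed);          // 20000000
console.log(savedMiB.toFixed(1)); // "19.1"

// A 32-bit offset can address 2^32 distinct bytes, which is where the 4 GB cage limit comes from.
const cageLimitBytes = 2 ** 32;
console.log(cageLimitBytes / 1024 ** 3); // 4
```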


| Metric | Before | After |
| --- | --- | --- |
| Peak memory (load) | ~6 GB | ~2 GB |
| Bootstrap time | ~4 minutes | ~40 seconds |
| GC pressure | High (major GC frequent) | Low |
| CPU during bootstrap | Saturated | Comfortable |

Each fix contributed a distinct layer of improvement:

  • Shallow copy eliminated the core algorithmic waste — the biggest single win

  • WeakMap removed structural memory leaks in intermediate processing

  • node-caged applied a runtime-level compression that reduced heap overhead across the board


Lessons Learned

  1. Understand your tools before you scale with them. cloneDeep is correct and useful. But it's a full object graph traversal. Using it inside another traversal, on a large graph, at boot time, is a recipe for compounding cost. Know what your utilities do internally — not just what they promise in the docs.

  2. The cost of an algorithm is not just the traversal you can see. It's also whatever runs inside each step. On paper, our DFS was linear in the size of the graph. But the cloneDeep inside it was itself linear in the size of each cloned subtree, so the real cost was closer to O(n²), and we didn't see it until the graph grew large enough to make it visible.

  3. Shallow copy is almost always enough if you control mutation. The instinct to reach for cloneDeep often comes from fear of accidental mutation. But if you know which keys you're mutating and when, a shallow copy with targeted overrides is both safer to reason about and orders of magnitude cheaper.

  4. WeakMap is not just an optimization — it's the semantically correct choice for ephemeral caches. If your cache exists to support a computation (not to serve as a long-lived data store), your keys should not prevent GC. WeakMap encodes that intent directly in the data structure.

  5. Infra-level wins exist, and they're worth knowing about. node-caged required zero application code changes. It's just a different binary. Not every optimization lives in your business logic — sometimes it lives in how the runtime itself manages memory. Stay curious about the lower layers of your stack.

  6. Memory problems compound silently. We didn't wake up one day to 6 GB. The graph grew by tens of nodes over months, each addition adding proportionally more clones. The signal was subtle until it wasn't. If your memory grows faster than linearly with configuration complexity, ask yourself why. The answer might surprise you.

