From Node + JSON + REST to Go + Protobuf + gRPC - the parsing pipeline rewrite
The Controller rewrite in Go last February covered a single agent.
The lessons learned there - that Claude could carry a Node service across to Go with feature parity over a handful of evenings, and that the runtime gains were real - made one thing obvious: the parsing pipeline was next.
This release ships the result:
- Seven services migrated.
- Three serialization formats swapped from JSON to Protobuf.
- Three hot paths swapped from REST to gRPC.
We divided CPU usage by 2 and memory usage by 4 while keeping the same performance.
What was rewritten
The Spider parsing pipeline is the part of the back office that ingests captured packets, reconstructs TCP sessions, and decodes them into HTTP and PostgreSQL communications.
It runs on every byte the Whisperers send back.
Its CPU and Redis footprint show up directly in cluster bills.
The pre-migration baseline was straightforward:
- Node.js Koa services exchanging JSON over REST, storing JSON payloads in Redis.
- The hot path between parsers and the packet store (Web-Write → Pack-Read) was a REST call returning a JSON array of packets per TCP session.
Seven services were rewritten in Go:
| Service | Role |
|---|---|
| Pack-Write | Stores raw packets from Whisperers into Redis |
| Pack-Read | Returns packets to parsers, by TCP session |
| Tcp-Write | Stores TCP session creations / updates |
| Tcp-Update | Updates TCP session state and parsing logs |
| Web-Write | Parses HTTP communications from TCP sessions |
| Pg-Parser | Parses PostgreSQL communications from TCP sessions |
| Tls-Keys-Linker | Matches captured TLS master secrets to TCP sessions |
Three serialization formats moved from JSON to Protobuf:
- Packets - the largest by volume, stored in redis-pack
- TCP sessions - the bookkeeping records that tie packets together
- HTTP communications and parsing logs - the decoded output
Two agent-to-service paths swapped from REST+JSON to REST+Protobuf:
- Whisperer ↔ Pack-Write - save packets for parsing
- Whisperer ↔ Tcp-Write - save TCP session state to trigger parsing
Two inter-service hot paths swapped from REST+JSON to gRPC:
- Web-Write ↔ Pack-Read - fetch packets to parse
- Web-Write / Pg-Parser ↔ Tcp-Update - fetch parsing jobs and patch session state
The legacy Node.js versions of these services are still in the repo, kept around for reference and emergency rollback, but no longer deployed.
On the inter-service communications:
- Go + JSON is slower than Node + JSON but uses less CPU
- Go + gRPC on one side and Node + gRPC on the other is slower than Node + JSON, for equivalent CPU usage
- Go + gRPC on both sides is fastest, with half the CPU usage
Why this combination
The Controller rewrite already validated Go as the language and Claude as the co-author.
Two questions remained:
Why Protobuf?
Packets are the densest cargo in the system. They sit in redis-pack between capture and parsing - sometimes for seconds, sometimes for minutes if a parser falls behind. Each packet stored as JSON carried tens of percent of overhead in key names, string-encoded byte arrays, and whitespace. Protobuf strips that down to the bare bytes plus a few field tags.
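To make the overhead concrete, here is a minimal sketch comparing the two encodings for one packet. The `Packet` shape, its field names, and its field numbers are assumptions made for illustration; the actual Spider schema is not shown in this post. The binary side is hand-encoded with the standard library following the Protobuf wire-format rules (one tag byte, then a varint or a length-prefixed value), rather than with generated code.

```go
package main

import (
	"encoding/binary"
	"encoding/json"
	"fmt"
)

// Packet is a hypothetical record shape used only for this comparison.
type Packet struct {
	SessionID string `json:"sessionId"`
	Seq       uint32 `json:"seq"`
	Timestamp int64  `json:"timestamp"`
	Payload   []byte `json:"payload"`
}

// jsonSize returns the encoded size with encoding/json.
// encoding/json writes []byte as a base64 string, roughly 4/3 of the raw size,
// and repeats every key name in every record.
func jsonSize(p Packet) (int, error) {
	b, err := json.Marshal(p)
	return len(b), err
}

// wireSize hand-encodes the packet following Protobuf wire-format rules:
// tag = (field number << 3) | wire type, then a varint or length-delimited value.
func wireSize(p Packet) int {
	var buf []byte
	buf = append(buf, 0x0a) // field 1, length-delimited
	buf = binary.AppendUvarint(buf, uint64(len(p.SessionID)))
	buf = append(buf, p.SessionID...)
	buf = append(buf, 0x10) // field 2, varint
	buf = binary.AppendUvarint(buf, uint64(p.Seq))
	buf = append(buf, 0x18) // field 3, varint (non-negative timestamps only)
	buf = binary.AppendUvarint(buf, uint64(p.Timestamp))
	buf = append(buf, 0x22) // field 4, length-delimited raw bytes
	buf = binary.AppendUvarint(buf, uint64(len(p.Payload)))
	buf = append(buf, p.Payload...)
	return len(buf)
}

func main() {
	p := Packet{
		SessionID: "tcp-0001",
		Seq:       42,
		Timestamp: 1700000000000,
		Payload:   make([]byte, 1400), // a typical full-size TCP segment
	}
	js, err := jsonSize(p)
	if err != nil {
		panic(err)
	}
	ws := wireSize(p)
	fmt.Printf("payload=%d json=%d wire=%d\n", len(p.Payload), js, ws)
}
```

In this sketch the binary form stays within a few dozen bytes of the raw payload, while the JSON form pays for base64 and key names on every record. The exact ratio depends on the real schema and payload sizes.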
Why gRPC for two specific hops?
Pack-Read is called by parsers tens of thousands of times per minute. The wire savings from Protobuf compound when paired with HTTP/2 multiplexing and a connection that stays warm. The other inter-service calls (config polling, status pushes, management APIs) remain on REST + JSON: their volume does not justify the migration cost, and JSON's debuggability is genuinely useful at low volume.
Workflow
The pattern that worked on the Controller worked again here, with one refinement.
Each service rewrite followed the same loop:
- Claude read the Node.js implementation and wrote an architecture document
- A phased plan was produced, with explicit feature checklists
- Implementation in Go, deployed locally, tested against the e2e suite
- Deployed to a real cluster, monitored, and iterated on what pprof flagged
The refinement: protobuf schemas were defined once in a shared proto/ repository, generated for both Go (the new services) and JavaScript (the remaining Node consumers, and the back office tools that still need to read these records). This kept the legacy Node code on the network for the duration of the migration - one service could be swapped at a time without locking the others into a flag day.
The gRPC services were written behind the same authentication and circuit-breaker primitives the Spider stack already uses for REST: RS256 JWT, gobreaker-based per-target breakers, the same observability hooks feeding the monitoring. From the operator's point of view, a gRPC call surfaces in the dashboards exactly like a REST call.
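The idea of a per-target breaker can be sketched with the standard library alone. This is a simplified stand-in, not gobreaker's API and not the Spider implementation: the `Breakers` type, its threshold and cooldown parameters, and the target names are all hypothetical, and the half-open state is reduced to "allow calls again once the cooldown has passed".

```go
package main

import (
	"errors"
	"fmt"
	"sync"
	"time"
)

// ErrOpen is returned when a target's breaker is open and the call is skipped.
var ErrOpen = errors.New("circuit open")

type breaker struct {
	failures  int       // consecutive failures since the last success
	openUntil time.Time // calls are rejected until this instant
}

// Breakers keeps one breaker per target, so a failing dependency
// does not block calls to healthy ones.
type Breakers struct {
	mu        sync.Mutex
	threshold int
	cooldown  time.Duration
	now       func() time.Time
	targets   map[string]*breaker
}

func NewBreakers(threshold int, cooldown time.Duration) *Breakers {
	return &Breakers{
		threshold: threshold,
		cooldown:  cooldown,
		now:       time.Now,
		targets:   make(map[string]*breaker),
	}
}

// Do runs call unless the breaker for target is open.
// The same wrapper can sit in front of a REST call or a gRPC call.
func (b *Breakers) Do(target string, call func() error) error {
	b.mu.Lock()
	br := b.targets[target]
	if br == nil {
		br = &breaker{}
		b.targets[target] = br
	}
	if b.now().Before(br.openUntil) {
		b.mu.Unlock()
		return ErrOpen
	}
	b.mu.Unlock()

	err := call() // run outside the lock so slow calls do not block other targets

	b.mu.Lock()
	defer b.mu.Unlock()
	if err == nil {
		br.failures = 0
		return nil
	}
	br.failures++
	if br.failures >= b.threshold {
		br.openUntil = b.now().Add(b.cooldown)
		br.failures = 0
	}
	return err
}

func main() {
	b := NewBreakers(2, 30*time.Second)
	down := errors.New("pack-read unavailable")
	for i := 0; i < 3; i++ {
		fmt.Println("pack-read:", b.Do("pack-read", func() error { return down }))
	}
	fmt.Println("tcp-update:", b.Do("tcp-update", func() error { return nil }))
}
```

With a threshold of two, the third call to the failing target is rejected without being attempted, while calls to the other target go through unaffected.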
Performance results
The reference comparison was a one-hour window on the sss-dev cluster, with spider-mon status snapshots before and after, against a Google-doc baseline captured on the pre-migration build.
Headline deltas
Load of cluster (night run)
| Type | Avg/min |
|---|---|
| Packets | 135,941 |
| Tcp | 15,251 |
| Http | 28,920 |
| Psql | 1,170 |
Metrics
| Metric | Baseline (Node/JSON/REST) | After (Go/gRPC/Protobuf) | Delta |
|---|---|---|---|
| redis-pack memory | 330 MB | 80 MB |