
Fewer moving parts - the back office rework behind the parsing pipeline

Creator of Spider

The Go and Protobuf migration cut the parsing pipeline's CPU and memory bills in half. But it left the back office with a different kind of cost: the same plumbing copied across two parsers, a handful of small services doing one job each, and more deployments to operate than the work actually required.

This round of work is about that - not raw performance, but shape. Three changes, all aimed at the same goal: fewer moving parts, less duplicated code, and a foundation that makes the next parsers cheap to build.

The starting point

After the Go rewrite, the back office had two mature protocol parsers - one for HTTP, one for PostgreSQL - and they worked well. The problem was underneath them.

The two parsers shared something like three quarters of their code. Not by design, but by copy: session lifecycle, the data-access layer, TLS decryption, circuit breakers, statistics, the HTTP server skeleton. Every time one parser fixed a bug or tuned a behaviour in that shared territory, the other one quietly drifted further away. Keeping two copies of the same idea is a maintenance tax that grows with every change.

Around the parsers sat extra deployments. Separate upload services took captured traffic from the agents and pushed it into Redis. And the poller - a single component, but deployed one instance per data type - drained those Redis queues into Elasticsearch, a row of near-idle pods each doing one small job. Simple individually, but together a lot of pods to run.

So the work split into three threads.

[Figure: Back office before and after the rework]

1. A shared foundation for parsers

The first and largest thread is a refactor: pulling the shared plumbing out of the two parsers and into a single library that both build on.

The motivation is concrete. We don't want to maintain two parsers - we want to be able to add a third and a fourth without paying the full cost each time. Two new protocol parsers are on the roadmap, for Redis and for gRPC traffic. If we built them by copy-pasting from the HTTP parser, we would recreate the same drift problem at twice the scale.

Instead, everything generic - the bits that have nothing to do with HTTP or PostgreSQL specifically - moves into a shared foundation:

  • the session-parsing loop that claims work, fetches packets, and saves results;
  • the data-access layer that talks to Redis and Elasticsearch;
  • TLS decryption;
  • the service lifecycle: configuration polling, restarts, graceful shutdown;
  • circuit breakers, statistics, and the HTTP server skeleton.

Each parser keeps only what is genuinely specific to its protocol. The shared parts live in one place, get fixed once, and improve for everyone at the same time.
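As a rough illustration of that split, here is a minimal Go sketch. Every name in it (`ProtocolParser`, `Store`, `RunLoop`, the in-memory store and the toy parser) is hypothetical and invented for this example; it is not the actual library, only one way the "shared loop, protocol-specific parser" boundary could be drawn.

```go
package main

import (
	"errors"
	"fmt"
)

// Session is a hypothetical unit of captured traffic waiting to be parsed.
type Session struct {
	ID      string
	Packets [][]byte
}

// Store is a hypothetical data-access layer (Redis and Elasticsearch in the article).
type Store interface {
	Claim() (Session, bool) // next session to parse; false when none is left
	Save(sessionID string, results []string) error
}

// ProtocolParser is the only piece each protocol implements in this sketch.
type ProtocolParser interface {
	Name() string
	Parse(packets [][]byte) ([]string, error)
}

// RunLoop is the shared session-parsing loop: claim work, parse, save.
// It returns the number of sessions saved.
func RunLoop(store Store, p ProtocolParser) (int, error) {
	saved := 0
	for {
		s, ok := store.Claim()
		if !ok {
			return saved, nil
		}
		results, err := p.Parse(s.Packets)
		if err != nil {
			// A real loop would record the failure; the sketch just skips the session.
			continue
		}
		if err := store.Save(s.ID, results); err != nil {
			return saved, fmt.Errorf("%s: save %s: %w", p.Name(), s.ID, err)
		}
		saved++
	}
}

// memStore is an in-memory stand-in for the real storage, for illustration only.
type memStore struct {
	pending []Session
	saved   map[string][]string
}

func (m *memStore) Claim() (Session, bool) {
	if len(m.pending) == 0 {
		return Session{}, false
	}
	s := m.pending[0]
	m.pending = m.pending[1:]
	return s, true
}

func (m *memStore) Save(id string, results []string) error {
	m.saved[id] = results
	return nil
}

// lineParser is a toy protocol: each packet becomes one result, empty packets are invalid.
type lineParser struct{}

func (lineParser) Name() string { return "toy" }

func (lineParser) Parse(packets [][]byte) ([]string, error) {
	out := make([]string, 0, len(packets))
	for _, p := range packets {
		if len(p) == 0 {
			return nil, errors.New("empty packet")
		}
		out = append(out, string(p))
	}
	return out, nil
}

func main() {
	store := &memStore{
		pending: []Session{
			{ID: "a", Packets: [][]byte{[]byte("GET /")}},
			{ID: "b", Packets: [][]byte{{}}}, // fails to parse and is skipped
		},
		saved: map[string][]string{},
	}
	n, err := RunLoop(store, lineParser{})
	fmt.Println(n, err)
}
```

In a layout like this, a new protocol only has to supply its own `ProtocolParser`; the loop, the storage access and the error handling are written once.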

This is being done deliberately, in small reviewable steps rather than one big-bang rewrite - each step extracts one layer, both parsers move onto it together, and the end-to-end test suite confirms nothing changed in behaviour before moving on. It is slower to do it this way, but it keeps the two production parsers working the entire time. When the foundation is complete, the new Redis and gRPC parsers get built on top of it rather than next to it.

2. Folding the upload services into the parsers

The second thread removes a layer entirely.

Captured traffic arrives from the agents in two ways. Some of it is reconstructed and parsed live; some of it is uploaded in bulk. Until now, those bulk uploads went through their own dedicated services - one for HTTP, one for PostgreSQL - whose only job was to receive the data and write it into the same Redis caches the parser already writes to.

In other words, the upload service was a second door into a room the parser already owns. So we removed the door and gave the parser its own entrance. The parser now accepts uploads directly, reusing the exact same storage path it uses for everything else.

The result is one fewer service per protocol to deploy and operate, with no change visible to the agents sending the data - they simply point at the parser instead. We rolled this out one protocol at a time, letting the HTTP side soak in a real cluster before doing the same for PostgreSQL, so the safety net stayed in place throughout. Both upload services are now retired, with the end-to-end suite green against the parsers' own upload endpoints.

3. Running many pollings in one poller

The third thread is consolidation, and it is now complete. Unlike the first two, it is not about code structure - it is purely about resource usage.

Pollers move parsed data out of Redis and into Elasticsearch for long-term storage. This was already a single component - one codebase. The catch was how we ran it: the same poller was deployed many times over, one instance per queue, each draining a single data type (communications, parsing logs, statuses, TCP sessions, packets, hosts, and so on). Same binary, many pods, each pointed at a different queue.

That works, but every instance carries a fixed cost no matter how little it actually does: a process, a memory baseline, idle CPU, a connection pool. Multiply that overhead by every data type and it adds up to far more than the polling itself needs - most of those pods spent most of their time idle, paying rent.

So we taught the poller to run several pollings at once inside one process. Each polling stays fully independent, with its own queue, its own schedule, and its own circuit breaker, so if one stalls because its destination is unhealthy the others keep going. They just share a process instead of each claiming a whole pod. A handful of bundled deployments, grouped by the kind of data they handle, now do the work the whole fleet of single-purpose instances did before.

[Figure: PollerPg.png]

The goal here was narrow and deliberate: cut the CPU and memory the back office spends on this plumbing. The polling logic did not change - we just stopped paying the per-instance tax on every data type.

The monitoring keeps per-polling visibility, so we don't lose the ability to see exactly how much each data type is flowing - we just run far fewer pods to get that view.

We rolled this out the same way as the rest of the migration: one group of jobs at a time, each validated end-to-end against a real cluster before the previous deployment was retired, so data kept flowing into Elasticsearch the whole way through.

What the numbers said

The work was about shape, not speed. We watched the bills anyway, comparing a busy hour before the merge against an equally busy one after - including a morning peak carrying more than five times the HTTP load of the earlier sample.

The pollers tell the clearest story, because that is where whole processes disappeared. The two HTTP pollers together held about 400 MB of memory; the bundled poller that replaced them runs in 40 to 70 MB. Per communication drained, it spends about 40% less CPU. The other data types - PostgreSQL, TCP, packets, hosts - each shed 70 to 90% of their memory as their separate Node processes folded into shared Go ones.

The parser refactor was supposed to be performance-neutral - it only moved code around. It came out well ahead: per communication, the HTTP parser now spends roughly a third of the CPU it did before, helped by the shared loop batching work more aggressively. Wall-clock time to parse one communication stayed flat at about a millisecond. Redis cache footprints fell across the board.

Measure | Before | After
HTTP poller memory | ~400 MB | ~40–70 MB
CPU per communication (poller) | ~1.5 ms | ~0.9 ms
CPU per communication (HTTP parser) | ~7.8 ms | ~2.4 ms
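For readers who want to check the percentages against the table, the arithmetic is short enough to sketch in Go. `PercentSaved` is a helper invented for this example; the inputs are the approximate figures from the table, so the results are approximate too.

```go
package main

import (
	"fmt"
	"math"
)

// PercentSaved returns the reduction from before to after, rounded to a whole percentage.
func PercentSaved(before, after float64) int {
	return int(math.Round((before - after) * 100 / before))
}

func main() {
	fmt.Println("poller CPU saved (%):", PercentSaved(1.5, 0.9))
	fmt.Println("HTTP parser CPU saved (%):", PercentSaved(7.8, 2.4))
	fmt.Println("HTTP poller memory saved at 40 MB (%):", PercentSaved(400, 40))
}
```

The poller row works out to 40% less CPU per communication. The parser row works out to about 69% less, which leaves about 31% of the original cost, the "roughly a third" mentioned above.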

We did not set out to move these numbers. That they moved the right way is a bonus on top of the real win: fewer things to run.

Why now, and why this order

None of these three changes is about a single dramatic number. They're about the cost of running and extending the system rather than the cost of serving a given request.

The Go migration proved the performance case. This work pays down the structural cost that came with it: it removes duplicated code, removes whole services, and - most importantly - turns the parser from a thing we copy into a thing we build on. The clearest measure of success won't be a CPU graph; it will be how quickly the next two parsers come together.

Closing

Three threads, one direction: a shared foundation under the parsers, the upload services folded in, and the pollers bundled down. Less to maintain, less to deploy, and a base that makes the next protocols cheap to add.

The same workflow as the migration before it - planned in small steps, validated against a real cluster, paced to value rather than ambition - is what kept the production parsers running while the ground moved underneath them.

Before this work, adding a new parser meant adding 3 services and 4 pollers.
Now we are down to 2 services and 1 poller configuration.

Nice 😉