HTTP body transcoding - making protobuf and MessagePack traffic searchable
Spider has been parsing HTTP/1.1 traffic for years - request lines, headers, status codes, response bodies - and has let operators write tag and template rules that pull values out of those parts.
This works perfectly when the body is text.
JSON and XML carry their field names in plain sight, so a rule looking for a client id finds the string clientId sitting right there in the bytes.
Protobuf and MessagePack do not work that way: the body is a compact binary blob, and the field names live in a separate schema, not in the payload. The information is there - it just is not readable as text, so the text rules have nothing to grab onto.
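To make the difference concrete, here is a small sketch that puts the same value side by side in JSON and in protobuf's wire format. The message definition in the comment and the field name are made up for illustration; the byte layout follows the standard protobuf varint encoding.

```python
import json
from typing import Tuple

# JSON: the field name travels with the value.
json_body = json.dumps({"clientId": 150}).encode("utf-8")

# Protobuf encoding of the same value for a hypothetical message
#   message Request { int32 client_id = 1; }
# 0x08 = (field number 1 << 3) | wire type 0 (varint); 0x96 0x01 = varint 150.
proto_body = bytes([0x08, 0x96, 0x01])


def read_varint(buf: bytes, pos: int) -> Tuple[int, int]:
    """Decode one base-128 varint starting at pos; return (value, next_pos)."""
    value, shift = 0, 0
    while True:
        byte = buf[pos]
        pos += 1
        value |= (byte & 0x7F) << shift
        if not byte & 0x80:  # high bit clear: this was the last byte
            return value, pos
        shift += 7


key, pos = read_varint(proto_body, 0)
field_number, wire_type = key >> 3, key & 0x07
value, _ = read_varint(proto_body, pos)

print(b"clientId" in json_body)        # True
print(b"clientId" in proto_body)       # False
print(field_number, wire_type, value)  # 1 0 150
```

The payload only says "field 1 holds 150". That field 1 is called client_id is knowledge held by the schema, which is why a text rule has nothing to match on.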
The new HTTP body transcoding feature closes that gap.
- The operator uploads the schema once and declares which requests it applies to.
- From then on, Spider quietly converts each matching binary body into the equivalent JSON before the existing tag and template rules run.
- The same rules that already worked for JSON now also work for protobuf and MessagePack - with no change to the rules themselves.
What the operator sees
- Upload a schema bundle.
From the Whisperer parsing-config, in the new Content Schemas block, the operator uploads one or more .proto files. Multiple files form a single bundle, so an entry-point schema that imports shared definitions is uploaded as one unit. MessagePack needs no schema (it describes itself), but a binding is still declared so Spider knows which traffic to convert.

- Bind it.
A binding row says: for this method, this URI, and this content-type, use this schema and message type - separately for the request and the response.
The message type is picked from a dropdown that Spider populates by reading the uploaded bundle, so there are no fully-qualified names to remember.

- Tag and template against the JSON form.
Existing rules that read the body now match against the converted JSON. Nothing else changes - the rules, the dashboards, and the search filters all see decoded values.

- Verify in the Playground.
The Playground shows the converted request and response, names which binding fired, and surfaces a clear error when something is off.
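
One way to picture the binding rows described above is as a small data model. This is only a sketch: the class names, field names, URIs, schema ids and message types below are all invented for illustration and are not Spider's actual configuration format.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Target:
    """What to decode a body as. All names here are illustrative."""
    encoding: str                       # "protobuf" or "msgpack"
    schema_id: Optional[str] = None     # None for MessagePack (self-describing)
    message_type: Optional[str] = None  # protobuf only, chosen from the bundle


@dataclass(frozen=True)
class Binding:
    """One row: which traffic it applies to, and how to decode each direction."""
    method: str
    uri_pattern: str
    content_type: str
    request: Optional[Target] = None
    response: Optional[Target] = None


bindings = [
    Binding(
        method="POST",
        uri_pattern="/api/orders",
        content_type="application/x-protobuf",
        request=Target("protobuf", "bundle-1", "shop.CreateOrderRequest"),
        response=Target("protobuf", "bundle-1", "shop.CreateOrderResponse"),
    ),
    # MessagePack still gets a binding, but needs no schema or message type.
    Binding(
        method="POST",
        uri_pattern="/api/events",
        content_type="application/msgpack",
        request=Target("msgpack"),
    ),
]
```

Keeping the request and the response as separate targets mirrors the point that each direction gets its own schema and message type.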

What the operator does not see
The converted JSON is intentionally never stored. The parser produces it, hands it to the rules, and throws it away. Two reasons:
- Storage cost.
The same data expressed as JSON is typically 2–4× larger than the binary form. Keeping both would cost more disk for no extra benefit to the rules.

- Truth on the wire.
What was actually captured is the binary payload. The JSON is only a view of it - derived from a schema, and re-derivable at any time. Storing the view next to the source would invite the two to drift apart (re-upload a corrected schema, and the stored JSON is now stale).

So the JSON appears exactly where it earns its keep: inside the parser, just long enough to feed the rules, and inside the Playground, for the operator validating a draft binding.
How it works
The feature is built from four pieces: one shared conversion library and three components that use it.
- The shared library is the single place that knows how to turn a binary body into JSON. It handles protobuf (using the uploaded schema), MessagePack (self-describing, no schema needed), and plain JSON (passed through). Because every component uses the same library, they all agree on what "transcoded" means.
- The schema store keeps the uploaded bundles in one place and hands them out by id. Uploads are idempotent: the same bundle uploaded twice is stored once.
- The parser is where the value lands at capture time - it converts matching bodies on the fly so the rules can extract values.
- The UI is where the operator uploads schemas, declares bindings, and previews the result.
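
The idempotent upload mentioned for the schema store can be sketched with content addressing: derive the bundle id from the bundle's contents, so an identical bundle always maps to the same entry. Whether Spider does it this way is not stated here; the class, its methods and the hashing scheme are assumptions for illustration.

```python
import hashlib
from typing import Dict


class SchemaStore:
    """In-memory stand-in for a schema store; content hashing is an assumption."""

    def __init__(self) -> None:
        self._bundles: Dict[str, Dict[str, bytes]] = {}

    def upload(self, files: Dict[str, bytes]) -> str:
        """Store a bundle of {file name: content} and return its id."""
        # Hash names and contents in a stable order, so the same bundle
        # yields the same id regardless of the order files were supplied in.
        digest = hashlib.sha256()
        for name in sorted(files):
            content = files[name]
            digest.update(name.encode("utf-8") + b"\0")
            digest.update(len(content).to_bytes(8, "big"))
            digest.update(content)
        bundle_id = digest.hexdigest()
        # setdefault keeps the existing entry if the id is already known.
        self._bundles.setdefault(bundle_id, dict(files))
        return bundle_id

    def get(self, bundle_id: str) -> Dict[str, bytes]:
        return self._bundles[bundle_id]

    def __len__(self) -> int:
        return len(self._bundles)


store = SchemaStore()
bundle = {
    "order.proto": b'syntax = "proto3"; import "common.proto";',
    "common.proto": b'syntax = "proto3";',
}
first_id = store.upload(bundle)
second_id = store.upload(dict(reversed(list(bundle.items()))))
print(first_id == second_id, len(store))  # True 1
```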
At capture time
When the parser handles a request, it asks one question: does this request's (method, URI, content-type) match a configured binding? If not, the body is left exactly as it was and the existing text rules run unchanged - the feature only ever adds parsing; it never removes a parse that was already happening.
If a binding does match, the body is converted to JSON and the rules run against that. The first time a given schema is needed, it is fetched from the schema store; after that it is cached, so a busy parser is not re-fetching or re-compiling the same schema on every request.
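A minimal sketch of that lookup-then-cache flow follows. It assumes exact URI matching and ignores content-type parameters such as a charset, neither of which is specified here; `find_binding`, `fetch_schema` and `get_schema` are hypothetical names, and `fetch_schema` merely stands in for the call to the schema store.

```python
from functools import lru_cache
from typing import NamedTuple, Optional, Sequence


class Binding(NamedTuple):
    method: str
    uri: str
    content_type: str
    schema_id: str


def find_binding(bindings: Sequence[Binding], method: str, uri: str,
                 content_type: str) -> Optional[Binding]:
    """Return the first binding matching (method, URI, content-type), if any."""
    # Assumption: parameters after ';' do not take part in matching.
    media_type = content_type.split(";", 1)[0].strip().lower()
    for binding in bindings:
        if (binding.method == method.upper()
                and binding.uri == uri
                and binding.content_type == media_type):
            return binding
    return None  # no match: the body is left untouched


fetch_count = 0


def fetch_schema(schema_id: str) -> str:
    """Placeholder for fetching and compiling a schema from the store."""
    global fetch_count
    fetch_count += 1
    return f"compiled:{schema_id}"


@lru_cache(maxsize=128)
def get_schema(schema_id: str) -> str:
    # Only the first call per schema id reaches fetch_schema.
    return fetch_schema(schema_id)


bindings = [Binding("POST", "/api/orders", "application/x-protobuf", "bundle-1")]
hit = find_binding(bindings, "post", "/api/orders", "application/x-protobuf")
miss = find_binding(bindings, "GET", "/health", "text/plain")
for _ in range(3):
    get_schema(hit.schema_id)
print(hit is not None, miss, fetch_count)  # True None 1
```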
When conversion fails - a corrupt payload, a body that does not match its declared schema, a message type missing from the bundle - the body is dropped rather than handed to the rules as raw bytes.
Matching text rules against random binary would produce false positives, and a wrong extraction is worse than a missing one.
The failure is counted, and it shows up as a clear message in the Playground rather than as silently bad data.
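The drop-and-count behaviour can be sketched as a wrapper around the decoder. The function name, the counter, and the JSON-based stand-in decoder are illustrative assumptions; a real decoder would be the protobuf or MessagePack path of the shared library.

```python
import json
from collections import Counter
from typing import Callable, Optional

# Failure counts keyed by exception type (a stand-in for a real metric).
failures: Counter = Counter()


def transcode_or_drop(body: bytes,
                      decode: Callable[[bytes], object]) -> Optional[str]:
    """Return the body as JSON text, or None if it cannot be decoded.

    On failure the raw bytes are deliberately NOT returned, so text rules
    never run against undecoded binary.
    """
    try:
        return json.dumps(decode(body))
    except Exception as exc:  # broad on purpose: any decode error drops the body
        failures[type(exc).__name__] += 1
        return None


def demo_decode(body: bytes) -> object:
    """Stand-in decoder used only to exercise both paths."""
    return json.loads(body.decode("utf-8"))


ok = transcode_or_drop(b'{"clientId": 150}', demo_decode)
bad = transcode_or_drop(b"\x08\x96\x01", demo_decode)
print(ok)                      # {"clientId": 150}
print(bad)                     # None
print(sum(failures.values()))  # 1
```

Returning None rather than the original bytes is what keeps a failed conversion from turning into a false match further down the pipeline.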
At read time
The same bindings also power a view in the UI. When you open a captured communication whose body matches a binding, the Body tab gains a Transcoded option and shows the decoded JSON inline, right next to the raw bytes.


Much better, isn't it? 😎
The viewer uses the very same conversion library as the parser, just invoked on demand instead of during capture. One matcher, two consumers: the parser at write time, the UI at read time.
Going forward
Parse-time conversion was the harder half of the feature, and the read-time view followed it cheaply because both share the same library.
That shared library is the part that pays off over time: every component agrees on what "transcoded" means, so the next consumer of decoded bodies is a small addition rather than a rewrite.
Feedback and feature requests are welcome!