
Ingest tool

The ingest tool is a script that duplicates network communications captured by a Whisperer and sends them to another server / platform.

Use cases

Production migration tests

The tool was successfully used to validate a large technical migration of a system at Flowbird.

We wanted to migrate a set of 100+ microservices from Docker Swarm to Kubernetes.
Because of constraints on the domain names used in the data, the switch had to be done in one go rather than progressively.

We used the Ingest tool to run real-life tests (functional and load) by injecting the Swarm production feed into the new Kubernetes production environment.

Having previously copied the reference data from one environment to the other, and using the same security keys, the new environment executed the same processing as production, with only a 45s delay and exactly the same load, while the production feed was transformed to match the new URLs!

We used Spider to compare the behavior of both systems, both with a bird's-eye view of the statistics and with deep-dive troubleshooting to understand the differences.

This was highly efficient: we saved days of work and strengthened our confidence in the migration by fixing every difference in advance.

The Ingest tool effectively injected millions of requests into the system APIs, allowing us to run the most massive regression test we could imagine!


New release regression tests with real requests

... (tell me!)

Overview

The ingest tool:

  1. Connects to the Spider API
  2. Executes a query on HTTP communication resources
  3. Loops over all matching communications
  4. Transforms the communications' requests using regular-expression rules
  5. Sends the resulting transformed requests
  6. Compares the response status codes and bodies with the original ones
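Step 4 boils down to applying each `match` / `replace` pair of a replacement list to a string (URL, query string, header value or body). The TypeScript sketch below assumes the rules are applied in declaration order, each one to the output of the previous one, and replace every occurrence; `ReplacementPattern` and `applyReplacements` are hypothetical names, not the tool's actual code.

```typescript
// Hypothetical shape of one entry of urlReplacementsPatterns, bodyReplacementsPatterns, etc.
interface ReplacementPattern {
  match: string; // regular expression source
  replace: string; // replacement string ($1, $2... refer to capture groups)
}

// Applies every rule in declaration order; each rule sees the output of the previous one.
// Assumption: rules replace all occurrences (global flag).
function applyReplacements(input: string, patterns: ReplacementPattern[]): string {
  return patterns.reduce(
    (text, { match, replace }) => text.replace(new RegExp(match, "g"), replace),
    input,
  );
}

// The two rules of headersReplacementsPatterns in the sample configuration.
const headerRules: ReplacementPattern[] = [
  { match: "legacy.company.com/prod", replace: "app.company.io" },
  { match: "legacy.company.com", replace: "app.company.io" },
];

console.log(applyReplacements("https://legacy.company.com/prod/api", headerRules)); // https://app.company.io/api
console.log(applyReplacements("legacy.company.com", headerRules)); // app.company.io
```

Under these assumptions, rule order matters: listing `legacy.company.com/prod` before `legacy.company.com`, as the sample configuration does, strips the `/prod` prefix; in reverse order, the shorter rule would rewrite the host first and leave `/prod` in the path. Note also that `match` is a regular expression, so its dots match any character unless escaped.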
info

Communications are sent at the pace they were captured, so that the target system is updated in the same order as the original one.

This means that:

  • If you replay 2 hours of communications, it takes 2 hours.
  • If the load in requests/s is too high for the Ingest tool, it may run out of memory.
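Replaying at capture speed can be modeled as keeping, for every communication, the same offset from the start of the replay as it had from the first captured communication. A minimal TypeScript sketch under that assumption; `delayBeforeSend` is a hypothetical helper, and the tool may anchor its timing differently (for instance on `startDate`).

```typescript
// Hypothetical scheduler: milliseconds to wait before sending a communication so that
// the replay keeps the original spacing between requests.
function delayBeforeSend(
  capturedAt: Date, // capture time of the communication to send
  firstCapturedAt: Date, // capture time of the first replayed communication
  replayStartedAt: Date, // wall-clock time at which the replay started
  now: Date, // current wall-clock time
): number {
  const offsetInCapture = capturedAt.getTime() - firstCapturedAt.getTime();
  const elapsedInReplay = now.getTime() - replayStartedAt.getTime();
  // Never negative: if the replay is running late, send right away.
  return Math.max(0, offsetInCapture - elapsedInReplay);
}

const first = new Date("2023-08-25T00:00:00.000Z");
const start = new Date("2023-09-19T21:00:00.000Z");

// Captured 90s after the first communication, checked 30s into the replay: wait 60s.
console.log(
  delayBeforeSend(new Date("2023-08-25T00:01:30.000Z"), first, start, new Date("2023-09-19T21:00:30.000Z")),
); // 60000
```

In this model, when the target or the tool cannot keep up, delays drop to zero and pending requests pile up in memory, which is one way the memory pressure mentioned above can arise.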

For a finer comparison, it is best to run Spider on the target server as well, so that you can compare both systems statistically and, for troubleshooting, in detail.

Technological stack

  • Node.js
  • Bundled with webpack in an Alpine container
  • Packaged as a Kubernetes Job
  • Published as a Helm chart

Packages

| Type | Repo / Artifact |
|------|-----------------|
| Helm | https://repository.floocus.com/helm/floocus/spider-ingest |
| Docker | registry.gitlab.com/spider-analyzer/public-images/ingest-tool |

Configuration

  • Credentials to connect to the Spider API are passed as environment variables.
  • The script configuration is read from a config.json file located in the same folder as the script.
  • In Helm, this file is included as YAML in the values file, then pushed as a ConfigMap.
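A sketch of this split between secrets and configuration, in TypeScript for Node.js. It assumes the credentials come from two environment variables whose names, `SPIDER_EMAIL` and `SPIDER_PASSWORD`, are hypothetical (check the chart for the real ones), and it only types a subset of the keys shown in the sample configuration; `readCredentials` and `readConfig` are hypothetical helpers.

```typescript
import { readFileSync } from "node:fs";
import { join } from "node:path";

// Subset of the keys of the sample configuration.
interface IngestConfig {
  spiderBaseUri: string;
  startDate: string; // ISO 8601
  stopDate: string; // ISO 8601
  query: string;
  whisperers: string[];
}

interface Credentials {
  email: string;
  password: string;
}

// HYPOTHETICAL variable names: check the Helm chart for the ones it actually sets.
function readCredentials(env: Record<string, string | undefined> = process.env): Credentials {
  const email = env["SPIDER_EMAIL"];
  const password = env["SPIDER_PASSWORD"];
  if (!email || !password) {
    throw new Error("SPIDER_EMAIL and SPIDER_PASSWORD must be set");
  }
  return { email, password };
}

// Reads config.json from `folder` (the script's own folder in the tool).
function readConfig(folder: string): IngestConfig {
  const config = JSON.parse(readFileSync(join(folder, "config.json"), "utf8")) as IngestConfig;
  // Fail fast on an empty, inverted or unparsable replay window.
  if (!(Date.parse(config.startDate) < Date.parse(config.stopDate))) {
    throw new Error("startDate must be before stopDate");
  }
  return config;
}
```

Keeping credentials out of config.json means the ConfigMap can be shared or versioned without exposing the Spider password.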
config:
  spiderBaseUri: https://spider.company.io
  startDate: 2023-08-25T00:00:00.000Z # Start and end of the replay. If stopDate is in the future, the tool waits until it has passed before stopping
  stopDate: 2023-08-25T12:00:00.000Z
  # Query to filter the communications to replay
  query: "req.body.size:>0 AND _exists_:stats.statusCode AND NOT stats.src.origin:10.0.1.0/16"
  whisperers:
    - MvvEYhXATLqBJfUPszX4tA
  pathsToIgnore: [] # Regexps matching the req.uri field of communications to ignore (you may also use the query)
  urlReplacementsPatterns: # Regexps matched against the rebuilt URL to do replacements
    - match: legacy.company.com/prod
      replace: app.company.io
  queryReplacementsPatterns: # Regexps matched against the rebuilt query string + hash to do replacements
    - match: legacy.company.com/prod
      replace: app.company.io
  headersReplacementsPatterns: # Regexps matched against the headers to do replacements
    - match: legacy.company.com/prod
      replace: app.company.io
    - match: legacy.company.com
      replace: app.company.io
  headersToIgnore: # Regexps matching headers that must not be sent (e.g. sensitive ones)
    - x-forwarded-basepath
  bodyReplacementsPatterns: # Regexps matched against the request bodies to do replacements
    - match: legacy.company.com/prod
      replace: app.company.io
  bodyValidationReplacementsPatterns: # Regexps matched against the original response bodies to do replacements
    # before comparing them with the new responses
    - match: legacy.company.com/prod
      replace: app.company.io

email: yourSpiderAccount@yopmail.com # Credentials to connect to the Spider API
password: MySecuredP@ssW0rd

logLevel: info # Log level (trace / debug / info / warn / error)

acceptSelfSignedCertificates: false # Accept self-signed certificates from Spider or the target server (true / false)

hostAliases: # Host aliases to set on the server
  app.company.io: 10.25.63.2
  spider.company.io: 10.25.63.4
  server1.company.io, server2.company.io: 10.25.63.10 # Several host names sharing the same IP
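How `headersToIgnore` and `headersReplacementsPatterns` might combine, as a TypeScript sketch: ignored headers are dropped first, then the replacement rules are applied to the remaining header values. Matching the ignore rules case-insensitively against header names, and applying replacements to values only, are assumptions; `transformHeaders` is a hypothetical function.

```typescript
interface ReplacementPattern {
  match: string; // regular expression source
  replace: string;
}

// Hypothetical header pipeline: drop headers whose name matches one of headersToIgnore,
// then apply headersReplacementsPatterns, in order, to the remaining values.
function transformHeaders(
  headers: Record<string, string>,
  headersToIgnore: string[],
  replacements: ReplacementPattern[],
): Record<string, string> {
  // Assumption: ignore rules are matched case-insensitively against header names.
  const ignore = headersToIgnore.map((source) => new RegExp(source, "i"));
  const result: Record<string, string> = {};
  for (const [name, value] of Object.entries(headers)) {
    if (ignore.some((re) => re.test(name))) continue; // never sent to the target
    result[name] = replacements.reduce(
      (v, { match, replace }) => v.replace(new RegExp(match, "g"), replace),
      value,
    );
  }
  return result;
}

const out = transformHeaders(
  { host: "legacy.company.com", "x-forwarded-basepath": "/prod", accept: "application/json" },
  ["x-forwarded-basepath"],
  [
    { match: "legacy.company.com/prod", replace: "app.company.io" },
    { match: "legacy.company.com", replace: "app.company.io" },
  ],
);
console.log(out); // { host: 'app.company.io', accept: 'application/json' }
```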

note

Since the request body is reassembled (and may be modified) by the script before sending,

  • Transfer-Encoding header is removed,
  • Content-Length header is recomputed

The Content-Encoding header, however, is kept: the request body is compressed again after transformation.
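The decompress, transform, recompress cycle described in this note could look like the TypeScript sketch below, built on Node's `zlib` module. It assumes lowercase header names (as Node exposes them), UTF-8 text bodies, and only the `gzip`, `deflate` and `br` encodings; `rebuildBody` is a hypothetical name.

```typescript
import {
  brotliCompressSync,
  brotliDecompressSync,
  deflateSync,
  gunzipSync,
  gzipSync,
  inflateSync,
} from "node:zlib";

type HeaderMap = Record<string, string>;

// Decompresses a body according to its Content-Encoding (identity if absent or unknown).
function decode(body: Buffer, encoding?: string): Buffer {
  switch (encoding) {
    case "gzip":
      return gunzipSync(body);
    case "deflate":
      return inflateSync(body); // HTTP "deflate" is zlib-wrapped deflate
    case "br":
      return brotliDecompressSync(body);
    default:
      return body;
  }
}

// Compresses a body back with the same Content-Encoding.
function encode(body: Buffer, encoding?: string): Buffer {
  switch (encoding) {
    case "gzip":
      return gzipSync(body);
    case "deflate":
      return deflateSync(body);
    case "br":
      return brotliCompressSync(body);
    default:
      return body;
  }
}

// Hypothetical: transforms a request body and fixes the framing headers.
function rebuildBody(
  body: Buffer,
  headers: HeaderMap,
  transform: (text: string) => string,
): { body: Buffer; headers: HeaderMap } {
  const encoding = headers["content-encoding"];
  const text = decode(body, encoding).toString("utf8");
  const newBody = encode(Buffer.from(transform(text), "utf8"), encoding);
  const newHeaders: HeaderMap = { ...headers };
  delete newHeaders["transfer-encoding"]; // the body is now sent in one piece
  newHeaders["content-length"] = String(newBody.length); // length of the encoded body
  return { body: newBody, headers: newHeaders };
}
```

Content-Length is computed on the compressed bytes, since that is what travels on the wire once Content-Encoding is kept.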

Accessing repository

The Helm repository is publicly accessible.

$ helm repo add floocus https://repository.floocus.com/helm
$ helm search repo floocus

Execution

$ helm upgrade spider-ingest -f ./values.yaml floocus/ingest --namespace ingest --create-namespace --install
  • It is deployed as a Job and runs only once, until completion.
  • You may run it as many times as you want, even in parallel:
    • this is useful if you need to scale because a single instance is not fast enough to ingest your load;
    • in that case, run several instances in parallel, with queries that split the load between them.

Logs

The script reports a progress status every 10s.

$ kubectl -n ingest logs spider-ingest-job-11-pw96p -f | bunyan

[2023-09-19T21:08:30.801Z] INFO: ingest-Tool/1 on spider-ingest-job-11-pw96p: You may find ingested data in Spider with this filter: "req.headers.x-spider-ingest-run:C9nCCIotQJuz53eYrn6i-w".
[2023-09-19T21:08:30.821Z] INFO: ingest-Tool/1 on spider-ingest-job-11-pw96p: Will fetch data to inject from 2023-09-18T19:42:00.000+00:00 to 2023-09-18T19:44:00.000+00:00
[2023-09-19T21:08:31.141Z] INFO: ingest-Tool/1 on spider-ingest-job-11-pw96p: Successfully connected to Spider at https://spider.hub.flowbird.cloud with thibaut.raballand+ingest@flowbird.group
[2023-09-19T21:08:31.142Z] DEBUG: ingest-Tool/1 on spider-ingest-job-11-pw96p: Token refresh job: Should refresh token in 0d 21h 35min 58s.
[2023-09-19T21:08:31.143Z] INFO: ingest-Tool/1 on spider-ingest-job-11-pw96p: Showing status every 10s.
[2023-09-19T21:08:31.277Z] DEBUG: ingest-Tool/1 on spider-ingest-job-11-pw96p: Page 1, items: 2, total items: 2
[2023-09-19T21:08:31.282Z] DEBUG: ingest-Tool/1 on spider-ingest-job-11-pw96p: Last com to send is in 60s. Pausing until then.
[2023-09-19T21:08:41.146Z] INFO: ingest-Tool/1 on spider-ingest-job-11-pw96p:
Requests sent: 1 at 0.1/sec
Matching statuses: 0%
Similarity of response bodies: 3.9%
Requests in pipe: 1.
Last sent was at: 2023-09-18T19:43:00.142+00:00

...
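The figures in the progress status can be derived from simple counters. In this hypothetical TypeScript sketch, the rate is requests sent divided by elapsed seconds (consistent with 1 request after 10s giving 0.1/sec above), statuses match when the original and new codes are equal, and body similarity is the average of a per-request score in [0, 1]. How the tool actually scores body similarity is not documented here, so that score is left as an input; `ReplayStats` is a hypothetical class.

```typescript
// Hypothetical accumulator for the figures printed in the progress status.
class ReplayStats {
  private readonly startedAt: number; // epoch ms when the replay started
  private sent = 0;
  private matchingStatuses = 0;
  private similaritySum = 0; // sum of per-request body similarity scores in [0, 1]

  constructor(startedAt: number) {
    this.startedAt = startedAt;
  }

  // Records one replayed request; how bodySimilarity is computed is left open.
  record(originalStatus: number, newStatus: number, bodySimilarity: number): void {
    this.sent += 1;
    if (originalStatus === newStatus) this.matchingStatuses += 1;
    this.similaritySum += bodySimilarity;
  }

  // Formats a status similar to the one shown in the logs.
  report(now: number): string {
    const seconds = (now - this.startedAt) / 1000;
    const rate = seconds > 0 ? this.sent / seconds : 0;
    const percent = (count: number): number => (this.sent > 0 ? (100 * count) / this.sent : 0);
    return [
      `Requests sent: ${this.sent} at ${rate.toFixed(1)}/sec`,
      `Matching statuses: ${Math.round(percent(this.matchingStatuses))}%`,
      `Similarity of response bodies: ${percent(this.similaritySum).toFixed(1)}%`,
    ].join("\n");
  }
}

const demo = new ReplayStats(0);
demo.record(200, 200, 1);
demo.record(200, 500, 0.5);
console.log(demo.report(10_000));
```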
tip

The job name is suffixed with the Helm revision,
so you can easily find its logs: spider-ingest-job-11-pw96p

Tracking ingested requests

tip

You can easily filter ingested requests in Spider with the filter the script logs at start.
The ingest script adds an x-spider-ingest-run header to each request, whose value is a unique id generated when the script starts.
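A TypeScript sketch of this tagging, assuming the run id is 16 random bytes encoded as base64url (which gives 22 characters, like `C9nCCIotQJuz53eYrn6i-w` in the logs above). `tagRequest` and `spiderFilter` are hypothetical helpers; the header name and the filter syntax are taken from the log output.

```typescript
import { randomBytes } from "node:crypto";

// Assumption: 16 random bytes as base64url (22 characters); the tool's generator may differ.
const runId = randomBytes(16).toString("base64url");

const RUN_HEADER = "x-spider-ingest-run";

// Adds the tracking header to a request's headers without mutating the original object.
function tagRequest(headers: Record<string, string>, id: string): Record<string, string> {
  return { ...headers, [RUN_HEADER]: id };
}

// Spider filter matching every request sent during this run.
function spiderFilter(id: string): string {
  return `req.headers.${RUN_HEADER}:${id}`;
}

console.log(`You may find ingested data in Spider with this filter: "${spiderFilter(runId)}".`);
```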

Completion

The script completes when it reaches the first request captured after the stopDate defined in the configuration; that request is not sent.

tip

To run the ingestion in near real time, set startDate to now and stopDate to a date in the future 👍
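For near-real-time runs, the tool has to decide which time window it can safely fetch. The TypeScript sketch below assumes the 45s safety delay listed in the release history (Helm 0.5) is subtracted from the current time, and implements the completion rule stated above; `nextFetchWindow` and `isFinished` are hypothetical names.

```typescript
// Safety delay from the release history (Helm 0.5): communications captured
// less than 45s ago are not fetched yet.
const SAFETY_DELAY_MS = 45_000;

interface FetchWindow {
  from: Date;
  to: Date;
}

// Hypothetical: next window of captured communications that can be fetched safely.
// Returns null when there is nothing new to fetch yet (or the window already reached stopDate).
function nextFetchWindow(fetchedUntil: Date, stopDate: Date, now: Date): FetchWindow | null {
  const safeUntil = Math.min(stopDate.getTime(), now.getTime() - SAFETY_DELAY_MS);
  if (safeUntil <= fetchedUntil.getTime()) return null;
  return { from: fetchedUntil, to: new Date(safeUntil) };
}

// Completion rule: stop before sending the first communication captured after stopDate.
function isFinished(nextCapturedAt: Date, stopDate: Date): boolean {
  return nextCapturedAt.getTime() > stopDate.getTime();
}

// With stopDate in the future, the fetch window trails real time by 45s.
const stopAt = new Date("2023-08-25T12:00:00.000Z");
const window10h10 = nextFetchWindow(
  new Date("2023-08-25T10:00:00.000Z"),
  stopAt,
  new Date("2023-08-25T10:10:00.000Z"),
);
console.log(window10h10?.to.toISOString()); // 2023-08-25T10:09:15.000Z
```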

On completion or interruption, it logs a final status.

[2023-08-25T17:57:48.624Z]  INFO: ingest-Tool/1 on spider-ingest-job-58zgr:
Total requests sent: 48 at 1.6/sec
Matching statuses: 99%
Similarity of response bodies: 87.7%

Source code

Ingest Gitlab repository

History

  • Helm 0.5 / Docker latest
    • Added queryReplacementsPatterns to allow replacing inside the query string independently of the domain / URL
    • Added a safety delay (45s) before pulling the latest communications
  • Helm 0.4 / Docker latest
    • Prevented the job from restarting on error
  • Helm 0.3 / Docker latest
    • Added image pull policy
  • Helm 0.2 / Docker latest
    • Added hostAliases and acceptSelfSignedCertificates in values.yaml
    • Added x-spider-ingest-run header and Spider filter to track ingested requests
    • Fixed bugs (when the body was empty or had no Content-Encoding)
    • Inverted the meaning of bodyValidationReplacementsPatterns (it now applies to the original response)
  • Helm 0.2 / Docker latest
    • Added bodyValidationReplacementsPatterns matched prior to response validation
    • Added the Helm revision to the Job's name so that it can be launched many times
  • Helm 0.1 / Docker 2023.08.26
    • First release