
Testing payload compression in Redis

· 7 min read
Creator of Spider

An idea to limit memory usage of Redis.

Context

Redis is used as an in-memory store for the parsing state during processing, before serialization into Elasticsearch.
It mainly stores packets, TCP sessions and HTTP communications.

Those resources are quite big, and even with streaming parsing they may take up to several GB of RAM.

  • Packets are stored for 10s before parsing starts, and removed as soon as their associated communication part (request or response) is parsed completely.
  • TcpSessions are stored for 30s after each update, to keep the 'session' available.
  • HTTP communications are stored only for the time it takes to sync them with Elasticsearch.

Thus, the biggest data is stored for the shortest time.
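
As a rough illustration, this retention can be expressed with plain Redis TTLs (a minimal sketch assuming the node-redis v3 client; key names and call sites are hypothetical simplifications of the list above):

```js
const { promisify } = require('util');
const redis = require('redis');

const client = redis.createClient();
const setex = promisify(client.setex).bind(client);
const del = promisify(client.del).bind(client);

// Packets: kept at most 10s, and deleted explicitly as soon as the
// request/response they belong to is fully parsed.
const storePacket = (id, payload) => setex(`packet:${id}`, 10, payload);
const dropParsedPackets = (ids) => del(ids.map((id) => `packet:${id}`));

// TCP sessions: the 30s TTL is refreshed on every update.
const storeTcpSession = (id, session) =>
  setex(`tcp:${id}`, 30, JSON.stringify(session));
```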

Idea

In my daily work, we faced OOM issues with Redis, and I suggested compressing the payloads stored in it to limit the memory required...

Guess what? I did not think of it for Spider!

So, let's see what it would give us! 🔍

Requirements

  • The compression algorithm must be standard and supported out of the box by Node's Zlib, to keep things simple.
  • It must be fast enough to sustain the load without too much of a CPU increase.
  • It must actually save some memory 😉

How to put compressed data in Redis

On the first trials, sending a compressed buffer to Redis did not return the same buffer when reading it back... 😅
Obviously an issue with string encoding and representation!

I asked GPT-4, which told me that I indeed needed to add the following parameter, either on the Redis client (for all methods) or on each method call:

```js
const redis = require('redis');

// Return values as raw Buffers instead of decoded strings
const client = redis.createClient({
  return_buffers: true,
});
```

At first I wanted to pass the parameter per call, except... that it does not work for Lua script calls 😕
So I switched to having two different Redis clients when compression is active!
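
In practice it boils down to something like this (a minimal sketch, assuming the node-redis v3 client where return_buffers is a client-wide option):

```js
const redis = require('redis');

// Regular client: values come back as decoded strings.
const stringClient = redis.createClient();

// Buffer client: values come back as raw Buffers, so compressed payloads
// are read back byte-for-byte, including through Lua script calls.
const bufferClient = redis.createClient({ return_buffers: true });

// Pick the right client depending on whether compression is active.
const getClient = (compressionEnabled) =>
  compressionEnabled ? bufferClient : stringClient;
```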

Performance tests context

Load

  • Load sample in Flowbird Hub, used for testing:

| Data | Load /s | Compressions /s | Comment |
|---|---|---|---|
| Http communications | 700 | 1 400 | Metadata + content stored separately |
| Tcp sessions | 800 | 2 400 | 3 steps minimum: creation, parsing start, parsing end |
| Packets | 6 000 | 6 000 | Stored / compressed only once |

  • All in all, around 10 000 compressions / s minimum 😲

Before activating compression for good, I needed to validate the effect on the CPU load!

Memory usage

The table below describes the average content of Redis during parsing.

| Redis | Content | Size | IO /s |
|---|---|---|---|
| Shared | 800 HTTP metadata, 750 HTTP contents, 700 HTTP parsing logs | 165 MB | 4 500/s |
| Tcp | 23 000 Tcp sessions | 310 MB | 19 000/s |
| Pack | 97 000 Packets | 800 MB | 14 000/s |

We will test the compression effect step by step 😊

Brotli trial

HTTP communications metadata

First, let's try Brotli compression (the most efficient, but slowest), on HTTP communications metadata only.
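
The wrapper itself is straightforward with Node's zlib (a minimal sketch; the key name and call sites are hypothetical, and it relies on the buffer-returning client shown above):

```js
const zlib = require('zlib');

// Brotli favours compression ratio over speed (default quality is 11,
// the slowest setting).
const compress = (json) => zlib.brotliCompressSync(Buffer.from(json));
const decompress = (buf) => zlib.brotliDecompressSync(buf).toString('utf8');

// Hypothetical usage around Redis:
// bufferClient.set('http:meta:42', compress(JSON.stringify(metadata)));
// bufferClient.get('http:meta:42', (err, buf) => JSON.parse(decompress(buf)));
```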

  • Memory usage of Redis went from 165MB to 128MB
    • This Redis stores HTTP metadata, HTTP content and HTTP parsing log.
  • CPU of parser went from 225% to ... 950% !! 😮
    • For only 700 compressions /s !

Brotli.png

Way too expensive! Let's try with Gzip 😅

Gzip trials

HTTP communications metadata

  • Memory usage of Redis went to 120MB ... smaller than Brotli !
  • CPU of parser went from 950% back to ... 265% !! 💪

Gzip is definitely better than Brotli for this use case.
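
Switching algorithms only means swapping the zlib calls (same assumptions as the Brotli sketch; the level here is zlib's default and can be lowered to trade ratio for speed):

```js
const zlib = require('zlib');

// Gzip: much cheaper than Brotli, levels range from 1 (fastest) to 9 (best ratio).
const compress = (json) => zlib.gzipSync(Buffer.from(json));
const decompress = (buf) => zlib.gunzipSync(buf).toString('utf8');
```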

Gzip.png

HTTP communications metadata + content

  • Memory usage of Redis went from 120MB to 117MB
  • CPU of parser went from 265% to 300%

GzipWithContent.png

Does not seem worth it.

TCP sessions

| | Not compressed | Gzip |
|---|---|---|
| Tcp-Write | 14% - 92MB | 44% - 96MB |
| Tcp-Update | 30% - 98MB | 76% - 95MB |
| Redis Tcp | 308MB | 119MB |

  • Memory usage of Redis for TCP went from 310MB to 120MB 💪
  • CPU of tcp-write and tcp-update combined went from 44% to 120%! 😕

TcpGZipRedisSize.png

TcpGZipCPU.png

Packets

| | Not compressed | Gzip |
|---|---|---|
| Pack-Write | 42% - 89MB | 154% - 90MB |
| Pack-Read | 62% - 96MB | 88% - 98MB |
| Redis Pack | 419MB | 486MB |

PacketsGZipCPU.png

  • High increase of CPU usage !
  • No memory gain !! 😅

This seems to confirm the small effect of compressing HTTP contents: compressing base64-encoded payloads (which are mostly already compressed) is not useful.
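
A quick way to convince yourself (a small standalone check; the exact numbers will vary):

```js
const zlib = require('zlib');
const crypto = require('crypto');

// High-entropy random bytes stand in for an already-compressed payload.
const raw = crypto.randomBytes(100000);
const b64 = Buffer.from(raw.toString('base64'));

console.log('raw         :', raw.length);                 // 100000
console.log('gzip(raw)   :', zlib.gzipSync(raw).length);  // slightly larger than raw
console.log('base64      :', b64.length);                 // ~133336
console.log('gzip(base64):', zlib.gzipSync(b64).length);  // roughly back to raw size, no real gain
```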

HTTP Parsing log

| | Not compressed | Gzip |
|---|---|---|
| Web-Write | 152% - 108MB | 156% - 108MB |
| Redis Shared | 103MB | 96MB |

  • Low increase of CPU usage (2.5%)
  • Small memory gain (7%)

Optimising Parsing Status storage

When performing the tests, I realised that the Redis memory in the Tcp and Shared instances was not taken up by TCP and HTTP resources, but rather by the Parsing Status queues, which keep track of all statuses for reporting.

I found a clever way to optimise this, and the memory dropped!
I then had to redo the memory tests, but... this was a nice finding!

Check this out below!

ParsingStatusOptimisation.png

Snappy trials

Well... that's when you realize you really should read the documentation first!

The Redis documentation recommends Snappy for compressing data before storing it! 😅
Let's try!
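
The wrapper looks the same as before (a minimal sketch, assuming the snappy npm package, which ships native bindings):

```js
const snappy = require('snappy');

// Snappy trades compression ratio for speed, which is exactly what a
// ~10 000 compressions/s hot path needs.
const compress = (json) => snappy.compressSync(Buffer.from(json));
const decompress = (buf) =>
  snappy.uncompressSync(buf, { asBuffer: true }).toString('utf8');
```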

Packets

| | Not compressed | Snappy |
|---|---|---|
| Pack-Write | 25/s - 72% - 112MB | 45/s - 65% - 113MB |
| Pack-Read | 810/s - 63% - 102MB | 730/s - 73% - 111MB |
| Redis Pack | 87k - 24% - 537MB | 76k - 25% - 551MB |
| Save Packets in Redis | 25/s - 12ms | 25/s - 15ms |

Much better for CPU, but no space savings!

TCP sessions

| | Not compressed | Snappy |
|---|---|---|
| Tcp-Write | 13/s - 17% - 95MB | 13/s - 25% - 116MB |
| Tcp-Update | 67/s - 36% - 99MB | 67/s - 52% - 110MB |
| Redis Tcp | 25k - 10% - 69MB | 25k - 18% - 69MB |
| Save Tcp in Redis | 96/s - 1ms | 95/s - 5ms |

Much, much better in terms of CPU increase.
But no space savings.

HTTP communications metadata + content + parsing log

| | Not compressed | Compressed |
|---|---|---|
| Web-Write | 161% - 124MB | 168% - 122MB |
| HttpComPoller | 9% - 101MB | 10% - 106MB |
| HttpComContentPoller | 8% - 114MB | 9% - 106MB |
| Redis Shared | 104MB | 85MB |

Low CPU increase, some space savings.

Side idea

Since compressing at high frequency has a significant CPU cost, I wondered if... skipping the compression of JSON payloads sent over HTTP between services would save much CPU.

Modified services calls

I then implemented flips to disable compression on the calls between the following services (see the sketch after this list):

  • WebWrite ⇄ PackRead - GET /packets/of/tcpsession
    • 700 calls/s
    • response with n packets
  • WebWrite ⇄ TcpUpdate - POST /parsing-jobs
    • 37 calls/s
    • response with ≤ 20 sessions
  • WebWrite ⇄ TcpUpdate - PATCH /tcp-sessions
    • 37 calls/s
    • request with ≤ 20 sessions
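
As an illustration, here is one way such a flip could look on the responding service (a minimal sketch assuming an Express app with the compression middleware; the actual services and mechanism may differ):

```js
const express = require('express');
const compression = require('compression');

const app = express();

app.use(compression({
  filter: (req, res) => {
    // Flip: do not gzip the heavy, very frequent packets responses.
    if (req.path.startsWith('/packets/of/')) return false;
    // Keep the middleware's default behaviour (gzip) everywhere else.
    return compression.filter(req, res);
  },
}));
```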

Results

To compare with and without compression, I look at the total CPU usage of the Web-Write + Tcp-Update + Pack-Read services:

ApiCallsCompression.png

Average over a few hours:

| | With compression | Without compression |
|---|---|---|
| Web-Write | 180% - 107MB | 170% - 105MB |
| Tcp-Update | 36% - 99MB | 78% - 87MB |
| Pack-Read | 111% - 94MB | 71% - 95MB |
| GET /packets/of/tcpsession | 32ms | 25ms |
| POST /parsing-jobs | 22ms | 20ms |
| PATCH /tcp-sessions | 24ms | 23ms |

...

Pack-Read is better without compression, while Tcp-Update seems better with!

Let's then try the mix:

  • without compression on GET /packets/of/tcpsession
  • with compression on POST /parsing-jobs + PATCH /tcp-sessions

| | Mix (Tcp-Update calls compressed, Pack-Read calls not compressed) |
|---|---|
| CPU Web-Write | 149% - 111MB |
| CPU Tcp-Update | 34% - 100MB |
| CPU Pack-Read | 61% - 100MB |
| GET /packets/of/tcpsession | 800/s - 22ms |
| POST /parsing-jobs | 42/s - 15ms |
| PATCH /tcp-sessions | 42/s - 15ms |

Conclusion

| Compression | Interesting |
|---|---|
| Packets in Redis | No (CPU cost, no memory gain) |
| Tcp in Redis | Mixed (Gzip saves memory but costs CPU; Snappy is cheap but saves nothing) |
| HttpComs in Redis | Yes |
| HttpComContents in Redis | No |
| Http Parsing Logs in Redis | Yes |
| GET /packets/of/tcpsession | No |
| POST /parsing-jobs | Yes |
| PATCH /tcp-sessions | Yes |

Even if, thanks to the high streaming throughput, memory usage usually stays small, it can grow a lot during spikes.
Compressing gives us more headroom, especially for the HTTP communications resources.

However, compressing an already compressed Base64 payload... does not bring anything 😅

As a bonus I found a way to limit memory usage for Parsing Statuses. 😁

Cheers,
Thibaut