Testing payload compression in Redis

· 8 min read
Creator of Spider

An idea to limit memory usage of Redis.

Context

Redis is used as an in-memory store for the parsing state during processing, before serialization into Elasticsearch.
It mainly stores packets, TCP sessions and HTTP communications.

Those resources are quite big, and even with streaming parsing they may take up to several GB of RAM.

  • Packets are stored for 10s before parsing starts, and removed as soon as their associated communication part (request or response) is completely parsed.
  • TCP sessions are stored for 30s after each update, to keep the session alive.
  • HTTP communications are stored only until they are synced with Elasticsearch.

Thus, the biggest data is stored for the shortest time.

Idea

In my daily work we faced OOM issues with Redis, and I suggested compressing the payloads stored in it to limit the memory required...

Guess what? I did not think of it for Spider!

So, let's see what it would give us! 🔍

Requirements

  • The compression algorithm must be standard and supported by default in Node's zlib, to keep things simple.
  • It must be fast enough to sustain the load without too much of a CPU increase.
  • It must effectively save some memory 😉

How to put compressed data in Redis

On the first trials, a compressed buffer sent to Redis did not come back as the same buffer when read... 😅
Obviously an issue with string encoding and representation!

I asked GPT-4, which informed me that I indeed needed to add the following parameter, either on the Redis client (for all methods) or on the individual method call:

```js
const redis = require('redis');

// Return raw Buffers instead of decoded strings, so compressed
// payloads survive the Redis round trip unchanged
const client = redis.createClient({
  return_buffers: true,
});
```

At first I wanted to pass the parameter per call, except... that does not work for Lua script calls 😕
So I switched to having two different Redis clients when compression is active!
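
In practice, that gives something like the following (a minimal sketch, assuming the node_redis v3 callback API and JSON-serialized values; the helper names are illustrative, not Spider's actual code):

```js
const redis = require('redis');
const zlib = require('zlib');

// Plain client for regular, uncompressed string values
const client = redis.createClient();
// Buffer client, used only when compression is active
const bufferClient = redis.createClient({ return_buffers: true });

// Hypothetical helper: gzip the serialized JSON before storing it
function setCompressed(key, obj, callback) {
  zlib.gzip(JSON.stringify(obj), (err, compressed) => {
    if (err) return callback(err);
    bufferClient.set(key, compressed, callback);
  });
}

// Hypothetical helper: read the Buffer back and gunzip it
function getCompressed(key, callback) {
  bufferClient.get(key, (err, compressed) => {
    if (err || compressed === null) return callback(err, null);
    zlib.gunzip(compressed, (err2, raw) => {
      if (err2) return callback(err2);
      callback(null, JSON.parse(raw.toString('utf8')));
    });
  });
}
```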

Performance tests context

Load

  • Load sample in Flowbird Hub, used for testing:

| Data | Load/s | Compressions/s | Comment |
| --- | --- | --- | --- |
| HTTP communications | 700 | 1 400 | Metadata + content stored separately |
| TCP sessions | 800 | 2 400 | 3 steps minimum: creation, parsing start, parsing end |
| Packets | 6 000 | 6 000 | Stored / compressed only once |

  • All in all, around 10 000 compressions/s minimum 😲

Before activating compression for good, I needed to validate the effect on the CPU load!

Memory usage

The table below describes the average content of each Redis instance during parsing.

| Redis | Content | Size | IO/s |
| --- | --- | --- | --- |
| Shared | 800 HTTP metadata, 750 HTTP contents, 700 HTTP parsing logs | 165 MB | 4 500/s |
| Tcp | 23 000 TCP sessions | 310 MB | 19 000/s |
| Pack | 97 000 packets | 800 MB | 14 000/s |
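
As a side note, figures like these can be read straight from Redis itself; a quick sketch with standard commands, still on the node_redis v3 client (the key name is illustrative):

```js
const redis = require('redis');
const client = redis.createClient();

// INFO memory exposes used_memory_human, the size tracked in the table above
client.info('memory', (err, info) => {
  if (err) throw err;
  console.log(info.match(/used_memory_human:\S+/)[0]);
});

// MEMORY USAGE estimates the footprint of a single key
client.send_command('MEMORY', ['USAGE', 'http:metadata:42'], (err, bytes) => {
  if (err) throw err;
  console.log(`~${bytes} bytes`);
});
```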

We will test the compression effect step by step 😊

Brotli trial

HTTP communications metadata

First, let's try Brotli compression (the best compression ratio, but the slowest), on HTTP communication metadata only.
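
Brotli ships with Node's zlib, so trying it is a small change. A minimal sketch; note that Node's default quality is 11, the maximum, which is part of why it is so CPU-hungry:

```js
const zlib = require('zlib');

// Brotli comes with Node's zlib module: no extra dependency needed.
// Quality 11 (the default) gives the best ratio but the heaviest CPU.
function brotliCompress(payload, callback) {
  zlib.brotliCompress(payload, {
    params: {
      [zlib.constants.BROTLI_PARAM_QUALITY]: zlib.constants.BROTLI_MAX_QUALITY,
    },
  }, callback);
}

brotliCompress(JSON.stringify({ some: 'HTTP metadata' }), (err, buf) => {
  if (err) throw err;
  console.log(`${buf.length} bytes compressed`);
});
```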

  • Memory usage of Redis went from 165MB to 128MB
    • This Redis stores HTTP metadata, HTTP content and HTTP parsing logs.
  • CPU of the parser went from 225% to... 950%!! 😮
    • For only 700 compressions/s!

Brotli.png

Way too expensive! Let's try with Gzip 😅

Gzip trials

HTTP communications metadata

  • Memory usage of Redis went down to 120MB... smaller than with Brotli!
  • CPU of the parser went from 950% back down to... 265%!! 💪

Gzip is definitely better than Brotli for this use case.

Gzip.png
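
For reference, zlib's gzip also exposes a compression level knob that trades ratio for CPU; a sketch of what tuning it would look like, not something the measurements above necessarily used:

```js
const zlib = require('zlib');

// The default gzip level is 6; Z_BEST_SPEED (1) gives up some ratio
// in exchange for noticeably less CPU per compression
zlib.gzip(JSON.stringify({ some: 'HTTP metadata' }), {
  level: zlib.constants.Z_BEST_SPEED,
}, (err, compressed) => {
  if (err) throw err;
  console.log(`${compressed.length} bytes at level 1`);
});
```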

HTTP communications metadata + content

  • Memory usage of Redis went from 120MB to 117MB
  • CPU of the parser went from 265% to 300%

GzipWithContent.png

Does not seem worth it.

TCP sessions

| | Not compressed | Gzip |
| --- | --- | --- |
| Tcp-Write | 14% - 92MB | 44% - 96MB |
| Tcp-Update | 30% - 98MB | 76% - 95MB |
| Redis Tcp | 308MB | 119MB |

  • Memory usage of Redis for TCP went from 310MB to 120MB 💪
  • CPU of tcp-write and tcp-update combined went from 44% to 120%! 😕

TcpGZipRedisSize.png

TcpGZipCPU.png

Packets

| | Not compressed | Gzip |
| --- | --- | --- |
| Pack-Write | 42% - 89MB | 154% - 90MB |
| Pack-Read | 62% - 96MB | 88% - 98MB |
| Redis Pack | 419MB | 486MB |

PacketsGZipCPU.png

  • High increase in CPU usage!
  • No memory gain!! 😅

This seems to confirm the small effect of compressing HTTP contents: compressing base64-encoded payloads (which are already mostly compressed) is not useful.
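
A quick way to convince yourself, using random bytes as a stand-in for an already compressed payload:

```js
const zlib = require('zlib');
const crypto = require('crypto');

// Random bytes behave like compressed data: no redundancy left to exploit
const payload = crypto.randomBytes(64 * 1024);
const base64 = Buffer.from(payload.toString('base64'));

const gzipped = zlib.gzipSync(base64);
console.log(`base64: ${base64.length} B, gzipped: ${gzipped.length} B`);
// Gzip only claws back part of the 4/3 base64 expansion; it cannot shrink
// the underlying compressed bytes, so the gain rarely pays for the CPU.
```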

HTTP Parsing log

| | Not compressed | Gzip |
| --- | --- | --- |
| Web-Write | 152% - 108MB | 156% - 108MB |
| Redis Shared | 103MB | 96MB |

  • Low increase in CPU usage (2.5%)
  • Small memory gain (7%)

Optimising Parsing Status storage

While running these tests, I realised that the Redis memory in the Tcp and Shared instances was not mostly taken by TCP and HTTP resources, but rather by the Parsing Status queues, which keep track of all statuses for reporting.

I found a clever way to optimise this, and the memory dropped!
I then needed to redo the memory-improvement tests, but... this was a nice finding!

Check this out below!

ParsingStatusOptimisation.png

Snappy trials

Well... that's when you realise you really should have read the documentation first!

The Redis documentation recommends Snappy for compressing data before storing it! 😅
Let's try!
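
Snappy is not part of Node's zlib, so this assumes the snappy npm package; a minimal sketch (the packet object is illustrative):

```js
// Assumption: the `snappy` npm package, which binds Google's Snappy codec
const snappy = require('snappy');

const packet = { id: 42, payload: 'raw bytes as base64...' };

// Snappy favours speed over ratio: exactly what a 10 000/s load needs
const compressed = snappy.compressSync(JSON.stringify(packet));
const restored = JSON.parse(
  snappy.uncompressSync(compressed, { asBuffer: false })
);
```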

Packets

| | Not compressed | Snappy |
| --- | --- | --- |
| Pack-Write | 25/s - 72% - 112MB | 45/s - 65% - 113MB |
| Pack-Read | 810/s - 63% - 102MB | 730/s - 73% - 111MB |
| Redis Pack | 87k - 24% - 537MB | 76k - 25% - 551MB |
| Save Packets in Redis | 25/s - 12ms | 25/s - 15ms |

Much better for CPU, but no space savings!

TCP sessions

| | Not compressed | Snappy |
| --- | --- | --- |
| Tcp-Write | 13/s - 17% - 95MB | 13/s - 25% - 116MB |
| Tcp-Update | 67/s - 36% - 99MB | 67/s - 52% - 110MB |
| Redis Tcp | 25k - 10% - 69MB | 25k - 18% - 69MB |
| Save Tcp in Redis | 96/s - 1ms | 95/s - 5ms |

Much, much better in terms of CPU increase.
But no space savings.

HTTP communications metadata + content + parsing log

| | Not compressed | Compressed |
| --- | --- | --- |
| Web-Write | 161% - 124MB | 168% - 122MB |
| HttpComPoller | 9% - 101MB | 10% - 106MB |
| HttpComContentPoller | 8% - 114MB | 9% - 106MB |
| Redis Shared | 104MB | 85MB |

Low CPU increase, some space savings.

Side idea

Since compressing at high frequency has a significant CPU cost, I wondered whether... skipping the compression of JSON bodies sent over HTTP between services would save much CPU.

Modified service calls

I then implemented toggles to avoid compressing the communications between the following services (see the sketch after this list):

  • WebWrite ⇄ PackRead - GET /packets/of/tcpsession
    • 700 calls/s
    • response with n packets
  • WebWrite ⇄ TcpUpdate - POST /parsing-jobs
    • 37 calls/s
    • response with ≤ 20 sessions
  • WebWrite ⇄ TcpUpdate - PATCH /tcp-sessions
    • 37 calls/s
    • request with ≤ 20 sessions
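
Roughly, the idea looks like this (a plausible sketch with the Express compression middleware; the flag name and route check are illustrative, not the actual implementation):

```js
const express = require('express');
const compression = require('compression');

const app = express();

// Hypothetical flag: skip gzip on hot internal routes where the CPU spent
// compressing costs more than the bandwidth saved
const COMPRESS_PACKET_CALLS = process.env.COMPRESS_PACKET_CALLS === 'true';

app.use(compression({
  filter: (req, res) => {
    if (!COMPRESS_PACKET_CALLS && req.path.startsWith('/packets/of/')) {
      return false; // leave the 700/s packet responses uncompressed
    }
    return compression.filter(req, res); // default behaviour elsewhere
  },
}));
```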

Results

To compare with and without compression, I look at the total CPU usage of the Web-Write + Tcp-Update + Pack-Read services:

ApiCallsCompression.png

Average over a few hours:

| | With compression | Without compression |
| --- | --- | --- |
| Web-Write | 180% - 107MB | 170% - 105MB |
| Tcp-Update | 36% - 99MB | 78% - 87MB |
| Pack-Read | 111% - 94MB | 71% - 95MB |
| GET /packets/of/tcpsession | 32ms | 25ms |
| POST /parsing-jobs | 22ms | 20ms |
| PATCH /tcp-sessions | 24ms | 23ms |

...

Pack-Read is better without compression, while Tcp-Update seems better with it!

Let's then try the mix:

  • without compression on GET /packets/of/tcpsession
  • with compression on POST /parsing-jobs + PATCH /tcp-sessions

| | Mix: Tcp-Update calls compressed, packet calls not compressed |
| --- | --- |
| CPU Web-Write | 149% - 111MB |
| CPU Tcp-Update | 34% - 100MB |
| CPU Pack-Read | 61% - 100MB |
| GET /packets/of/tcpsession | 800/s - 22ms |
| POST /parsing-jobs | 42/s - 15ms |
| PATCH /tcp-sessions | 42/s - 15ms |

Conclusion

| Compression | Interesting |
| --- | --- |
| Packets in Redis | ❌ |
| Tcp in Redis | ❌ |
| HttpComs in Redis | ✅ |
| HttpComContents in Redis | ✅ |
| Http Parsing Logs in Redis | ✅ |
| GET /packets/of/tcpsession | ❌ |
| POST /parsing-jobs | ✅ |
| PATCH /tcp-sessions | ✅ |

Even if, thanks to the high streaming throughput, the memory usage is not big, it can grow a lot during spikes.
Compressing will give us more breathing room, especially for the HTTP communication resources.

However, compressing an already compressed Base64 payload... does not bring anything 😅

As a bonus, I found a way to limit memory usage for Parsing Statuses. 😁

Cheers,
Thibaut