
Monitoring - Servers status dashboard

October 1, 2018 · 3 min read

Description

This dashboard shows the status of the servers hosting the cluster and its datastores: CPU, RAM…

Screenshot

Content

Applicative nodes CPU usage (chart)

CPU usage of each node in the applicative cluster
Can exceed 100% on multi-core nodes (100% = one fully used core)
Stability is key
Even usage across nodes is preferred
Target is below 75% × number of cores (e.g., below 300% on a 4-core node)
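Because the 75% target scales with the number of cores, each node has its own threshold. Below is a minimal Python sketch of that check plus a simple evenness measure. It assumes per-node CPU usage is available as a percentage where 100% equals one fully used core. The node names, core counts, and values are hypothetical, and the "spread" helper is just one illustrative way to quantify "even usage across nodes".

```python
def cpu_target(cores: int, ratio: float = 0.75) -> float:
    """CPU usage target in percent, where 100% is one fully used core."""
    return ratio * cores * 100


def check_nodes(usage: dict[str, float], cores: dict[str, int]) -> dict[str, bool]:
    """Map each node name to True if its CPU usage is below its own target."""
    return {node: pct < cpu_target(cores[node]) for node, pct in usage.items()}


def spread(usage: dict[str, float]) -> float:
    """Gap between the busiest and the least busy node; smaller means more even."""
    return max(usage.values()) - min(usage.values())


# Hypothetical sample: three 4-core nodes, so the target is 300% each.
usage = {"node-1": 180.0, "node-2": 210.0, "node-3": 320.0}
cores = {"node-1": 4, "node-2": 4, "node-3": 4}
print(cpu_target(4))               # 300.0
print(check_nodes(usage, cores))   # {'node-1': True, 'node-2': True, 'node-3': False}
print(spread(usage))               # 140.0
```

Here node-3 is above its target and is also the main contributor to the uneven spread. On this chart, that pattern would suggest rebalancing replicas rather than only adding capacity.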

Applicative nodes free RAM (chart)

Free RAM of each node in the applicative cluster
Memory used for caching is not counted as free, so this value can be rather low
Stability is preferred
Even usage across nodes is preferred

Services CPU usage (chart)

Sum of the CPU usage of all replicas of each service
Makes it easy to find the most demanding services and scale them
Helps track abnormal behavior
Here, the most demanding services are:
- PackWrite, which receives and parses packets from Whisperers
- WebWrite, which aggregates the packets of a TCP session to parse it
- PackRead, which serves packets to WebWrite
- TcpUpdate, which updates TCP sessions
- TcpWrite, which receives TCP sessions from Whisperers
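The per-service value is a sum over replicas, ranked so the heaviest services come first. Below is a minimal Python sketch, assuming per-replica samples arrive as (service, replica, CPU %) tuples. Only the service names come from the list above. The replica names and CPU values are hypothetical.

```python
from collections import defaultdict


def cpu_by_service(samples):
    """Sum the CPU usage (%) of all replicas of each service, busiest first."""
    totals = defaultdict(float)
    for service, _replica, cpu in samples:
        totals[service] += cpu
    return sorted(totals.items(), key=lambda item: item[1], reverse=True)


# Hypothetical samples: (service, replica, CPU %).
samples = [
    ("PackWrite", "packwrite.1", 90.0),
    ("PackWrite", "packwrite.2", 85.0),
    ("WebWrite", "webwrite.1", 120.0),
    ("TcpWrite", "tcpwrite.1", 40.0),
]
for service, total in cpu_by_service(samples):
    print(f"{service}: {total:.0f}%")
```

PackWrite ends up on top even though each of its replicas is less busy than the single WebWrite replica. This is why the chart sums replicas instead of showing them individually. The average RAM chart groups replicas the same way but divides each total by the number of replicas.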

Services average RAM usage (chart)

Tracks the average RAM usage of all replicas of each service
Stability is the target
There is currently a memory issue with MonitorWrite. It is yet to be fixed.

Redis CPU usage (chart)

Tracks the CPU usage of the Redis database instances
Nothing special to say: usage is very low
The number of instances and the data each one hosts are configurable
Here:
- Main: Tcp sessions, Http coms and Http pers
- Pack: Packets, Status, Whisp status, Customers and Whisperers

Redis used RAM (chart)

Tracks the memory usage of Redis
When pollers and parsing cannot keep up, Redis accumulates data and can reach its memory limit (1 GB by default)
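One way to see how close Redis is to that limit is to compare the `used_memory` and `maxmemory` fields reported by the Redis `INFO memory` command. Below is a minimal Python sketch that assumes those two fields have already been fetched, for example with redis-py's `Redis.info("memory")`. Two parts are assumptions about this deployment: the 1 GB fallback, used when `maxmemory` is 0 (unset), and the 80% warning threshold. Both are hypothetical choices.

```python
DEFAULT_MAX_BYTES = 1024 ** 3  # 1 GB: assumed default limit, as mentioned above


def redis_fill_ratio(info: dict) -> float:
    """Fraction of the Redis memory limit in use, from INFO memory fields."""
    # maxmemory == 0 means no limit is set in Redis; fall back to the assumed default.
    limit = info.get("maxmemory") or DEFAULT_MAX_BYTES
    return info["used_memory"] / limit


# Hypothetical INFO memory excerpt, e.g. from redis.Redis().info("memory").
info = {"used_memory": 805_306_368, "maxmemory": 1_073_741_824}
ratio = redis_fill_ratio(info)
print(f"{ratio:.0%} of the limit used")  # 75% of the limit used
if ratio > 0.8:  # hypothetical warning threshold
    print("Redis is filling up: pollers or parsers may be falling behind")
```

A steadily rising ratio points to a processing bottleneck upstream. Raising the limit only delays the point where Redis becomes full.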

Elasticsearch CPU usage (chart)

Tracks the CPU usage of Elasticsearch on each node of the Elasticsearch cluster
Maximum is 100%
Stability is key

Elasticsearch heap used (chart)

Tracks the JVM heap used by Elasticsearch on each node
Should stay below the limit (here 4 GB per node, i.e., half of the node's memory)
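The limit follows the common Elasticsearch guidance of giving the JVM heap at most half of the node's memory. A 4 GB heap therefore corresponds to an 8 GB node. Below is a minimal Python sketch of that arithmetic and of a per-node headroom check. It assumes the used heap per node is already known in bytes, for instance from `jvm.mem.heap_used_in_bytes` in the nodes stats API. The node names, values, and 75% alert threshold are hypothetical.

```python
GB = 1024 ** 3


def heap_limit(node_memory_bytes: int) -> int:
    """Heap size following the 'half of the node memory' rule."""
    return node_memory_bytes // 2


def heap_report(heap_used: dict[str, int], node_memory_bytes: int, alert: float = 0.75):
    """Return (node, used fraction of heap, above alert threshold) for each node."""
    limit = heap_limit(node_memory_bytes)
    return [(node, used / limit, used / limit > alert) for node, used in heap_used.items()]


# Hypothetical 8 GB nodes, giving a 4 GB heap limit each.
print(heap_limit(8 * GB) // GB)  # 4
heap_used = {"es-1": 2 * GB, "es-2": 3 * GB + GB // 2}
for node, fraction, over in heap_report(heap_used, 8 * GB):
    print(f"{node}: {fraction:.0%} of heap{' (check)' if over else ''}")
```

In Elasticsearch, the heap itself is set with the `-Xms` and `-Xmx` JVM options, usually both to the same value, so this chart is read against a fixed ceiling per node.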

Elasticsearch disk used (chart)

Tracks the disk space used on each Elasticsearch node
Should not reach the limit (here, 400 GB)
