Servers status

Description

This dashboard provides a status of the servers hosting the cluster and its datastores: CPU, RAM, Disk…

Screenshot

Content

Applicative nodes CPU usage - timed graph

Shows the CPU usage of each node involved in the applicative cluster.

Can be above 100% when multiple cores
Stability is key
Same usage on each node is preferred
Target is below 75% * number of cores

When using Kubernetes, Spider would not be the only one running in the Cluster. This graphs shows the CPU used by all applications running, on all nodes.

Applicative nodes free RAM - timed graph

Shows the free RAM usage of each node involved in the applicative cluster.

This includes the operating system caching mechanism, so free RAM could be rather low
Stability is better
Same usage on each node is preferred

Servers items - aggregated grid

Shows major metrics for each cluster node, and their trend:

CPU
RAM
Disk (when on Swarm)
Count of warn and errors logs

This grid is for factual figures and mainly designed to export data.

Services CPU usage (sum of all replicas) - timed graph

Shows the CPU usage sum of all replicas for each service

Allows to find most demanding services easily and scale them
Allows to track weird behaviors

We can see that the most used ones are:

WebWrite that aggregates packets of a TCP session and parses it
PackRead that gives packets to WebWrite
PackWrite that receives and parse Packets from Whisperers
TcpUpdate that provides TCP sessions to parse and updates them after each parsing
TcpWrite that receives TCP sessions from Whisperers

Services average RAM usage - timed graph

Shows the average RAM usage of all replicas for each service

Most services are using between 90 and 150 MB of RAM per replica
Stability is the target

Dockers CPU usage (sum of all replicas) - timed graph

Shows the CPU usage sum of all Dockers for each service.

It provides a view on the cluster usage without limiting to services. You'll also find:

Elasticsearch
Redis
Traefik
Metricbeat
Filebeat

It allows to see any unexpected usage.

For instance, Filebeat is using a lot of CPU, because we have one for each node, and it tracks all logs, even not Spider's.

Dockers average RAM usage - timed graph

Shows the average RAM usage of all Dockers for each service.

Without surprise, Elasticsearch outranges all others.

Services items - aggregated grid

Shows major metrics for each service instance / pod, and their trend:

CPU
RAM
Count of warn and errors logs

This grid is for factual figures and mainly designed to export data.

Redis used RAM - timed graph

Shows the memory usage of Redis over time.

When the processing of pollers and parsing is too slow, Redis accumulates data and can reach its maximum (1GB for default)
The more stable the better.

ES nodes free RAM - timed graph

Show the free RAM for each Elasticsearch node / pod.

The lower, the better.
The more stable the better.

Redis CPU usage - timed graph

Shows the CPU usage of Redis databases instances.

Nothing special to say... it is so small!
Usually, I noticed a proportional ratio of 1% CPU for 1000 req/s load.
The number of instances and what they hosts is configurable

ES nodes CPU usage - timed graph

Shows the CPU usage of Elasticsearch inside each node / Pod of Elasticsearch cluster.

Maximum at 100% for each node.
Stability is a key

Elasticsearch heap used - timed graph

Shows the JVM Heap used of Elasticsearch on each node.

Should not reach the limit (configurable)

Elasticsearch disk used - timed graph

Shows the disk used on each ES node.

Should not reach the limit (configurable)

Description​

Screenshot​

Content​

Applicative nodes CPU usage - timed graph​

Applicative nodes free RAM - timed graph​

Servers items - aggregated grid​

Services CPU usage (sum of all replicas) - timed graph​

Services average RAM usage - timed graph​

Dockers CPU usage (sum of all replicas) - timed graph​

Dockers average RAM usage - timed graph​

Services items - aggregated grid​

Redis used RAM - timed graph​

ES nodes free RAM - timed graph​

Redis CPU usage - timed graph​

ES nodes CPU usage - timed graph​

Elasticsearch heap used - timed graph​

Elasticsearch disk used - timed graph​