Monitoring - Servers status dashboard

Description

This dashboard shows the status of the servers hosting the cluster and its datastores: CPU, RAM…

Content

Applicative nodes CPU usage (chart)

  • CPU usage of each node in the applicative cluster
  • Can exceed 100% on multi-core nodes, since each core counts for up to 100%
  • Stable usage over time is the goal
  • Similar usage across nodes is preferred
  • Target: below 75% × number of cores
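
As an illustration of the target above, here is a minimal sketch that computes the per-node CPU ceiling and lists the nodes at or above it. The function names, node names, and sample values are hypothetical; they are not part of the dashboard itself.

```python
def cpu_target(cores: int, per_core_target: float = 75.0) -> float:
    """Return the CPU usage ceiling (in percent) for a node with `cores` cores."""
    return per_core_target * cores


def nodes_over_target(usage_by_node: dict[str, float],
                      cores_by_node: dict[str, int]) -> list[str]:
    """Return the nodes whose CPU usage is at or above their ceiling, sorted by name."""
    return sorted(
        node
        for node, usage in usage_by_node.items()
        if usage >= cpu_target(cores_by_node[node])
    )


# Hypothetical readings: two 4-core nodes, usage expressed in percent of one core.
usage = {"node-1": 250.0, "node-2": 320.0}
cores = {"node-1": 4, "node-2": 4}

print(cpu_target(4))                    # 300.0
print(nodes_over_target(usage, cores))  # ['node-2']
```

With 4 cores the ceiling is 300%, so a node reading 320% is above target while one reading 250% is not.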

Applicative nodes free RAM (chart)

  • Free RAM on each node in the applicative cluster
  • Memory used for caching is counted as used, so the free value can be rather low
  • Stable values are preferred
  • Similar values across nodes are preferred

Services CPU usage (chart)

  • Sum of the CPU usage of all replicas of each service
  • Makes it easy to find the most demanding services and scale them
  • Helps track unusual behavior
  • In this example, the most CPU-intensive services are:
    • PackWrite, which receives and parses packets from Whisperers
    • WebWrite, which aggregates the packets of a TCP session to parse it
    • PackRead, which supplies packets to WebWrite
    • TcpUpdate, which updates TCP sessions
    • TcpWrite, which receives TCP sessions from Whisperers
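
The aggregation behind this chart can be sketched as follows: sum the CPU usage of every replica per service, then rank services from most to least demanding. This is only an assumption about how such a sum could be computed; the `cpu_by_service` helper and the sample figures are made up for illustration.

```python
from collections import defaultdict


def cpu_by_service(samples: list[tuple[str, float]]) -> list[tuple[str, float]]:
    """Sum per-replica CPU usage by service and sort by descending total.

    `samples` holds one (service_name, cpu_percent) pair per replica.
    """
    totals: dict[str, float] = defaultdict(float)
    for service, cpu in samples:
        totals[service] += cpu
    return sorted(totals.items(), key=lambda item: item[1], reverse=True)


# Hypothetical per-replica readings (percent of one core).
samples = [
    ("PackWrite", 40.0),
    ("PackWrite", 35.0),
    ("WebWrite", 50.0),
    ("TcpUpdate", 10.0),
]

for service, total in cpu_by_service(samples):
    print(f"{service}: {total:.1f}%")
```

The services at the top of the resulting list are the first candidates for scaling.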

Services average RAM usage (chart)

  • Tracks the average RAM usage across all replicas of each service
  • Stable usage is the target
  • MonitorWrite currently has a known memory issue that is yet to be fixed

Redis CPU usage (chart)

  • Tracks the CPU usage of the Redis database instances
  • Usage is very low, so there is little to watch here
  • The number of instances and what each one hosts are configurable
  • In this setup:
    • Main: Tcp sessions, Http coms and Http pers
    • Pack: Packets, Status, Whisp status, Customers and Whisperers

Redis used RAM (chart)

  • Tracks the memory usage of Redis
  • When the pollers and the parsing are too slow, Redis accumulates data and can reach its memory limit (1 GB by default)
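
To make the "reaching its maximum" condition concrete, here is a minimal sketch that reads the `used_memory` and `maxmemory` fields from the text returned by the Redis `INFO memory` command and computes how full the instance is. The helper names and the sample text are hypothetical, and the 1 GB limit is assumed here to mean 1024³ bytes.

```python
from typing import Optional


def parse_info(text: str) -> dict[str, str]:
    """Parse `key:value` lines from Redis INFO output, skipping section headers."""
    fields: dict[str, str] = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        key, _, value = line.partition(":")
        fields[key] = value
    return fields


def memory_fill_ratio(info_text: str) -> Optional[float]:
    """Return used_memory / maxmemory, or None when no limit is configured."""
    fields = parse_info(info_text)
    used = int(fields["used_memory"])
    limit = int(fields.get("maxmemory", "0"))
    if limit == 0:  # Redis reports maxmemory:0 when no limit is set
        return None
    return used / limit


# Hypothetical excerpt of INFO memory: 768 MiB used out of a 1 GiB limit.
sample = """# Memory
used_memory:805306368
maxmemory:1073741824
"""

print(memory_fill_ratio(sample))  # 0.75
```

A ratio that keeps climbing toward 1.0 suggests that data is accumulating faster than it is being processed.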

Elasticsearch CPU usage (chart)

  • Tracks the CPU usage of Elasticsearch on each node of the Elasticsearch cluster
  • Maximum is 100%
  • Stability is key

Elasticsearch heap used (chart)

  • Tracks the JVM heap used by Elasticsearch on each node
  • Should stay below the limit (4 GB each, i.e. half the node memory)
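
The heap limit described above can be worked through as a short sketch: the limit is half the node memory, and heap usage is read as a percentage of that limit. The 8 GB node size is inferred from the 4 GB limit, and the function names and sample reading are hypothetical.

```python
GIB = 1024 ** 3  # bytes in one gibibyte (assumed meaning of "GB" here)


def heap_limit_bytes(node_memory_bytes: int) -> int:
    """Heap limit sized at half the node memory."""
    return node_memory_bytes // 2


def heap_used_percent(heap_used_bytes: int, node_memory_bytes: int) -> float:
    """Heap used as a percentage of the heap limit."""
    return 100.0 * heap_used_bytes / heap_limit_bytes(node_memory_bytes)


# Hypothetical node with 8 GiB of memory and 3 GiB of heap in use.
print(heap_limit_bytes(8 * GIB) // GIB)      # 4
print(heap_used_percent(3 * GIB, 8 * GIB))   # 75.0
```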

Elasticsearch disk used (chart)

  • Tracks the disk space used on each ES node
  • Should not reach the limit (here, 400 GB)