
Status summary

Description

This dashboard gives a visual summary of the state of the whole cluster at any given time.

Screenshot

DashboardSummary.png

Content

The content displayed is always computed based on the selected time.

The dashboard is designed to show the health and speed of the system at a glance.
It is the "control tower", packed with information collected by all probes on the Spider servers.

All other dashboards are used for troubleshooting.

Summary of applicative status (top left hand corner)

ApplicativeStatus.png

This block indicates the speed and load of the ongoing parsing.

  • Processing speed indicators:
    • Number of packets received per min
    • Data uploaded per min
    • TCP sessions received per min
  • Parsing speed:
    • Delay before parsing
    • Count of HTTP sessions created per min
  • Parsing errors: parsing errors per min

Tooltips show the min, max, average and last values.
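
As a rough illustration of what these four tooltip values represent, the sketch below summarizes one metric series over the selected period. The function name `summarize` and the list-of-floats input are assumptions made for illustration; this is not Spider's actual implementation.

```python
from statistics import mean


def summarize(values: list[float]) -> dict[str, float]:
    """Return the min, max, average and last value of a non-empty series."""
    if not values:
        raise ValueError("series must contain at least one sample")
    return {
        "min": min(values),
        "max": max(values),
        "avg": mean(values),
        "last": values[-1],  # most recent sample, assuming chronological order
    }


# Hypothetical samples: packets received per min over five minutes.
packets_per_min = [1200.0, 1500.0, 900.0, 1800.0, 1600.0]
print(summarize(packets_per_min))
```

The "last" value is only meaningful if the samples are sorted by time, which the sketch assumes.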

Summary of servers status (bottom left hand corner)

ServersStatus.png

  • ES status: short status of the ES servers or pods - CPU, RAM, HEAP and disk used
  • Redis status: short status of the Redis servers or pods - CPU & RAM
  • Nodes status: short status of the Cluster servers - CPU & RAM

Tooltips show the min, max, average and last values.

Current status of probes (top right corner)

Probes.png

Lists all probes and their status. It calls the /health API documented here.
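
A health check of this kind can be sketched as follows. The response format of the /health API is not described on this page, so the sketch relies only on the HTTP status code; the function names, the three state labels and the `base_url` parameter are all assumptions for illustration.

```python
from typing import Optional
from urllib.error import HTTPError, URLError
from urllib.request import urlopen


def classify(http_status: Optional[int]) -> str:
    """Map an HTTP status code (None when unreachable) to a probe state."""
    if http_status is None:
        return "unreachable"
    return "up" if 200 <= http_status < 300 else "down"


def check_probe(base_url: str, timeout: float = 2.0) -> str:
    """Call <base_url>/health and classify the outcome (hypothetical helper)."""
    try:
        with urlopen(f"{base_url}/health", timeout=timeout) as response:
            return classify(response.status)
    except HTTPError as err:
        # The server answered, but with a 4xx or 5xx status code.
        return classify(err.code)
    except (URLError, OSError):
        # DNS failure, refused connection or timeout.
        return "unreachable"
```

Splitting the classification from the network call keeps the status rule easy to check without a running server.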

Network map of all Spider microservices with their inter communications

The map shows the summary of Whisperers status, UI usage status, Datastores status and Applicative cluster status:

  • The numbers of connected Whisperers and Gociphers are shown on the left Whisperers node
  • Number of connected Users is shown on the right UI node
  • Circuit breakers status is represented by the arrows
    • Arrows are green when there are no errors, orange when there are errors
    • When hovered, the arrows display the speed and average response time
  • Services status is represented by the nodes
    • Nodes are blue when there are no errors, red when there are errors
    • The size of a node depends on its CPU usage
    • The color of a node also depends on whether the node is visible or not

Warning

Controllers monitoring is not integrated yet. This is why they are shown as x0.
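
The color and size rules of the map can be sketched as below. The colors come from the list above; the linear size formula, the default sizes, the clamping of CPU to 0-100% and all names are assumptions for illustration, and the visibility-based coloring is not modeled.

```python
from dataclasses import dataclass


@dataclass
class NodeStyle:
    color: str
    size: float


def arrow_color(error_count: int) -> str:
    """Circuit breaker arrow: green without errors, orange otherwise."""
    return "green" if error_count == 0 else "orange"


def node_style(
    error_count: int,
    cpu_percent: float,
    base_size: float = 20.0,  # assumed size of an idle node
    max_extra: float = 40.0,  # assumed extra size at 100% CPU
) -> NodeStyle:
    """Service node: blue without errors, red otherwise; size grows with CPU."""
    color = "blue" if error_count == 0 else "red"
    cpu = min(max(cpu_percent, 0.0), 100.0)  # clamp to the 0-100 range
    return NodeStyle(color=color, size=base_size + max_extra * cpu / 100.0)


print(arrow_color(0), node_style(0, 50.0))
```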

Views

The map has many different views to avoid showing too many arrows at once:

  • Config paths
  • Query paths
  • Command paths
  • Upload paths
  • Purge paths
  • Monitoring paths
  • Maintenance paths

The map elements reveal tooltips when the mouse hovers over them. Tooltips may be pinned to compare different periods easily.

Example of tooltips:

TooltipsOnMap.png

Whisperers tooltips

  • Count of connected Whisperers / total
  • Count of Whisperers with > 10% CPU: not normal
  • Count of Whisperers with > 300 MB RAM: not normal
  • Count of Whisperers with PCAP overflow: means that the network is too fast for them and that they need to be better configured
  • Count of Whisperers with queues overflow: means that Spider servers are not scaled enough
  • Total count of requests / min from Whisperers to the Spider servers
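
The counts in this tooltip amount to simple threshold checks, which can be sketched as below. The 10% CPU and 300 MB RAM limits come from the list above; the `AgentStats` record, its field names, and the choice to count only connected Whisperers are assumptions for illustration.

```python
from dataclasses import dataclass

CPU_LIMIT_PERCENT = 10.0  # above 10% CPU is "not normal"
RAM_LIMIT_MB = 300.0      # above 300 MB RAM is "not normal"


@dataclass
class AgentStats:
    """Hypothetical per-Whisperer metrics over the selected period."""
    connected: bool
    cpu_percent: float
    ram_mb: float
    pcap_overflow: bool
    queue_overflow: bool


def tooltip_counts(agents: list[AgentStats]) -> dict[str, int]:
    """Aggregate the counts shown in the tooltip (connected agents only)."""
    connected = [a for a in agents if a.connected]
    return {
        "connected": len(connected),
        "total": len(agents),
        "high_cpu": sum(a.cpu_percent > CPU_LIMIT_PERCENT for a in connected),
        "high_ram": sum(a.ram_mb > RAM_LIMIT_MB for a in connected),
        "pcap_overflow": sum(a.pcap_overflow for a in connected),
        "queue_overflow": sum(a.queue_overflow for a in connected),
    }


agents = [
    AgentStats(True, 4.0, 120.0, False, False),
    AgentStats(True, 25.0, 350.0, True, False),
    AgentStats(False, 0.0, 0.0, False, False),
]
print(tooltip_counts(agents))
```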

Gociphers tooltips

  • Count of connected Gociphers / total
  • Count of Gociphers with > 10% CPU: not normal
  • Count of Gociphers with > 300 MB RAM: not normal
  • Count of Gociphers with queues overflow: means that Spider servers are not scaled enough
  • Total count of requests / min from Gociphers to the Spider servers

Services and UIs tooltips

  • Count of replicas
  • Total CPU usage on the cluster
  • Average CPU usage per replica
  • Average RAM usage of a replica
  • Count of Errors and Warnings in the logs
  • Count of requests In and Out per min
  • Statistics of operational APIs
    • Requests load
    • Latency
    • Percentile 90
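
To show how the 90th percentile differs from the average latency, here is a minimal sketch using the nearest-rank method. How Spider actually computes its percentile is not stated on this page, so the method, the function name and the sample values are assumptions.

```python
import math


def percentile(values: list[float], pct: float) -> float:
    """Nearest-rank percentile: the smallest sample such that at least
    pct% of all samples are less than or equal to it."""
    if not values:
        raise ValueError("need at least one sample")
    if not 0 < pct <= 100:
        raise ValueError("pct must be in (0, 100]")
    ordered = sorted(values)
    rank = math.ceil(pct * len(ordered) / 100)  # 1-based rank
    return ordered[rank - 1]


# Hypothetical API latencies in ms, with one slow outlier.
latencies_ms = [12.0, 15.0, 11.0, 14.0, 250.0, 13.0, 16.0, 12.0, 18.0, 17.0]
print(percentile(latencies_ms, 90))
```

With these samples the single 250 ms outlier pulls the average up but leaves the 90th percentile at 18 ms, which is why both figures are worth reading together.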

Pollers tooltips

Same as for services, plus:

  • Average count of items waiting to be polled

Redis tooltips

  • Average CPU of the Redis instance (one instance may be used for several DBs)
  • Average RAM of the Redis instance
  • Average count of items in this Redis DB
  • Requests load per min

ES index tooltips

  • Average CPU used by this index over the period
  • Size of the index
  • Variation of size, count of items and count of deleted items over the period
  • Speed of indexing, getting and searching
  • Total count of requests In

Browsers tooltips

  • Number of connected Users
  • Number of Users having clicked during the period ;)
  • Average duration of a user session

Traefik, Filebeat & Metricbeat

  • Count of replicas
  • Total CPU usage on the cluster
  • Average CPU usage per replica
  • Average RAM usage of a replica
  • Count of Errors and Warnings in the logs
  • List of requests made on the link with:
    • Requests load / min
    • Average latency
    • Max 90% latency
    • Count of errors
  • On hover, show summary info

LinkHovered