Monitoring - Status summary dashboard
Description
This dashboard gives a visual summary of the state of the whole cluster at any point in time.
Screenshot
Content
- Summary of application status (top left-hand corner):
- Processing speed indicator: number of packets and TCP sessions received per minute
- Parsing errors: percentage of parsing errors
- Summary of server status (bottom left-hand corner):
- ES status: short status of the ES servers (CPU, RAM and heap used)
- Nodes status: short status of the cluster servers (CPU and RAM)
- A network map of all Spider microservices and the communications between them
- The map summarizes the status of the Whisperers, UI usage, datastores and application cluster:
- Number of connected Whisperers is shown on the left Whisperers node
- Number of connected Users is shown on the right UI node
- Circuit breakers status is represented by the arrows
- Arrows are green when there are no errors, orange when there are errors
- When hovered, the arrows display the speed and average response time
- Services status is represented by the nodes
- Nodes are blue when there are no errors, red when there are errors
- Node size depends on CPU usage
- Node color also depends on whether the node is visible or not
- 4 different views to avoid seeing too many arrows at once:
- Query path
- Command path
- Upload path
- Monitoring path
- Tooltips detailing the status of each Node / Link
- Can be pinned to compare different periods easily
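The color and size rules of the map can be illustrated with a small sketch. This is not Spider's actual rendering code: the class names, the function names, the radius bounds and the linear CPU scaling are all assumptions made for the example. Only the color rules (green/orange arrows, blue/red nodes) and the fact that node size follows CPU usage come from the description above.

```python
from dataclasses import dataclass


@dataclass
class LinkStats:
    errors: int  # errors seen on the link over the displayed period


@dataclass
class NodeStats:
    errors: int         # errors reported by the service over the period
    cpu_percent: float  # CPU usage of the service, in percent


def arrow_color(link: LinkStats) -> str:
    """Green when the link has no errors, orange otherwise."""
    return "green" if link.errors == 0 else "orange"


def node_color(node: NodeStats) -> str:
    """Blue when the service has no errors, red otherwise."""
    return "blue" if node.errors == 0 else "red"


def node_radius(node: NodeStats,
                min_radius: float = 10.0,
                max_radius: float = 40.0) -> float:
    """Scale the node radius with CPU usage.

    Assumption: linear scaling, with CPU clamped to the 0-100 % range.
    The real dashboard may use another scale or other bounds.
    """
    cpu = max(0.0, min(node.cpu_percent, 100.0))
    return min_radius + (max_radius - min_radius) * cpu / 100.0
```

For instance, under these assumptions a service at 50 % CPU with no errors would be drawn as a blue node with a radius halfway between the two bounds.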
Examples of tooltips:
For Whisperers:
- Count of connected Whisperers / total
- Count of Whisperers with > 10% CPU: not normal
- Count of Whisperers with > 150 MB RAM: not normal
- Count of Whisperers with PCAP overflow: the network is too fast for them, so they need to be better configured
- Count of Whisperers with queues overflow: the Spider servers are not scaled enough
- Total count of requests / min from Whisperers to the Spider servers
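As an illustration, the Whisperers tooltip could be computed from per-Whisperer status records along these lines. The 10 % CPU and 150 MB RAM thresholds come from the list above. The `WhispererStatus` record, its field names and the `summarize_whisperers` function are hypothetical and do not reflect Spider's actual data model.

```python
from dataclasses import dataclass
from typing import Dict, List

# Thresholds taken from the tooltip description.
CPU_LIMIT_PERCENT = 10.0
RAM_LIMIT_MB = 150.0


@dataclass
class WhispererStatus:
    # Hypothetical status record for one Whisperer.
    connected: bool
    cpu_percent: float
    ram_mb: float
    pcap_overflow: bool
    queue_overflow: bool
    requests_per_min: float


def summarize_whisperers(whisperers: List[WhispererStatus]) -> Dict[str, float]:
    """Aggregate the counters shown in the Whisperers tooltip."""
    return {
        "connected": sum(w.connected for w in whisperers),
        "total": len(whisperers),
        "high_cpu": sum(w.cpu_percent > CPU_LIMIT_PERCENT for w in whisperers),
        "high_ram": sum(w.ram_mb > RAM_LIMIT_MB for w in whisperers),
        "pcap_overflow": sum(w.pcap_overflow for w in whisperers),
        "queue_overflow": sum(w.queue_overflow for w in whisperers),
        "requests_per_min": sum(w.requests_per_min for w in whisperers),
    }
```

Any non-zero value in `high_cpu`, `high_ram`, `pcap_overflow` or `queue_overflow` would then point to Whisperers that need attention.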
For Services:
- Count of replicas
- Total CPU usage on the cluster
- Average RAM of a replica
- Count of Errors and Warnings in the logs
- Count of requests In and Out
For Pollers:
- Count of replicas
- Total CPU usage on the cluster
- Average RAM of a replica
- Count of Errors and Warnings in the logs
- Average count of items waiting to be polled
- Count of requests In and Out
For Redis DB:
- Average CPU of the Redis instance (may be used for several DB)
- Average RAM of the Redis instance
- Average count of items in this Redis DB
- Total count of requests In
For ES index:
- Average CPU used by this index over the period
- Size of the index
- Variation of size, count of items and count of deleted over the period
- Speed of indexing, getting and searching
- Total count of requests In
For Browsers:
- Number of connected Users
- Number of Users who clicked during the period ;)
- Average duration of a user session
For each link between nodes:
- List of requests made on the link with:
- Count of requests / min
- Average latency
- 90th percentile latency (maximum latency of the fastest 90% of requests)
- Count of errors
- On hover, the link shows summary info
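The per-link figures can be sketched as follows. This is only an illustration: it assumes that "Max 90% latency" means the 90th percentile, computes it with the nearest-rank method, and uses a hypothetical `Request` record and `summarize_link` function. The way Spider actually aggregates these values is not described here.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Request:
    # Hypothetical record for one request observed on a link.
    latency_ms: float
    is_error: bool


def summarize_link(requests: List[Request], period_minutes: float) -> Dict[str, float]:
    """Compute request rate, average latency, 90th percentile latency and errors."""
    if not requests:
        return {"requests_per_min": 0.0, "avg_latency_ms": 0.0,
                "p90_latency_ms": 0.0, "errors": 0}

    latencies = sorted(r.latency_ms for r in requests)
    n = len(latencies)
    # Nearest-rank percentile: rank = ceil(0.9 * n), in integer arithmetic.
    rank = (9 * n + 9) // 10
    return {
        "requests_per_min": n / period_minutes,
        "avg_latency_ms": sum(latencies) / n,
        "p90_latency_ms": latencies[rank - 1],
        "errors": sum(r.is_error for r in requests),
    }
```

With ten requests over a five-minute period, for example, the 90th percentile under this method is the ninth-smallest latency.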
Last but not least, UI services... are not monitored... yet ;-)