SRE
Spider is critical for Site Reliability Engineers!
Use cases
- System discovery
- Ensure SLAs / SLOs are met
- Get statistics on the quality of system calls (errors, successes)
- Get statistics on system performance
- Fast troubleshooting for level 3 support, with many filters
- Troubleshoot weird behaviors
- Root cause analysis of issues
- Check the overall quality of the system at a glance
- Perform sanity checks after system upgrade
- Performance
  - Track the need to scale replicas
  - Check the effect of tuning the system in real time
Tips
- Check overall quality with the Dashboard
- Use the statusCode widget to drill down fast on errors
- Use the duration heatmap to target outliers
- Save customised dashboards to focus on what you want to see
- Switch to stats mode and drill down with dynamic filters to study the distribution of production gestures
  - to troubleshoot outliers
  - to find specifics of the outer percentiles
- Use sequence diagram mode to troubleshoot parallel processing
- Use the stats Excel export or, better, the REST API to get inner system performance metrics
- Create saved queries on error criteria to monitor system quality
- Save stats settings to have performance queries ready to run
- Easily compare performance in and out with timeline shift, and time locking for stats
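
Once stats have been pulled out through the Excel export or the REST API, checks such as error rate, outer percentiles and SLO compliance can be scripted. The sketch below is only an illustration: the record layout (status code, duration in milliseconds), the sample values, the function names and the thresholds are all assumptions, not Spider's actual export format or API.

```python
from statistics import quantiles

# Hypothetical exported rows: (status_code, duration_ms).
# The layout and the values are assumptions for illustration only.
calls = [
    (200, 12.0), (200, 15.0), (200, 11.0), (500, 230.0),
    (200, 14.0), (404, 9.0), (200, 13.0), (200, 480.0),
    (200, 16.0), (200, 10.0),
]


def error_rate(rows):
    """Fraction of calls that ended with a 5xx status code."""
    errors = sum(1 for status, _ in rows if status >= 500)
    return errors / len(rows)


def percentile(rows, p):
    """p-th percentile of the durations, for an integer p in 1..99."""
    durations = [duration for _, duration in rows]
    # quantiles(n=100) returns the 99 cut points between percentiles.
    return quantiles(durations, n=100, method="inclusive")[p - 1]


def outliers(rows, threshold_ms):
    """Rows slower than the threshold, in their original order."""
    return [row for row in rows if row[1] > threshold_ms]


def meets_slo(rows, max_error_rate, max_p95_ms):
    """True when both the error rate and the p95 latency are within bounds."""
    return (error_rate(rows) <= max_error_rate
            and percentile(rows, 95) <= max_p95_ms)


if __name__ == "__main__":
    print("error rate:", error_rate(calls))
    print("median (ms):", percentile(calls, 50))
    print("slow calls:", outliers(calls, 100.0))
    print("SLO met:", meets_slo(calls, max_error_rate=0.05, max_p95_ms=300.0))
```

Keeping the thresholds as function arguments makes it easy to rerun the same check on two exports, for example before and after a system upgrade.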