
SRE

Spider is critical for Site Reliability Engineers!

Use cases

  • System discovery
  • Ensure SLAs / SLOs are met
    • Get statistics on the quality of system calls (errors vs. successes)
    • Get statistics on system performance
  • Fast troubleshooting for level-3 support, with many filters
    • Troubleshoot unexpected behaviors
    • Root cause analysis of issues
  • Check the overall quality of the system at a glance
    • Perform sanity checks after a system upgrade
  • Performance
    • Track the need to scale replicas
    • Check the effect of system tuning in real time
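Spider computes these statistics for you. To make the SLO check concrete, here is a minimal Python sketch of the underlying arithmetic, assuming a hypothetical export of calls where each call has a status code and a duration. It counts 5xx responses as errors (an assumption; your SLO may count other codes) and checks latency with a nearest-rank 99th percentile. The `Call` type, the `slo_report` function and the default thresholds are illustrative only and are not part of Spider.

```python
import math
from dataclasses import dataclass


@dataclass
class Call:
    status_code: int    # HTTP status of the call (hypothetical export field)
    duration_ms: float  # response time in milliseconds (hypothetical export field)


def percentile(values, p):
    """Nearest-rank percentile (p in 0..100) of a non-empty list."""
    if not values:
        raise ValueError("percentile of an empty list")
    ordered = sorted(values)
    rank = max(1, math.ceil(p * len(ordered) / 100))
    return ordered[rank - 1]


def slo_report(calls, max_error_rate=0.01, p99_target_ms=500.0):
    """Compute error rate and p99 latency, then compare them with SLO targets.

    Assumption: only 5xx status codes count as errors.
    """
    if not calls:
        raise ValueError("no calls to evaluate")
    errors = sum(1 for c in calls if c.status_code >= 500)
    error_rate = errors / len(calls)
    p99 = percentile([c.duration_ms for c in calls], 99)
    return {
        "error_rate": error_rate,
        "p99_ms": p99,
        "slo_met": error_rate <= max_error_rate and p99 <= p99_target_ms,
    }
```

For example, 100 calls with a single slow 503 give an error rate of 1% and a p99 of 120 ms, so both (illustrative) targets are met:

```python
calls = [Call(200, 120.0)] * 98 + [Call(503, 900.0), Call(200, 80.0)]
print(slo_report(calls))
```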

Tips

  • Check overall quality with the Dashboard
    • Use the statusCode widget to quickly drill down on errors
    • Use the duration heatmap to target outliers
  • Save customised dashboards to focus on what you want to see
  • Switch to stats mode and drill down with dynamic filters to study the distribution of production operations
    • To troubleshoot outliers
    • To find what characterizes the outer percentiles
  • Use sequence diagram mode to troubleshoot parallel processing
  • Use the stats Excel export or, better, the REST API to get internal system performance metrics
  • Create saved queries on error criteria to monitor system quality
  • Save stats settings to have performance queries ready to run
  • Easily compare performance in and out using timeline shift and, in stats mode, time locking
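The outer-percentile and timeline-shift tips can also be reproduced offline on exported data. The following Python sketch is hypothetical: it assumes calls exported with a timestamp, an endpoint path and a duration (this is not Spider's actual export format or REST API). It compares a latency percentile between a time window and the same window shifted back in time, then counts which endpoints populate the slow tail. All names (`Call`, `window`, `compare_shifted`, `outlier_paths`) are illustrative.

```python
import math
from collections import Counter
from dataclasses import dataclass


@dataclass
class Call:
    timestamp: float    # seconds since epoch (hypothetical export field)
    path: str           # called endpoint (hypothetical export field)
    duration_ms: float  # response time in milliseconds (hypothetical export field)


def percentile(values, p):
    """Nearest-rank percentile (p in 0..100) of a non-empty list."""
    if not values:
        raise ValueError("percentile of an empty list")
    ordered = sorted(values)
    rank = max(1, math.ceil(p * len(ordered) / 100))
    return ordered[rank - 1]


def window(calls, start, end):
    """Calls whose timestamp falls in [start, end)."""
    return [c for c in calls if start <= c.timestamp < end]


def compare_shifted(calls, start, end, shift, p=95):
    """Return (previous, current) p-th percentile latency, where 'previous'
    is the same window shifted back by `shift` seconds."""
    current = window(calls, start, end)
    previous = window(calls, start - shift, end - shift)
    return (percentile([c.duration_ms for c in previous], p),
            percentile([c.duration_ms for c in current], p))


def outlier_paths(calls, p=95):
    """Count, per endpoint, the calls strictly slower than the p-th percentile."""
    threshold = percentile([c.duration_ms for c in calls], p)
    return Counter(c.path for c in calls if c.duration_ms > threshold)
```

Grouping the tail by endpoint mirrors the idea of drilling into the outer percentiles with a filter; in practice you would group by whichever attribute your exported data exposes.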