Frequently Asked Questions
What is Spider and what problem does it solve in distributed systems?
Spider is a distributed system analyzer that provides complete and intuitive observability of communications between system
components without requiring any instrumentation or code changes in the observed systems.
It tackles the challenges of understanding complex distributed systems, especially those using microservices and Kubernetes,
where traditional logs and metrics often fall short in providing a complete picture for troubleshooting.
The core problem Spider addresses is the significant time and effort spent on root cause analysis (RCA) and understanding system behaviour, especially when facing unexpected issues.
By capturing, rebuilding, and visually analyzing all network communications at the infrastructure layer, Spider allows users to "see" what is happening, rather than just "suppose," thereby drastically reducing the Mean Time To Resolution (MTTR) of problems.
What are the main components of the Spider architecture and the technologies it utilises?
Spider employs a microservices-based Command Query Responsibility Segregation (CQRS) architecture, decoupling capture/analysis and search flows into independent, optimised services.
The key components include:
Agents (Controllers, Whisperers and Gociphers):
- Whisperers capture network packets through AF_PACKET sockets, rebuild TCP sessions, and resolve hostnames.
- They are deployed on servers or services and send captured data to Spider servers via REST APIs over HTTPS.
- Gociphers use eBPF technology to hook into the Linux kernel and userspace functions (specifically OpenSSL libraries) to extract TLS secrets for decrypting encrypted communications and to capture network usage. They can be deployed as Kubernetes DaemonSets or standalone.
- Controllers remotely manage other agents, primarily Whisperers, enable their deployment in Kubernetes clusters, and enrich network usage data by listening to the Kube API server.
Services and Data Stores:
- Microservices are built in Node.js and expose REST APIs described with OpenAPI.
- Security is managed with JSON Web Tokens (JWT) signed with key pairs, combined with RBAC.
- Data storage is handled by Elasticsearch for both captured data (packets, TCP sessions, HTTP communications) and configuration and monitoring data, designed to scale to billions of communications (a minimal indexing sketch follows this section).
- Redis is used for distributed in-memory data sharing.
User Interface (UI):
- A web application built using React.js, providing visual, real-time, and dynamic analysis tools for navigating and analysing data.
The system is packaged with Docker and clustered with Kubernetes, with deployments managed via Helm charts.
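To make the storage model concrete, here is a minimal sketch of indexing a captured-communication document with the official go-elasticsearch client. The index name and document fields are assumptions for illustration, not Spider's actual schema:

```go
package main

import (
	"context"
	"fmt"
	"strings"

	"github.com/elastic/go-elasticsearch/v8"
	"github.com/elastic/go-elasticsearch/v8/esapi"
)

func main() {
	// Connects to http://localhost:9200 by default.
	es, err := elasticsearch.NewDefaultClient()
	if err != nil {
		panic(err)
	}

	// A hypothetical captured-communication document; Spider's real
	// index names and mappings are not documented in this FAQ.
	doc := `{"src":"10.0.0.1","dst":"10.0.0.2","protocol":"http","status":200}`
	req := esapi.IndexRequest{
		Index: "http-communications",
		Body:  strings.NewReader(doc),
	}
	res, err := req.Do(context.Background(), es)
	if err != nil {
		panic(err)
	}
	defer res.Body.Close()
	fmt.Println("indexed:", res.Status())
}
```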
How does Spider handle network data capture, including encrypted traffic (TLS)?
Spider captures network traffic through dedicated agents called Whisperers.
These agents use AF_PACKET to capture network packets from physical or virtual interfaces, applying pcap filters to limit the captured traffic.
Captured data is then sent to Spider servers.
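As a sketch of the underlying mechanism, the following Go program opens a raw AF_PACKET socket, the same Linux facility Whisperers rely on. It is not Spider's actual code: a real agent would additionally apply a pcap filter and rebuild TCP sessions, which are omitted here for brevity.

```go
package main

import (
	"fmt"

	"golang.org/x/sys/unix"
)

// htons converts the protocol number to network byte order,
// as required by the AF_PACKET socket API.
func htons(v uint16) uint16 { return v<<8 | v>>8 }

func main() {
	// Open a raw AF_PACKET socket that receives every Ethernet frame
	// seen by the host (requires root / CAP_NET_RAW).
	fd, err := unix.Socket(unix.AF_PACKET, unix.SOCK_RAW, int(htons(unix.ETH_P_ALL)))
	if err != nil {
		panic(err)
	}
	defer unix.Close(fd)

	buf := make([]byte, 65536)
	for {
		// Each read returns one link-layer frame; a capture agent
		// would reassemble TCP sessions from these frames.
		n, _, err := unix.Recvfrom(fd, buf, 0)
		if err != nil {
			panic(err)
		}
		fmt.Printf("captured a %d-byte frame\n", n)
	}
}
```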
For encrypted traffic, particularly TLS (Transport Layer Security), Spider employs a sophisticated mechanism involving Gocipher agents and eBPF (extended Berkeley Packet Filter) technology.
TLS encryption, especially TLS 1.3, relies on ephemeral per-session key material, meaning the server key alone is insufficient for decryption.
Gociphers inject custom eBPF programs into the running Linux kernel to hook OpenSSL library functions (such as SSL_do_handshake, SSL_write, SSL_read) and extract session secrets from memory structures (the SSL_ST struct).
These secrets are then associated with the captured TCP sessions, enabling streaming decryption of the TLS payload for HTTP parsing.
This approach allows Spider to decrypt and analyse previously opaque encrypted communications.
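A full eBPF uprobe example is beyond a short sketch, but the principle that per-session secrets (rather than the server key) unlock the traffic can be demonstrated with Go's standard keylog hook, which writes secrets in the same SSLKEYLOGFILE format that decryption tools consume. This is an analogy to, not a reproduction of, Spider's eBPF mechanism:

```go
package main

import (
	"crypto/tls"
	"fmt"
	"net/http"
	"os"
)

func main() {
	// Record per-session TLS secrets in the standard SSLKEYLOGFILE format;
	// these are the same class of secret that Gociphers pull from OpenSSL
	// memory via eBPF uprobes.
	keylog, err := os.Create("tls-keys.log")
	if err != nil {
		panic(err)
	}
	defer keylog.Close()

	client := &http.Client{
		Transport: &http.Transport{
			TLSClientConfig: &tls.Config{KeyLogWriter: keylog},
		},
	}
	resp, err := client.Get("https://example.com")
	if err != nil {
		panic(err)
	}
	resp.Body.Close()
	fmt.Println("status:", resp.Status, "- secrets appended to tls-keys.log")
}
```

Feeding such a key log together with a packet capture of the same session into a decryption-capable tool reveals the plaintext offline; Spider performs the equivalent association in streaming.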
What are the key features and use cases of Spider's analysis and visualisation tools?
Spider offers a comprehensive suite of analysis and visualisation tools designed for discovery and troubleshooting:
- Complete Observability: Captures communications at various networking layers, including IP, UDP, TCP, and TLS/HTTPS, with planned support for gRPC, Redis, PostgreSQL, and more.
- Visual, Real-Time, and Dynamic Analysis Tools: Provide an at-a-glance overview of system health.
- Network Maps: Self-constructing, editable maps showing aggregated views of filtered communications, allowing users to visualise service interactions and identify potential issues.
- Sequence Diagrams: Display communications as a sequence of calls between clients and servers.
- Data Navigation: Allows browsing and analysis of unitary communications, with time travel features ranging from days to milliseconds.
- Drill-down Capabilities: Powerful features with various helpers and filters to identify and understand disturbances.
Operations features:
- No Code Instrumentation: Spider operates at the infrastructure layer, eliminating the need to modify application code.
- Easy to Operate: Features automated setup, remote deployment and configuration of capture agents, embedded monitoring and alerting, a self-healing architecture, and automatic scaling.
Spider's proven use cases include:
- Analysis/Learning: Discovering system behaviour, checking internal integrations, and understanding service interactions.
- Tests: Checking developments, performing sanity checks post-deployment, and collecting samples for Consumer-Driven Contract Tests.
- Support: Root cause analysis of issues, validating third-party integrations, and troubleshooting network communications.
- Performance Engineering: Checking and troubleshooting system performance, and collecting application performance metrics.
How does Spider ensure data quality and integrity during the capture and parsing process?
Spider monitors its own activity and offers comprehensive dashboards to check its behaviour and parsing quality, aiding in adapting client infrastructure for optimal performance.
Key aspects related to data quality include:
- Capture Status and Quality Line: Whisperers send their status every 20 seconds, including capture statistics (captured, filtered by pcap, dropped by the kernel, overflowed due to API speed issues). A "quality line" on the timeline display, with 1-minute resolution, indicates the quality of capture and parsing, helping to determine whether missing items are due to Spider or to the monitored software.
- Parsing Status: The system reports parsing status and identifies issues due to missing packets, which can occur if the host or the Spider server is overloaded, or if a communication issue occurred during the process.
- Deduplication Feature: For distributed network capture, Spider includes a deduplication feature to ignore communications captured on both sides, ensuring accurate data representation.
These mechanisms allow users to diagnose issues in data collection and parsing, ensuring the reliability of the observability provided by Spider.
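Conceptually, deduplication keeps the first occurrence of each communication keyed on a normalised flow identity. The sketch below is illustrative only; the exact fields Spider keys on are not documented here, so the 5-tuple-plus-initial-sequence-number key is an assumption:

```go
package main

import "fmt"

// Comm is a simplified captured communication; when both the client and
// the server side are observed, the same exchange is captured twice.
type Comm struct {
	SrcIP, DstIP     string
	SrcPort, DstPort int
	TCPSeq           uint32 // initial client sequence number of the session
}

// key normalises a communication into an identity shared by both captures.
func key(c Comm) string {
	return fmt.Sprintf("%s:%d>%s:%d#%d", c.SrcIP, c.SrcPort, c.DstIP, c.DstPort, c.TCPSeq)
}

// dedupe keeps the first occurrence of each communication.
func dedupe(comms []Comm) []Comm {
	seen := make(map[string]bool)
	var out []Comm
	for _, c := range comms {
		k := key(c)
		if seen[k] {
			continue // already captured on the other side
		}
		seen[k] = true
		out = append(out, c)
	}
	return out
}

func main() {
	comms := []Comm{
		{"10.0.0.1", "10.0.0.2", 42000, 443, 1000}, // seen by the client-side Whisperer
		{"10.0.0.1", "10.0.0.2", 42000, 443, 1000}, // same session, server side
	}
	fmt.Println(len(dedupe(comms)), "unique communication(s)")
}
```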
What are the security and administrative features within Spider?
Spider incorporates several security and administrative features to manage user access, data, and system configurations:
- Authentication and Authorisation: All Spider services utilise JWT (JSON Web Tokens) for authentication and authorisation. Upon login, a JWT token is generated, embedding the user's identity and access rights. Every API request is checked against this token.
- Role-Based Access Control (RBAC): Access to features and data is governed by RBAC, ensuring users only access what they are permitted to. Agent management services (like Controls and Ciphers) perform additional access-rights checks.
- Teams Concept: Users can work individually or join teams. A team allows for sharing configurations (like UI settings and Whisperers) among members. Team administrators can add users, and users can join a team using a join token, which is generated uniquely. Whisperer settings are specific to each Whisperer but can be copied between them.
- User Management: Accounts can be created via a login page (including LDAP and OIDC integration) or by an administrator. Administrators can edit user rights from a profile panel.
- Public Links: A mechanism to share specific Spider data and UI state with individuals who may not have a standard Spider account or the necessary access rights. These links allow for controlled, temporary sharing for collaboration, documentation, or third-party access and can be secured with features like One-Time Passwords (OTP) and email whitelisting.
- Sensitive Data Handling: While Spider captures all inter-service communications, potentially including sensitive data (passwords, tokens, personal information), it offers multiple filters to prevent storing and indexing sensitive data.
- Monitoring and Reporting: Spider monitors its own activity and sends errors and usage statistics to the server by default. Users can deactivate these features or choose to anonymise the statistics.
These features collectively contribute to a secure and manageable environment for observing complex distributed systems.
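To illustrate the JWT flow, here is a minimal sketch using the github.com/golang-jwt/jwt/v5 library. Spider signs tokens with key pairs; the sketch uses an HMAC secret to stay short, and the claim names are illustrative rather than Spider's actual schema:

```go
package main

import (
	"fmt"

	"github.com/golang-jwt/jwt/v5"
)

func main() {
	// Spider uses key pairs; a shared HMAC secret keeps this sketch short.
	secret := []byte("demo-shared-secret")

	// At login, a token embedding the user's identity and rights is issued.
	token := jwt.NewWithClaims(jwt.SigningMethodHS256, jwt.MapClaims{
		"sub":   "alice",
		"roles": []string{"team-admin"},
	})
	signed, err := token.SignedString(secret)
	if err != nil {
		panic(err)
	}

	// Every API request presents the token; the receiving service verifies
	// the signature and reads the embedded access rights.
	parsed, err := jwt.Parse(signed, func(t *jwt.Token) (interface{}, error) {
		return secret, nil
	})
	if err != nil || !parsed.Valid {
		panic("token rejected")
	}
	claims := parsed.Claims.(jwt.MapClaims)
	fmt.Println("authenticated:", claims["sub"], "roles:", claims["roles"])
}
```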
How is Spider deployed and what are its external dependencies?
Spider is designed for modern, cloud-native deployments and can be installed on various Kubernetes distributions.
- Deployment Target: Spider server components are deployed on Kubernetes (e.g., AWS EKS, Nutanix Karbon, Rancher K3S). It deploys as a set of microservices and datastores along with capture agents.
- Packaging and Orchestration: Server components are packaged using Helm charts, which are available in a custom public Helm repository (https://repository.floocus.com/helm). Agents are available in a public Docker registry, services in a private one, and datastores on external registries.
- External Dependencies (Mandatory for setup):
  - Kubernetes: The container orchestration platform.
  - Helm: For packaging and deploying applications on Kubernetes.
  - Elastic ECK operator: To set up Elasticsearch using Custom Resource Definitions (CRDs).
- Optional Dependency: S3-compatible storage for automatic configuration backup and restore.
- The architecture allows for flexibility, including the ability to attach Whisperer agents without pod restarts, and supports both local and remote Kubernetes environments for the services being captured.
What is the licensing and pricing model for Spider?
Spider's commercial license model is based on CPU usage.
Billing Principle: Clients are invoiced based on their CPU usage. This typically involves a yearly prepaid plan for a certain number of CPU cores.
Over-usage Charges: If usage exceeds the prepaid amount, it is billed as extra core usage at a 33% higher rate per core.
Regularisation invoices for any CPU usage above the prepaid plan are issued every six months.
Cost Estimation: The Floocus website (users.floocus.com) provides a dedicated page for license cost estimation and allows users to manage their licenses and view statistics.
Predictability and Flexibility: This CPU-based billing model aims to provide a predictable invoice for budgeting while offering no restrictions on the number of user accounts,
observed systems, or data volume.
It allows for unlimited usage that can change whenever required.
For example, 5 prepaid cores at 110€/month amount to 6600€ per year, with monthly updates included.
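As a sanity check on that arithmetic, here is a small sketch of the billing model described above; the rates come from the example, while billing over-usage in core-months is an assumption about granularity:

```go
package main

import "fmt"

// yearlyCost applies the FAQ's billing model: prepaid cores at a fixed
// monthly rate, over-usage billed 33% higher per core.
func yearlyCost(prepaidCores int, extraCoreMonths float64) float64 {
	const ratePerCoreMonth = 110.0           // € per core per month (FAQ example)
	const overRate = ratePerCoreMonth * 1.33 // over-usage rate per core-month
	return float64(prepaidCores)*ratePerCoreMonth*12 + extraCoreMonths*overRate
}

func main() {
	fmt.Printf("5 prepaid cores, no over-usage: %.0f€/year\n", yearlyCost(5, 0)) // 6600€
	fmt.Printf("with 3 extra core-months: %.2f€/year\n", yearlyCost(5, 3))
}
```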
License Validity: Spider monitors the last sent statistics to the Floocus server. If no new statistics are received for 24 hours, the license is stopped, halting data collection and parsing.
Parsing reactivates upon resumed statistics transmission.
Are there API keys to interact with the Spider API?
Spider supports dedicated service accounts for API interactions.
These service accounts are designed for programmatic access to Spider APIs and can be managed in a manner similar to regular user accounts.
They can be created in the UI by administrators and granted fine-grained permissions, for example to automate test-result collection or to drive alerting probes.
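A service-account token is then used as a bearer token on every request. The endpoint path below is hypothetical (Spider's real routes are described by its OpenAPI documentation), and SPIDER_SERVICE_ACCOUNT_TOKEN is an assumed environment variable:

```go
package main

import (
	"fmt"
	"io"
	"net/http"
	"os"
)

func main() {
	// Hypothetical endpoint; consult the Spider OpenAPI spec for real routes.
	req, err := http.NewRequest("GET", "https://spider.example.com/api/http-communications", nil)
	if err != nil {
		panic(err)
	}
	// The service-account JWT is sent as a standard bearer token.
	req.Header.Set("Authorization", "Bearer "+os.Getenv("SPIDER_SERVICE_ACCOUNT_TOKEN"))

	resp, err := http.DefaultClient.Do(req)
	if err != nil {
		panic(err)
	}
	defer resp.Body.Close()

	body, err := io.ReadAll(resp.Body)
	if err != nil {
		panic(err)
	}
	fmt.Println(resp.Status, "-", len(body), "bytes received")
}
```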
Is there a SaaS (Software as a Service) version of Spider?
Spider is currently only available as a local deployment in your own infrastructure.
However, a SaaS version is on the roadmap.
Is Spider Open Source Software (OSS)?
No, Spider has been developed under a closed-source model.
However, some components have already been open-sourced for reuse (Timeline, Plugins...), and other parts will be in the future.
Can I install Spider locally?
The Spider server is meant to be installed in your own IT infrastructure, but agents may be deployed anywhere.
Dedicated local agents are designed specifically for developers and testers, and administrators can create them for any user.