Tags cardinality and count
Extracted HTTP tags are now saved with count and cardinality!
I had the idea to skip the processing of empty packets in the Whisperers when you don't want or need to store those packets on the server.
It saved us 30% in CPU usage!
Spider can be very useful for extracting system statistics.
But its short retention time (due to the high data volume) limits what you can do with these stats.
This new tool extracts HTTP statistics from Spider's captured data and saves them into another index with a longer retention time.
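As a rough illustration of the kind of extraction involved, here is a sketch of an aggregation request that summarizes captured HTTP traffic per hour and per status code before saving the compact result elsewhere. The index fields (`@timestamp`, `response.status`) and the function name are assumptions for the example, not Spider's actual schema:

```javascript
// Build an Elasticsearch query that summarizes HTTP documents per hour and
// status code. Field names are illustrative, not Spider's real mapping.
function buildStatsQuery(from, to) {
  return {
    size: 0, // we only need the aggregations, not the raw packets
    query: { range: { '@timestamp': { gte: from, lt: to } } },
    aggs: {
      per_hour: {
        date_histogram: { field: '@timestamp', fixed_interval: '1h' },
        aggs: {
          per_status: { terms: { field: 'response.status' } },
        },
      },
    },
  };
}
```

The aggregated buckets are tiny compared to the raw capture, which is what makes a longer retention affordable.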
To help new users onboard, I've added a welcome screen that shows on first connection, after the Terms approval screen.
Spider's infrastructure has been upgraded to Redis 7.0, which promises better performance.
Redis 7 indeed proves to be faster, but with slightly higher CPU usage.
As you have surely seen, Spider's website has changed from top to bottom. I've migrated from an old WordPress CMS to static pages managed by Docusaurus 2.
It will be easier for me to maintain, back up, and extend, and also... avoid spam!
I got an idea today from Enzo: add a link next to the Location header in the HTTP details, to allow loading the linked resource from the HTTP response.
As I liked the idea, I implemented it right away :)
Per the official RFC, the Location may be absolute or relative, so Spider handles both:
The location is now also available in the HTTP global tab. Thank you, Enzo!
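The resolution of both cases can be sketched with the standard WHATWG URL class, which resolves relative references against a base URL (per RFC 7231, the target is computed against the URI of the original request). The function name is illustrative, not Spider's actual code:

```javascript
// A Location header may hold an absolute URI or a relative reference.
// Resolving it against the request URL covers both cases.
function resolveLocation(requestUrl, location) {
  // The URL constructor treats the second argument as the base;
  // an absolute `location` simply ignores the base.
  return new URL(location, requestUrl).toString();
}
```

For example, `'../login'` against `'https://example.com/a/b'` resolves to `'https://example.com/login'`, while an absolute Location is used as-is.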
Following the work and good results on the Timeline, the same progressive loading has been applied to the network map.
Although more complex to do, having done the timeline before made it quite quick (<10h).
Tests have shown it can display two full days of data for all monitored platforms on Streetsmart: an aggregation over 100GB of data.
I wouldn't say it was fast though; I stopped it before the end as there were nodes all over the place ;-)
Check it out! Fireworks!
In the Flowbird production environment, Spider is capturing around 400GB per day. Impressive!
But it is a challenge in itself! While the capture works great, the UI, map and statistics get timeouts when loading the timeline over a whole day.
To improve the situation, I've added an option to the timeline (first) to load its data progressively, with pagination. It uses Elasticsearch's composite aggregation, and the results update the existing timeline data whenever possible, instead of resetting the whole timeline every time.
The setting is activated by default and may be deactivated in the display settings:
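The pagination loop can be sketched as follows. Elasticsearch composite aggregations return an `after_key` cursor with each page, which is passed back to fetch the next one; here `fetchPage` stands in for the actual search call (an assumption for the example, not Spider's API), and buckets are merged into the existing series instead of resetting it:

```javascript
// Progressively load timeline buckets page by page, merging each page
// into the existing series rather than rebuilding it from scratch.
async function loadTimeline(fetchPage) {
  const series = new Map(); // bucket timestamp -> document count
  let afterKey; // undefined on the first request
  do {
    const page = await fetchPage(afterKey);
    for (const bucket of page.buckets) {
      // Update a bucket in place if it reappears on a later page.
      series.set(bucket.key.time, (series.get(bucket.key.time) ?? 0) + bucket.doc_count);
    }
    afterKey = page.after_key; // absent on the last page, ending the loop
  } while (afterKey);
  return series;
}
```

Because each page is merged as it arrives, the timeline can render partial data immediately and refine itself, instead of blocking until one huge aggregation completes.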
Pros:
Drawback:
I think it is worth it.
With the new Elasticsearch releases (8 and on), security is active by default on the cluster:
In order to be ready for it, I upgraded all microservices using Elasticsearch to support all authentication methods supported by the ES JavaScript client. Everything is managed by the central setup, which expects the Elasticsearch setup to require authentication.
TLS may also be used to connect to Elasticsearch, with self-signed certificates if needed.
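For illustration, a client configuration along these lines is possible with the official `@elastic/elasticsearch` client; the node URL, credentials, and certificate path below are placeholders, and in Spider the actual values come from the central setup:

```javascript
const fs = require('fs');
const { Client } = require('@elastic/elasticsearch');

// Example configuration only: placeholder values, not Spider's real setup.
const client = new Client({
  node: 'https://elasticsearch.example:9200',
  // Basic authentication; the client also supports API keys and bearer tokens.
  auth: {
    username: 'spider',
    password: 'change-me',
  },
  tls: {
    // Trust a self-signed certificate authority when needed.
    ca: fs.readFileSync('/path/to/ca.crt'),
    rejectUnauthorized: true,
  },
});
```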