Parsing status

Description

This dashboard gives insight into the parsing quality and speed of Spider.

It tells you whether Spider's tuning is right for the quantity of data to parse. It also shows how many items have been parsed successfully, and how many communication items they produced.

It is the 'operational' dashboard of Spider.

Screenshot

ParsingScreen.png

Content

Parsing queued (timed chart)

Shows the number of TcpSessions that are waiting in the parsing queue, over time.

The number should stay roughly stable around (15s * number of Tcp sessions sent per second).
An ever-increasing number in this graph means that parsing is not scaled enough to cope with the input.

If so, then:

  • Scale web-write replicas
  • Scale pack-read replicas
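
As a rough illustration of the sizing rule above, the sketch below computes the expected queue depth from a session rate and flags a queue that keeps growing. The function names, the sample values and the growth heuristic (comparing the two halves of a sample window) are assumptions made for this example; they are not part of Spider.

```python
from statistics import mean
from typing import Sequence

PARSING_WINDOW_S = 15  # the 15s factor from the rule of thumb above


def expected_queue_depth(sessions_per_second: float) -> float:
    """Expected steady-state number of queued TcpSessions."""
    return PARSING_WINDOW_S * sessions_per_second


def is_ever_increasing(samples: Sequence[float], tolerance: float = 0.1) -> bool:
    """Hypothetical heuristic: the queue is considered growing when the mean
    of the second half of the samples exceeds the mean of the first half
    by more than `tolerance` (10% by default)."""
    if len(samples) < 4:
        return False  # not enough points to judge a trend
    half = len(samples) // 2
    first = mean(samples[:half])
    last = mean(samples[half:])
    return last > first * (1 + tolerance)


if __name__ == "__main__":
    # Assumed input rate of 200 Tcp sessions per second.
    print(expected_queue_depth(200))
    print(is_ever_increasing([3000, 3500, 4200, 5000, 6100, 7500]))
```

With an assumed rate of 200 sessions per second, a queue hovering around 3000 would be normal, while a steadily climbing series suggests scaling the replicas listed above.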

ParsingScreen-ParsingQueued.png

Tcp parsing status (timed chart)

Shows the count of Tcp sessions, grouped by their parsing status, over time.

A Tcp session may be in one of the following statuses:

  • Waiting or Pending, when it is queued for parsing
  • Incomplete, when parsing is in progress and the session is not closed
  • Warning, when parsing failed but a second attempt will be made
  • Error or OK, when parsing is finished

The less red, the better! :)
Errors can have many causes, but the main one is CPU contention on clients or servers, which results in packets missing when parsing is performed.

ParsingScreen-TcpParsingStatus.png

Max 95% of parsing delay (timed chart)

Shows the 95th percentile of the delay between the creation of the Tcp session in the backoffice and its parsing.

  • Spider waits 10s - by default - before starting the parsing.
  • Spider starts removal of packets from cache after 45s - by default.

As a result, this metric should stay between 10 and 45 seconds.
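
The bounds above can be checked with a small sketch like the following. It assumes a list of parsing delays in seconds has already been collected, and it uses the nearest-rank method for the percentile, which may differ from how the dashboard computes it. The constant and function names are hypothetical; only the 10s and 45s defaults come from this page.

```python
from typing import Sequence

PARSING_START_DELAY_S = 10  # default wait before parsing starts
CACHE_REMOVAL_DELAY_S = 45  # default delay before packets leave the cache


def percentile_95(delays_s: Sequence[float]) -> float:
    """95th percentile of the delays, using the nearest-rank method."""
    if not delays_s:
        raise ValueError("no delay samples")
    ordered = sorted(delays_s)
    # Integer form of ceil(0.95 * n), avoiding floating point rounding.
    rank = (95 * len(ordered) + 99) // 100
    return ordered[rank - 1]


def delay_is_healthy(delays_s: Sequence[float]) -> bool:
    """True when the 95th percentile lies within the default bounds."""
    p95 = percentile_95(delays_s)
    return PARSING_START_DELAY_S <= p95 <= CACHE_REMOVAL_DELAY_S
```

If the defaults have been changed in a given deployment, the two constants would need to be adjusted to match.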

ParsingScreen-ParsingDelay.png

Avg parsing duration (timed chart)

Shows the evolution of parsing duration for a page of TcpSessions, including calls to pack-read to get packets.

This should stay around 25 - 40ms.

ParsingScreen-ParsingDuration.png

Packet lots parsing status (timed chart)

Shows the count of packet lots, grouped by their parsing status.
A packet lot is a group of packets that together form a request or a response. Most of the time, there are 2 packet lots for 1 communication.

The status may be:

  • Pending - Packet lot has been created, but packets have not been fetched yet
  • Fetched - Packets have been fetched
  • Parsed - Packet lot has been parsed, but is incomplete
  • Discarded - Spider could not match this packet lot to an HTTP request, or a response linked to a request.
  • Filtered - Spider has filtered the packet lot based on its configuration.
  • Fetching warning - Packet lot has been created, but packets are not available yet.
  • Fetching error - Spider could not find any packet for this packet lot.
  • Complete - Successful parsing.

Seeing statuses other than Discarded, Filtered or Complete means that there are issues in parsing.
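
The rule above can be expressed as a simple filter over status counts. This is only a sketch: the status strings are the display names from the list above, and the actual field values stored by Spider may differ. The function name and the sample counts are assumptions made for this example.

```python
from typing import Dict

# Statuses that this page describes as normal outcomes.
HEALTHY_STATUSES = {"Discarded", "Filtered", "Complete"}


def problem_counts(counts_by_status: Dict[str, int]) -> Dict[str, int]:
    """Keep only the non-zero counts whose status signals a parsing issue."""
    return {
        status: count
        for status, count in counts_by_status.items()
        if status not in HEALTHY_STATUSES and count > 0
    }


if __name__ == "__main__":
    # Made-up sample counts for one time bucket.
    sample = {
        "Complete": 950,
        "Filtered": 30,
        "Discarded": 12,
        "Fetching error": 3,
        "Pending": 0,
    }
    print(problem_counts(sample))
```

An empty result means every packet lot in the bucket ended in one of the three expected statuses.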

ParsingScreen-PacketLots.png

Communications created (timed chart)

Shows the result of parsing over time: how many communications were successfully created, and how many errors were raised.

ParsingScreen-WebWrite.png