RIPE Atlas result processing interrupted
Incident Report for RIPE NCC
Resolved
The situation is now clear. Due to natural growth and some recent retention changes to the streaming service used for RIPE Atlas, the service ran into its open file descriptor limit, which we were not alerting on. The service crashed when it hit this limit, which blocked the processing of all results, the scheduling of new measurements, and many internal tasks such as crediting and maintenance. We mitigated the issue by increasing the open file descriptor limit for the service. However, the crash had also affected other components of the infrastructure, so even after the streaming service was back online, data was not flowing as it should.
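
As an illustration of the kind of check that was missing, here is a minimal Python sketch of an open file descriptor alert. It is not the monitoring used for RIPE Atlas: the function names and the 80% warning threshold are assumptions made for this example, and counting entries in /proc/self/fd only works on Linux.

```python
import os
import resource


def should_alert(open_fds, soft_limit, warn_ratio=0.8):
    """Return True when descriptor usage reaches warn_ratio of the soft limit.

    warn_ratio=0.8 is an assumed threshold, not a value from the incident.
    """
    if soft_limit == resource.RLIM_INFINITY:
        # No limit configured, so there is nothing to approach.
        return False
    return open_fds >= warn_ratio * soft_limit


def current_fd_usage():
    """Return (open_fds, soft_limit) for the current process (Linux only)."""
    soft_limit, _hard_limit = resource.getrlimit(resource.RLIMIT_NOFILE)
    # Each entry in /proc/self/fd is one open descriptor. The count includes
    # the descriptor briefly opened to list the directory itself.
    open_fds = len(os.listdir("/proc/self/fd"))
    return open_fds, soft_limit


if __name__ == "__main__":
    used, limit = current_fd_usage()
    print(f"open fds: {used}, soft limit: {limit}, alert: {should_alert(used, limit)}")
```

Alerting on a fraction of the limit, rather than on the limit itself, leaves time to raise the limit or reduce usage before the process fails.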

We have identified several improvements to help ensure this does not happen again, and to improve the resilience of the system if it does. We will be working on implementing these in the coming period.
Posted Jan 27, 2025 - 15:15 CET
Update
We have identified the cause of the issue and are investigating solutions.
Posted Jan 26, 2025 - 20:31 CET
Investigating
There is currently an issue with the processing of incoming RIPE Atlas results. We're investigating the problem.
Posted Jan 26, 2025 - 13:23 CET
This incident affected: Non-Critical Services (DNSMON, RIPE Atlas).