As we expected, the error rate has recovered after the deployment. Furthermore, the change for the (office hours only) alerting has been deployed.
Posted Aug 04, 2025 - 15:46 CEST
Monitoring
We deployed the fix by 10:45 UTC and service has been restored.
The root cause was a significant code change that behaved differently in production than in our staging environment due to configuration differences between the two environments. Our error-rate based alerts did not trigger during this incident because the error rate did not reach our 10% threshold.
To prevent similar issues, we have lowered the alert threshold to be significantly below the error rate experienced during this event. We are also reviewing our staging environment configuration.
Posted Aug 04, 2025 - 12:45 CEST
Identified
RIPEstat API calls to a number of endpoints (including routing-status and visibility) are failing after a bug introduced in a release since around 9:15 UTC. We are working to release a fix for this regression as soon as possible.
Posted Aug 04, 2025 - 12:20 CEST
This incident affected: Non-Critical Services (RIPEstat).