Degraded performance of rsync repositories

Incident Report for RIPE NCC

Postmortem

On 28 July 2025, the RPKI rsync repositories experienced degraded performance for approximately four hours during a planned data centre migration. Between 09:20 and 13:24 UTC, users may have noticed slower repository synchronization or, in rare cases, may have been unable to synchronize at all.

This incident occurred during an ongoing data centre migration project, in which servers and network equipment were moved from one facility to another. Some of these servers were virtualization hosts. When these hosts are offline for more than 60 minutes, the distributed storage system automatically initiates data repair operations to maintain data integrity and availability.
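The repair trigger described above amounts to a simple threshold rule. The sketch below illustrates it under stated assumptions: the 60-minute value comes from this report, but the names `REPAIR_TIMEOUT`, `repair_triggered` and `window_triggers_repair` are hypothetical, and the actual storage system's setting name and exact semantics are not given here.

```python
from datetime import timedelta

# Assumption: threshold taken from this report ("more than 60 minutes").
# The real storage system's configuration key is not named in the report.
REPAIR_TIMEOUT = timedelta(minutes=60)


def repair_triggered(offline_for: timedelta,
                     timeout: timedelta = REPAIR_TIMEOUT) -> bool:
    """Hypothetical rule: repair starts once a host has been offline
    for strictly longer than the timeout."""
    return offline_for > timeout


def window_triggers_repair(planned_downtime: timedelta,
                           timeout: timedelta = REPAIR_TIMEOUT) -> bool:
    """Hypothetical planning check: would a host kept offline for the whole
    planned maintenance window cause the storage system to start repairs?"""
    return repair_triggered(planned_downtime, timeout)


print(repair_triggered(timedelta(minutes=45)))        # False
print(repair_triggered(timedelta(minutes=75)))        # True
print(window_triggers_repair(timedelta(hours=2)))     # True
```

A check of this kind is one way to compare a planned move against the storage timeout before starting it, which is the alignment the improvement list below refers to.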

During the migration window, these repair operations generated significant network traffic to synchronize data across our infrastructure. This traffic coincided with the regular storage traffic, creating a network bottleneck that degraded storage performance. Because the RPKI rsync repositories rely on this storage infrastructure, users experienced slower synchronization.
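The bottleneck can be illustrated with basic capacity arithmetic: when the combined demand of concurrent flows exceeds the link capacity, no flow receives all the bandwidth it asks for. The sketch below is illustrative only. The report publishes no traffic figures, so the numbers used are invented, and proportional sharing between flows is a simplifying assumption rather than a description of the actual network.

```python
def utilisation(capacity_gbps: float, *flows_gbps: float) -> float:
    """Fraction of link capacity demanded by all concurrent flows."""
    if capacity_gbps <= 0:
        raise ValueError("capacity must be positive")
    return sum(flows_gbps) / capacity_gbps


def is_bottleneck(capacity_gbps: float, *flows_gbps: float) -> bool:
    """True when combined demand exceeds what the link can carry."""
    return utilisation(capacity_gbps, *flows_gbps) > 1.0


def delivered_fraction(capacity_gbps: float, *flows_gbps: float) -> float:
    """Share of its demand each flow receives, assuming (hypothetically)
    that an oversubscribed link is shared in proportion to demand."""
    u = utilisation(capacity_gbps, *flows_gbps)
    return 1.0 if u <= 1.0 else 1.0 / u


# Illustrative numbers only, not measurements from this incident.
capacity, regular, repair = 10.0, 6.0, 7.0
print(is_bottleneck(capacity, regular))                # False
print(is_bottleneck(capacity, regular, repair))        # True
print(delivered_fraction(capacity, regular, repair))   # about 0.77
```

Under these assumed figures, regular storage traffic would receive roughly three quarters of the bandwidth it needs once repair traffic is added, which is the kind of slowdown that services built on the storage layer would then inherit.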

For the subsequent server migrations later that day and the following day, the RPKI rsync repositories were temporarily moved to an in-memory filesystem to reduce the risk of hitting the same bottleneck again.

Based on this incident, several improvements will be implemented:

  • Update the data centre migration procedures to better account for storage system behaviours and network capacity requirements
  • Review and adjust storage system timeout settings to better align with our planned maintenance windows
  • Evaluate network capacity improvements to handle concurrent migration and repair operations more effectively
Posted Jul 30, 2025 - 16:05 CEST

Resolved

This incident has been resolved.
Posted Jul 28, 2025 - 15:24 CEST

Monitoring

A fix has been implemented and we are monitoring the results.
Posted Jul 28, 2025 - 12:32 CEST

Investigating

We have noticed that our rsync repositories are slower than usual. We are investigating.
Posted Jul 28, 2025 - 11:33 CEST
This incident affected: RPKI (rsync Repository).