This talk reports Facebook's experience managing the backbone network
during the COVID-19 crisis. Our philosophy centers on “risk
prevention”: identifying potential failures in the network and mitigating
their effects. We define metrics for network risk and quantify the
impact of COVID-19 with them. We also describe a risk assessment system,
in production for three years, that combines accurate
failure modeling and efficient risk simulation. Based on ten months of
assessment results, we claim that our backbone is robust against the
COVID-19 stress test, achieving high service availability and low
routing dilation. We share our operational measures to minimize possible
traffic loss. Surprising findings during this period give us insights into
how to further improve our approach. First, we observe a substantial reduction
in optical failures due to reduced human activity, which motivates
trading model stability for agility in failure prediction by considering
short-term failure statistics when necessary. Second, we find a negative
correlation between network traffic and human mobility, indicating that
non-networking signals, traditionally ignored, can be used for better network
management.
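
As a rough illustration of the first finding, the idea of falling back on short-term failure statistics "when necessary" could be sketched as follows. This is a minimal sketch under stated assumptions, not the production system described in the talk: the function name, the 14-day window, the 30% deviation threshold, and the 0.7 blending weight are all hypothetical choices made for the example.

```python
from statistics import mean


def predicted_failure_rate(daily_failures, short_window=14,
                           deviation_threshold=0.3, short_weight=0.7):
    """Estimate the expected number of failures per day.

    Hypothetical policy: use the stable long-term average by default,
    and blend in the recent average only when it deviates from the
    long-term average by more than `deviation_threshold` (relative).
    """
    if not daily_failures:
        raise ValueError("daily_failures must not be empty")

    long_rate = mean(daily_failures)                   # stable, slow to react
    short_rate = mean(daily_failures[-short_window:])  # agile, noisier

    if long_rate == 0:
        # No long-term signal to compare against; use recent data as is.
        return short_rate

    relative_shift = abs(short_rate - long_rate) / long_rate
    if relative_shift <= deviation_threshold:
        # Recent behavior matches history: keep the stable estimate.
        return long_rate

    # Recent behavior has shifted: lean toward the short-term statistics.
    return short_weight * short_rate + (1 - short_weight) * long_rate


# Example: 86 days at 10 failures/day, then 14 days at 4 failures/day.
history = [10] * 86 + [4] * 14
estimate = predicted_failure_rate(history)
```

In this example the recent average differs from the long-term average by more than the threshold, so the estimate moves toward the recent value; with a steady history it would stay at the long-term average.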