Problem
When the reconciler fails to upgrade the release, it rolls back to the previous revision and returns an error. The controller runtime is expected to retry the reconciliation with exponential backoff, but in reality it reconciles again immediately, over and over. I was able to reproduce this behavior in the following cases:
- The operator service account lacks the PATCH permission needed to update a Kubernetes object.
- A CRD used in the release has been removed from the cluster.

Every rollback creates a new revision, so the revision count grows with every attempt. In my case, the operator spawned thousands of revisions in a matter of minutes.
Root cause
A rolled-back revision is indistinguishable from an upgraded one: it has the `deployed` status, just as after a normal upgrade. Since there will always be a diff between the desired state calculated from the CR and the rolled-back revision, the reconciler attempts the upgrade again, it fails again, and so on.
On top of that, some events are added to the reconciliation queue independently of the exponential backoff and trigger a reconciliation without any delay. These events are:
- The CR status is updated on every reconcile, because the `Irreconcilable` condition is written twice per reconcile: set to `False` right before the upgrade and to `True` after the upgrade fails.
- A failed upgrade and the subsequent rollback cause multiple changes in the Secret-based release storage. Those Secrets are watched, and each change adds an item to the reconciliation queue. Say revision 1 was successful, revision 2 is the problematic one, and revision 3 is the rollback to revision 1. Upgrading to revision 2 triggers the following events:
  1. Create revision 2 with status `pending-upgrade`.
  2. Mark revision 2 as `failed`.
  3. Create revision 3 with status `pending-rollback`.
  4. Mark revision 1 as `superseded`.
  5. Mark revision 3 as `deployed` or `failed`, depending on the rollback result.

The queue deduplicates items, but at least one event still ends up in the queue without any delay, so the next reconcile starts immediately instead of waiting for the backoff.
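Why deduplication does not help can be shown with a minimal model of a deduplicating work queue. `dedupQueue` is a hypothetical type; its `Add`/`Get`/`Done` semantics are meant to mirror client-go's `workqueue` as I understand them: adding a key that is already waiting is a no-op, and a key added while it is being processed is parked and put back in the queue as soon as processing finishes. The events listed above all fire while the reconcile for the CR is still running, so they collapse into a single entry that is re-queued immediately on `Done`. The delayed, rate-limited retry triggered by the returned error is a separate path and is not modeled here.

```go
package main

import "fmt"

// dedupQueue is a hypothetical, simplified deduplicating work queue.
// It is NOT the client-go implementation, only a model of its Add/Get/Done rules.
type dedupQueue struct {
	order      []string        // keys ready to be processed, in FIFO order
	dirty      map[string]bool // keys that need processing
	processing map[string]bool // keys currently being reconciled
}

func newDedupQueue() *dedupQueue {
	return &dedupQueue{dirty: map[string]bool{}, processing: map[string]bool{}}
}

// Add enqueues key immediately, without any backoff delay.
func (q *dedupQueue) Add(key string) {
	if q.dirty[key] {
		return // already waiting: deduplicated
	}
	q.dirty[key] = true
	if q.processing[key] {
		return // being processed: Done will put it back
	}
	q.order = append(q.order, key)
}

func (q *dedupQueue) Len() int { return len(q.order) }

// Get pops the next key and marks it as being processed (caller checks Len first).
func (q *dedupQueue) Get() string {
	key := q.order[0]
	q.order = q.order[1:]
	q.processing[key] = true
	delete(q.dirty, key)
	return key
}

// Done finishes processing; a key added in the meantime is re-queued right away.
func (q *dedupQueue) Done(key string) {
	delete(q.processing, key)
	if q.dirty[key] {
		q.order = append(q.order, key)
	}
}

func main() {
	const key = "default/my-cr"
	q := newDedupQueue()
	q.Add(key)

	k := q.Get() // the reconcile starts
	// Watch events fired while the reconcile is running; all map to the same CR key.
	events := []string{
		"status: Irreconcilable=False",
		"secret: revision 2 pending-upgrade",
		"secret: revision 2 failed",
		"secret: revision 3 pending-rollback",
		"secret: revision 1 superseded",
		"secret: revision 3 deployed",
		"status: Irreconcilable=True",
	}
	for range events {
		q.Add(key)
	}
	q.Done(k) // the reconcile returns the upgrade error

	fmt.Println(q.Len()) // 1: the key is back in the queue with no delay
}
```

Seven events become one queue entry, but that one entry is enough to start the next reconcile immediately.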
This behavior can also be reproduced when an environment variable in a `Deployment` is set by both `value` and `valueFrom` (ROX-18477: operator delete valuesFrom in proxy config if values is set, stackrox/stackrox#7105).