Disaster Recovery¶
In a multi-controller deployment, depending on the number of controllers, it can take only a single controller failure for the cluster to become unavailable.
A cluster becomes unavailable when the controllers can no longer reach a quorum, which prevents a new cluster leader from being elected. The number of controllers needed to reach a quorum is (N/2)+1, with N/2 rounded down, where N is the initial number of controllers. For example, if there are 3 controllers initially, at least 2 controllers are needed to form a quorum.
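As a quick check of the quorum arithmetic, the following sketch computes the quorum size and the number of controller failures a cluster can absorb for a few cluster sizes. The helper names `quorum` and `tolerated_failures` are hypothetical and are not part of any TraefikEE tooling; the only assumption is that N/2 is integer division, which matches the 3-controller example above.

```shell
#!/bin/sh
# quorum N: number of controllers needed for a quorum, (N / 2) + 1,
# where the shell's integer division rounds N / 2 down.
quorum() {
    echo $(( $1 / 2 + 1 ))
}

# tolerated_failures N: controllers that can fail while a quorum remains.
tolerated_failures() {
    echo $(( $1 - ($1 / 2 + 1) ))
}

# Print the figures for a few initial cluster sizes.
for n in 1 2 3 5; do
    echo "controllers=$n quorum=$(quorum "$n") tolerated_failures=$(tolerated_failures "$n")"
done
```

With 2 controllers the quorum is also 2, so a single failure is enough to lose it, whereas 3 controllers tolerate one failure.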
While a cluster is unavailable, the proxies will continue to operate with their last known configuration.
Important
In a single-controller deployment, if the controller's state is lost during a failure, the state must be restored from a backup.
Learn more about Backup and Restore.
Recovering a Cluster¶
For the cluster to be recoverable, at least one controller must remain with its state intact. If this is not the case, see Recovering from State Loss.
The cluster can be recovered by using teectl:
teectl recover
Important
If teectl cannot connect due to a loss of credentials, the recover command can be run from traefikee directly.
This connects to the remaining controller and performs the recovery operation. Watch the controller logs to follow the progress of the recovery. The following log message indicates that the recovery completed successfully:
"Node is recovered and ready"
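Rather than reading the logs by eye, the log stream can be piped into a small filter that exits as soon as the completion message appears. This is a minimal sketch: `wait_for_recovery` is a hypothetical helper, and how the controller logs are obtained depends on the platform. The `kubectl logs` line in the comment assumes a Kubernetes deployment with a controller pod named default-controller-0, which is an assumption and not something this page specifies.

```shell
#!/bin/sh
# wait_for_recovery: read log lines on stdin and return 0 as soon as the
# completion message appears; return 1 if the stream ends without it.
wait_for_recovery() {
    grep -q "Node is recovered and ready"
}

# Example usage (assumes Kubernetes and a pod named default-controller-0):
#   kubectl logs -f default-controller-0 | wait_for_recovery && echo "recovery complete"
```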
Once the recovery is complete, list the nodes and remove the controllers that are down:
teectl get nodes
ID                         NAME                           STATUS  ROLE
7022qjifgp2srv3tq6rxyjhdm  default-proxy-d55569575-n8mmb  Ready   Proxy
muaxom14euley813tv40t9xxs  default-controller-1           Down    Controller
q01yagt5suwv2tooy8amm248k  default-controller-0           Ready   Controller (Leader)
s4eg90svby3aty5r6o9xkcpeh  default-proxy-d55569575-kzn82  Ready   Proxy
w9aggc8l9lmwni7l5y2l1z4ww  default-controller-2           Down    Controller
teectl delete node --id="muaxom14euley813tv40t9xxs"
teectl delete node --id="w9aggc8l9lmwni7l5y2l1z4ww"
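When several controllers are down, the list-and-delete step above can be scripted. The sketch below assumes that teectl get nodes prints the four whitespace-separated columns shown above (ID, NAME, STATUS, ROLE). The function names `down_controller_ids` and `remove_down_controllers` are hypothetical helpers, and only the teectl commands already shown on this page are used.

```shell
#!/bin/sh
# down_controller_ids: read `teectl get nodes` output on stdin and print the ID
# of every node whose STATUS is Down and whose ROLE begins with Controller.
# Assumes the column layout ID NAME STATUS ROLE; the header line is skipped.
down_controller_ids() {
    awk 'NR > 1 && $3 == "Down" && $4 == "Controller" { print $1 }'
}

# remove_down_controllers: delete each down controller reported by teectl.
remove_down_controllers() {
    teectl get nodes | down_controller_ids | while read -r id; do
        echo "deleting node $id"
        teectl delete node --id="$id"
    done
}

# Example usage, after checking the list first:
#   teectl get nodes | down_controller_ids
#   remove_down_controllers
```

Running the filter on its own before deleting anything is a cheap way to confirm that only the intended controllers are selected.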
The deployment is now acting as a single-controller deployment. Additional controllers must now be started to bring the cluster back to high availability.
Congratulations! Your TraefikEE cluster is recovered.
Recovering from State Loss¶
If all controllers and their state were lost, it is not possible to use the recovery procedure above to recover the cluster. At this point the cluster must be restored from a backup.
For more information on restoring from a backup, refer to Backup and Restore.