These examples assume that WS00 is the primary Walrus server and WS01 is the secondary Walrus server.
Software Failure Example
In this scenario, WS01 refuses to enter the DISABLED state. DRBD reports that it is in split-brain mode, and running drbdadm cstate r0 shows that DRBD is in the WFConnection state.
If you are sure that the data on WS01 is out of date and can be discarded, perform
the following steps to restore HA mode.
- Shut down the eucalyptus-cloud process on WS01.
- Ensure that the DRBD connection is down by running "drbdadm disconnect r0" on either of
the two Walrus hosts.
- On the primary Walrus, WS00, set DRBD as the primary by running "drbdadm primary r0".
- On the secondary Walrus, WS01, execute the following command:
drbdadm -- --discard-my-data connect r0
WARNING: This command will discard all data on WS01 and synchronize data from WS00.
- Monitor the state of DRBD by running:
watch -n 2 cat /proc/drbd
- When the data on WS01 is synced, start the eucalyptus-cloud process on WS01.
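The monitoring step above can also be scripted instead of watched by hand. The following is a minimal sketch, assuming the DRBD 8.x /proc/drbd layout in which each device line carries a cs: (connection state) field and a ds: (disk states) field. The function name drbd_in_sync is hypothetical and is not part of DRBD or Eucalyptus.

```shell
#!/bin/sh
# Hypothetical helper: succeed (exit 0) only when every DRBD device listed
# in the status file is connected and up to date on both peers.
# ASSUMPTION: device lines look like
#   " 0: cs:Connected ro:Primary/Secondary ds:UpToDate/UpToDate C r-----"
drbd_in_sync() {
    status_file="${1:-/proc/drbd}"

    # Count all device lines, then the ones that are fully synchronized.
    devices=$(grep -c '^ *[0-9][0-9]*: cs:' "$status_file" 2>/dev/null)
    synced=$(grep -c '^ *[0-9][0-9]*: cs:Connected .*ds:UpToDate/UpToDate' "$status_file" 2>/dev/null)

    # Fail if no devices were found or if any device is not yet in sync.
    [ "${devices:-0}" -gt 0 ] && [ "${devices:-0}" -eq "${synced:-0}" ]
}
```

For example, on WS01 you could run "until drbd_in_sync; do sleep 2; done" and start the eucalyptus-cloud process only after the loop exits.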
Hardware Failure Example
In this example, the primary WS00 needs to be taken out of service due to a
hardware failure, such as a failed disk.
- Shut down the eucalyptus-cloud process on WS00 if it is still running.
- Monitor service status by running euca-describe-services on WS01 and ensure that WS01 has taken over as the new primary (state: ENABLED).
- Shut down the host running WS00.
- If the host running WS00 is to be replaced entirely or the OS reinstalled:
- On WS00, execute the following command:
drbdadm -- --discard-my-data connect r0
WARNING: This command will discard all data on WS00 and synchronize data from WS01.
- Monitor the state of DRBD by running:
watch -n 2 cat /proc/drbd
WS01 should be marked as the primary and WS00 as the new secondary. Wait until the data
is synchronized.
- When the data on WS00 is synced from WS01, start the eucalyptus-cloud process on WS00.
- Monitor service status by running "euca-describe-services" on the primary CLC and
ensure that WS00 is DISABLED and WS01 is ENABLED.
At this point, the Walrus service is back in HA mode.
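The final status check can be scripted as well. The following is a minimal sketch, assuming that euca-describe-services prints one SERVICE line per component with the component name in the fourth whitespace-separated field and its state in the fifth. Verify that layout against your installation before relying on it. The function name service_state is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: read euca-describe-services output on stdin and print
# the state of the named component.
# ASSUMPTION: lines look like
#   SERVICE <type> <partition> <name> <state> ...
service_state() {
    awk -v name="$1" '$1 == "SERVICE" && $4 == name { print $5; exit }'
}
```

For example, "euca-describe-services | service_state WS01" should print ENABLED once WS01 has taken over, and the same pipeline with WS00 should print DISABLED after WS00 rejoins as the secondary.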