From an internal email at a very, very big corporation:
Incident Background:
BIGPROJECT has been unavailable since APAC SOD due to a data refresh activity being wrongly triggered in from UAT to Production environments.
Business Impact:
- BIGPROJECT is unavailable for all users in the bank
- BIGPROJECT2 platform which sits on BIGPROJECT is unavailable this includes the Click-to-chat service
Current Status:
- Initial attempt of flashback Database to restore from the last good restore point failed due to errors due to absence of flashback logs– this was a quicker option, but now ruled out.
- Currently going ahead with full restoration in the Primary database – this activity is tentatively supposed to take 8-9 hours (in place of 6 hours earlier mentioned)
[…manual recovery instruction follows… ]
Five hours later, in another email, they dare to say:
- Currently 32% of database back up is completed and will take approximately 8-14 hours.
Let’s explain
BIGPROJECT is THE internal trouble ticketing + change management software, so the entire bank cannot deliver software today…
So what happened? We can try to translate the email into a more “ops-dev” way…
- Someone clicked a button, made a wrong “promote” in production and altered the production database schema.
- They were unable to restore the database using a trick called Oracle Flashback.
- Their recovery strategy will take more than 14 hours to complete.
In the meantime the entire Bank cannot deploy anything.
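To see where a figure like that comes from, here is a back-of-the-envelope sketch in Python. It assumes (my assumption, not something the emails state) that the restore started around the time of the first email and that it progresses linearly, so it is 32% done after 5 hours. The function name is made up for illustration.

```python
def estimate_remaining_hours(elapsed_hours: float, fraction_done: float) -> float:
    """Linear extrapolation: hours still needed to reach 100%."""
    if not 0 < fraction_done <= 1:
        raise ValueError("fraction_done must be in (0, 1]")
    # Projected total duration if progress keeps the same average pace.
    total_hours = elapsed_hours / fraction_done
    return total_hours - elapsed_hours


if __name__ == "__main__":
    elapsed = 5.0   # hours between the two emails (assumed restore time so far)
    done = 0.32     # "32% ... is completed" from the second email
    remaining = estimate_remaining_hours(elapsed, done)
    print(f"remaining: {remaining:.1f} h, total: {elapsed + remaining:.1f} h")
```

Under those assumptions the remaining time lands around 10.6 hours, inside the 8-14 hour range they quote, for a total of roughly 15.6 hours of restore, well above the 8-9 hours announced in the first email.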
Hope you did not have any urgent needs. Stay tuned for some thrilling news (are you still with us? have you fainted?)
By the way, Oracle Flashback is not meant to replace your backups.
DevOps is a mental state.
You must have a reasonably fast recovery procedure for mission-critical applications, and it must be completely automatic. Not a trial-and-error approach based on slow tape backups.
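What "automatic" and "reasonably fast" could mean in practice: a scheduled restore drill that times itself against a recovery time objective. The sketch below is only an illustration in Python. `run_restore_drill`, the `restore` callable and the one-hour objective are hypothetical names and values of mine, and the actual restore step (whatever your database tooling provides) is left as a placeholder.

```python
import time
from typing import Callable, Dict


def run_restore_drill(restore: Callable[[], None], rto_seconds: float) -> Dict[str, object]:
    """Run a restore step, time it, and report whether it met the objective."""
    start = time.monotonic()
    restore()  # hypothetical: restore the latest backup into a scratch environment
    elapsed = time.monotonic() - start
    return {"elapsed_seconds": elapsed, "within_rto": elapsed <= rto_seconds}


def fake_restore() -> None:
    """Placeholder standing in for a real restore; it only sleeps briefly."""
    time.sleep(0.01)


if __name__ == "__main__":
    # Assumed objective of one hour; pick whatever the business can tolerate.
    report = run_restore_drill(fake_restore, rto_seconds=3600)
    print(report)
```

The point of running something like this on a schedule is that you learn how long a restore takes before the incident, not five hours into it.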