OpsDev
From an internal email of a very very big corporate company
Incident Background: BIGPROJECT has been unavailable since APAC SOD due to a data refresh activity being wrongly triggered in from UAT to Production environments.
[...manual recovery instruction follows... ]
Business Impact
- BIGPROJECT is unavailable for all users in the bank
- BIGPROJECT2 platform which sits on BIGPROJECT is unavailable this includes the Click-to-chat service
Current Status
- Initial attempt of flashback Database to restore from the last good restore point failed due to errors due to absence of flashback logs– this was a quicker option, but now ruled out.
- Currently going ahead with full restoration in the Primary database – this activity is tentatively supposed to take 8-9 hours (in place of 6 hours earlier mentioned)
Five hours later, in another email, they dare to say:
- Currently 32% of database back up is completed and will take approximately 8-14 hours.
Let's explain
BIGPROJECT is THE internal trouble-ticketing and change-management software, so the entire bank cannot deliver software today... So what happened? We can try to translate the email in a more "ops-dev" way...
- Someone clicked a button, made a wrong "promote" in production and altered the production database schema
- They were unable to restore the database using a trick called Oracle flashback.
- Their recovery strategy will take more than 14 hours to complete. In the meantime the entire bank cannot deploy anything. Hope you did not have some urgent need.
- Stay tuned for some thrilling news (are you still with us? or have you fainted?)
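
As a rough sanity check on the numbers in the second email, here is a minimal sketch of a linear extrapolation. It assumes (and this is only an assumption, the email does not say so) that the 32% was reached during the 5 hours since the first email and that the restore keeps going at a constant rate. The function name `estimate_remaining_hours` is made up for this illustration.

```python
def estimate_remaining_hours(fraction_done: float, elapsed_hours: float) -> float:
    """Linearly extrapolate the time left for a long-running job.

    Assumption: progress is proportional to elapsed time, which is
    rarely exactly true for a real database restore.
    """
    if not 0 < fraction_done <= 1:
        raise ValueError("fraction_done must be in (0, 1]")
    if elapsed_hours < 0:
        raise ValueError("elapsed_hours must be non-negative")
    # Projected total duration if the observed rate holds.
    total_hours = elapsed_hours / fraction_done
    return total_hours - elapsed_hours


# Figures taken from the emails: 32% done, about 5 hours elapsed (assumed).
remaining = estimate_remaining_hours(0.32, 5.0)
total = remaining + 5.0
print(f"remaining: {remaining} h, projected total: {total} h")
```

Under these assumptions the remaining time lands inside the "8-14 hours" range from the email, while the projected total goes well past the 8-9 hours announced at first.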