================================================================================
ROLLBACK RUNBOOK: rb_921c0bca
================================================================================
Migration ID: 23a52ed1507f
Created: 2026-02-16T13:47:31.108500

EMERGENCY CONTACTS
----------------------------------------
Incident Commander: TBD - Assigned during migration
  Phone: +1-XXX-XXX-XXXX
  Email: incident.commander@company.com
  Backup: backup.commander@company.com

Technical Lead: TBD - Migration technical owner
  Phone: +1-XXX-XXX-XXXX
  Email: tech.lead@company.com
  Backup: senior.engineer@company.com

Business Owner: TBD - Business stakeholder
  Phone: +1-XXX-XXX-XXXX
  Email: business.owner@company.com
  Backup: product.manager@company.com

On-Call Engineer: Current on-call rotation
  Phone: +1-XXX-XXX-XXXX
  Email: oncall@company.com
  Backup: backup.oncall@company.com

Executive Escalation: CTO/VP Engineering
  Phone: +1-XXX-XXX-XXXX
  Email: cto@company.com
  Backup: vp.engineering@company.com

ESCALATION MATRIX
----------------------------------------
LEVEL_1:
  Trigger: Single component failure
  Response Time: 5 minutes
  Contacts: on_call_engineer, migration_lead
  Actions: Investigate issue, Attempt automated remediation, Monitor closely

LEVEL_2:
  Trigger: Multiple component failures or single critical failure
  Response Time: 2 minutes
  Contacts: senior_engineer, team_lead, devops_lead
  Actions: Initiate rollback, Establish war room, Notify stakeholders

LEVEL_3:
  Trigger: System-wide failure or data corruption
  Response Time: 1 minute
  Contacts: engineering_manager, cto, incident_commander
  Actions: Emergency rollback, All hands on deck, Executive notification

EMERGENCY:
  Trigger: Business-critical failure with customer impact
  Response Time: Immediate (0 minutes)
  Contacts: ceo, cto, head_of_operations
  Actions: Emergency procedures, Customer communication, Media preparation if needed
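
The matrix above can be encoded so that tooling picks a level and a response
deadline consistently. The following is a minimal sketch, not the team's actual
tooling: the function names, the boolean flags, and the rule that the highest
matching level wins are all assumptions made for illustration. Only the level
names and response times come from the matrix.

```python
from datetime import datetime, timedelta
from typing import Optional

# Response times in minutes, copied from the escalation matrix.
RESPONSE_MINUTES = {"LEVEL_1": 5, "LEVEL_2": 2, "LEVEL_3": 1, "EMERGENCY": 0}


def classify(failed_components: int,
             critical: bool = False,
             system_wide: bool = False,
             data_corruption: bool = False,
             customer_impact: bool = False) -> Optional[str]:
    """Return the highest escalation level whose trigger matches, or None.

    The flags are hypothetical inputs; the runbook does not define how
    "critical" or "customer impact" are detected.
    """
    if customer_impact:
        return "EMERGENCY"
    if system_wide or data_corruption:
        return "LEVEL_3"
    if failed_components > 1 or critical:
        return "LEVEL_2"
    if failed_components == 1:
        return "LEVEL_1"
    return None


def response_deadline(level: str, detected_at: datetime) -> datetime:
    """Latest time by which the level's contacts should have responded."""
    return detected_at + timedelta(minutes=RESPONSE_MINUTES[level])
```

For example, classify(1, critical=True) yields "LEVEL_2", and a LEVEL_2 issue
detected at 13:00 has a response deadline of 13:02.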

AUTOMATIC ROLLBACK TRIGGERS
----------------------------------------
• Error Rate Spike
  Condition: error_rate > baseline * 5 for 5 minutes
  Auto-Execute: Yes
  Evaluation Window: 5 minutes
  Contacts: on_call_engineer, migration_lead

• Response Time Degradation
  Condition: p95_response_time > baseline * 3 for 10 minutes
  Auto-Execute: No
  Evaluation Window: 10 minutes
  Contacts: performance_team, migration_lead

• Service Availability Drop
  Condition: availability < 95% for 2 minutes
  Auto-Execute: Yes
  Evaluation Window: 2 minutes
  Contacts: sre_team, incident_commander

• Data Integrity Check Failure
  Condition: data_validation_failures > 0
  Auto-Execute: Yes
  Evaluation Window: 1 minute
  Contacts: dba_team, data_team

• Migration Progress Stalled
  Condition: migration_progress unchanged for 30 minutes
  Auto-Execute: No
  Evaluation Window: 30 minutes
  Contacts: migration_team, dba_team
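
The trigger conditions above can be evaluated mechanically. The sketch below is
illustrative only and assumes one metrics sample per minute, so a window of N
minutes is approximated by the N most recent samples. The metric key names
mirror the conditions listed; the Trigger class and the helper functions are
hypothetical and do not describe the real monitoring stack.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Sequence, Tuple

Sample = Dict[str, float]  # assumed: one metrics sample per minute


def sustained(pred: Callable[[Sample, Sample], bool]):
    """Wrap a per-sample test so it must hold for every sample in the window."""
    return lambda window, baseline: all(pred(s, baseline) for s in window)


def stalled(window: Sequence[Sample], baseline: Sample) -> bool:
    """True when migration_progress is identical across the whole window."""
    return len({s["migration_progress"] for s in window}) == 1


@dataclass(frozen=True)
class Trigger:
    name: str
    window_minutes: int
    auto_execute: bool
    breached: Callable[[Sequence[Sample], Sample], bool]


# Thresholds, windows, and auto-execute flags are taken from the trigger list.
TRIGGERS = [
    Trigger("Error Rate Spike", 5, True,
            sustained(lambda s, b: s["error_rate"] > b["error_rate"] * 5)),
    Trigger("Response Time Degradation", 10, False,
            sustained(lambda s, b: s["p95_response_time"] > b["p95_response_time"] * 3)),
    Trigger("Service Availability Drop", 2, True,
            sustained(lambda s, b: s["availability"] < 95.0)),
    Trigger("Data Integrity Check Failure", 1, True,
            sustained(lambda s, b: s["data_validation_failures"] > 0)),
    Trigger("Migration Progress Stalled", 30, False, stalled),
]


def evaluate(samples: Sequence[Sample], baseline: Sample) -> Tuple[List[str], List[str]]:
    """Return (auto_rollback, notify_only) lists of fired trigger names."""
    auto, manual = [], []
    for t in TRIGGERS:
        window = samples[-t.window_minutes:]
        if len(window) < t.window_minutes:
            continue  # not enough history to judge this trigger yet
        if t.breached(window, baseline):
            (auto if t.auto_execute else manual).append(t.name)
    return auto, manual
```

Triggers in the first list would start a rollback without waiting for a human;
those in the second list only notify the listed contacts, matching the
Auto-Execute: No entries.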

ROLLBACK PHASES
----------------------------------------
1. ROLLBACK_CLEANUP
   Description: Roll back changes made during the cleanup phase
   Urgency: MEDIUM
   Duration: 570 minutes
   Risk Level: MEDIUM
   Prerequisites:
     ✓ Incident commander assigned and briefed
     ✓ All team members notified of rollback initiation
     ✓ Monitoring systems confirmed operational
     ✓ Backup systems verified and accessible
   Steps:
     99. Validate rollback completion
        Duration: 10 min
        Type: manual
        Success Criteria: Cleanup fully rolled back; all validation checks pass

   Validation Checkpoints:
     ☐ cleanup rollback steps completed
     ☐ System health checks passing
     ☐ No critical errors in logs
     ☐ Key metrics within acceptable ranges
     ☐ Validation command passed: SELECT COUNT(*) FROM {table_name};...
     ☐ Validation command passed: SELECT COUNT(*) FROM information_schema.tables WHE...
     ☐ Validation command passed: SELECT COUNT(*) FROM information_schema.columns WH...

2. ROLLBACK_CONTRACT
   Description: Roll back changes made during the contract phase
   Urgency: MEDIUM
   Duration: 570 minutes
   Risk Level: MEDIUM
   Prerequisites:
     ✓ Incident commander assigned and briefed
     ✓ All team members notified of rollback initiation
     ✓ Monitoring systems confirmed operational
     ✓ Backup systems verified and accessible
     ✓ Previous rollback phase completed successfully
   Steps:
     99. Validate rollback completion
        Duration: 10 min
        Type: manual
        Success Criteria: Contract fully rolled back; all validation checks pass

   Validation Checkpoints:
     ☐ contract rollback steps completed
     ☐ System health checks passing
     ☐ No critical errors in logs
     ☐ Key metrics within acceptable ranges
     ☐ Validation command passed: SELECT COUNT(*) FROM {table_name};...
     ☐ Validation command passed: SELECT COUNT(*) FROM information_schema.tables WHE...
     ☐ Validation command passed: SELECT COUNT(*) FROM information_schema.columns WH...

3. ROLLBACK_MIGRATE
   Description: Roll back changes made during the migrate phase
   Urgency: MEDIUM
   Duration: 570 minutes
   Risk Level: MEDIUM
   Prerequisites:
     ✓ Incident commander assigned and briefed
     ✓ All team members notified of rollback initiation
     ✓ Monitoring systems confirmed operational
     ✓ Backup systems verified and accessible
     ✓ Previous rollback phase completed successfully
   Steps:
     99. Validate rollback completion
        Duration: 10 min
        Type: manual
        Success Criteria: Migrate fully rolled back; all validation checks pass

   Validation Checkpoints:
     ☐ migrate rollback steps completed
     ☐ System health checks passing
     ☐ No critical errors in logs
     ☐ Key metrics within acceptable ranges
     ☐ Validation command passed: SELECT COUNT(*) FROM {table_name};...
     ☐ Validation command passed: SELECT COUNT(*) FROM information_schema.tables WHE...
     ☐ Validation command passed: SELECT COUNT(*) FROM information_schema.columns WH...

4. ROLLBACK_EXPAND
   Description: Roll back changes made during the expand phase
   Urgency: MEDIUM
   Duration: 570 minutes
   Risk Level: MEDIUM
   Prerequisites:
     ✓ Incident commander assigned and briefed
     ✓ All team members notified of rollback initiation
     ✓ Monitoring systems confirmed operational
     ✓ Backup systems verified and accessible
     ✓ Previous rollback phase completed successfully
   Steps:
     99. Validate rollback completion
        Duration: 10 min
        Type: manual
        Success Criteria: Expand fully rolled back; all validation checks pass

   Validation Checkpoints:
     ☐ expand rollback steps completed
     ☐ System health checks passing
     ☐ No critical errors in logs
     ☐ Key metrics within acceptable ranges
     ☐ Validation command passed: SELECT COUNT(*) FROM {table_name};...
     ☐ Validation command passed: SELECT COUNT(*) FROM information_schema.tables WHE...
     ☐ Validation command passed: SELECT COUNT(*) FROM information_schema.columns WH...

5. ROLLBACK_PREPARATION
   Description: Roll back changes made during the preparation phase
   Urgency: MEDIUM
   Duration: 570 minutes
   Risk Level: MEDIUM
   Prerequisites:
     ✓ Incident commander assigned and briefed
     ✓ All team members notified of rollback initiation
     ✓ Monitoring systems confirmed operational
     ✓ Backup systems verified and accessible
     ✓ Previous rollback phase completed successfully
   Steps:
     1. Drop migration artifacts
        Duration: 5 min
        Type: sql
        Script:
          -- Drop migration artifacts
          DROP TABLE IF EXISTS migration_log;
          DROP PROCEDURE IF EXISTS migrate_data();
        Success Criteria: No migration artifacts remain

     99. Validate rollback completion
        Duration: 10 min
        Type: manual
        Success Criteria: Preparation fully rolled back; all validation checks pass

   Validation Checkpoints:
     ☐ preparation rollback steps completed
     ☐ System health checks passing
     ☐ No critical errors in logs
     ☐ Key metrics within acceptable ranges
     ☐ Validation command passed: SELECT COUNT(*) FROM {table_name};...
     ☐ Validation command passed: SELECT COUNT(*) FROM information_schema.tables WHE...
     ☐ Validation command passed: SELECT COUNT(*) FROM information_schema.columns WH...
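
The five phases above run in sequence, and each phase after the first requires
the previous one to have completed successfully. The sketch below shows that
gating. It assumes the forward migration ran in the order preparation, expand,
migrate, contract, cleanup (inferred from the reversed rollback order, not
stated in this runbook) and that rollback may start from the last forward phase
actually reached. The run_phase callback is a hypothetical stand-in for
executing a phase's steps and validation checkpoints.

```python
from typing import Callable, List

# Assumed forward order; the rollback phases above are its reverse.
FORWARD_PHASES = ["preparation", "expand", "migrate", "contract", "cleanup"]


def rollback_order(last_forward_phase_reached: str) -> List[str]:
    """Phases to roll back, newest first, ending with preparation."""
    idx = FORWARD_PHASES.index(last_forward_phase_reached)
    return list(reversed(FORWARD_PHASES[: idx + 1]))


def run_rollback(phases: List[str], run_phase: Callable[[str], bool]) -> List[str]:
    """Run phases in order and stop at the first failure.

    Returns the phases that completed, so the caller can see where the
    rollback halted and escalate instead of continuing blindly.
    """
    completed: List[str] = []
    for phase in phases:
        if not run_phase(phase):
            break  # prerequisite for the next phase is not met
        completed.append(phase)
    return completed
```

If the migration failed during the migrate phase, rollback_order("migrate")
gives migrate, expand, preparation. A failure while rolling back expand would
leave only migrate in the completed list.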

DATA RECOVERY PLAN
----------------------------------------
Recovery Method: point_in_time
Backup Location: /backups/pre_migration_{migration_id}_{timestamp}.sql
Estimated Recovery Time: 45 minutes
Recovery Scripts:
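  • psql -d production -f /backups/pre_migration_backup.sql  # plain-text .sql dumps are restored with psql; use pg_restore -d production -c only if the backup is a custom, directory, or tar-format archive
  • SELECT pg_create_restore_point('rollback_point'); -- Named WAL marker usable as a recovery target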
  • VACUUM ANALYZE; -- Refresh statistics after restore
Validation Queries:
  • SELECT COUNT(*) FROM critical_business_table;
  • SELECT MAX(created_at) FROM audit_log;
  • SELECT COUNT(DISTINCT user_id) FROM user_sessions;
  • SELECT SUM(amount) FROM financial_transactions WHERE date = CURRENT_DATE;
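
The validation queries are most useful when their results are captured before
the migration and compared with the results after the restore. The following
sketch shows one way to do that comparison. The labels, the snapshot and
diff_snapshots helpers, and the run_query callback (which would wrap whatever
database client is in use) are hypothetical; only the SQL text comes from the
list above.

```python
from typing import Any, Callable, Dict, Tuple

# SQL copied from the validation queries; labels are illustrative.
VALIDATION_QUERIES = {
    "critical_rows": "SELECT COUNT(*) FROM critical_business_table;",
    "latest_audit": "SELECT MAX(created_at) FROM audit_log;",
    "distinct_users": "SELECT COUNT(DISTINCT user_id) FROM user_sessions;",
    "todays_amount": "SELECT SUM(amount) FROM financial_transactions "
                     "WHERE date = CURRENT_DATE;",
}


def snapshot(run_query: Callable[[str], Any]) -> Dict[str, Any]:
    """Run every validation query and record its scalar result by label."""
    return {label: run_query(sql) for label, sql in VALIDATION_QUERIES.items()}


def diff_snapshots(before: Dict[str, Any],
                   after: Dict[str, Any]) -> Dict[str, Tuple[Any, Any]]:
    """Return {label: (before, after)} for every value that changed."""
    return {
        label: (before[label], after.get(label))
        for label in before
        if before[label] != after.get(label)
    }
```

An empty result from diff_snapshots suggests the restored data matches the
pre-migration snapshot for these four checks. It does not prove the absence of
corruption elsewhere, so the checklist below still applies.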

POST-ROLLBACK VALIDATION CHECKLIST
----------------------------------------
 1. ☐ Verify system is responding to health checks
 2. ☐ Confirm error rates are within normal parameters
 3. ☐ Validate response times meet SLA requirements
 4. ☐ Check all critical business processes are functioning
 5. ☐ Verify monitoring and alerting systems are operational
 6. ☐ Confirm no data corruption has occurred
 7. ☐ Validate security controls are functioning properly
 8. ☐ Check backup systems are working correctly
 9. ☐ Verify integration points with downstream systems
10. ☐ Confirm user authentication and authorization are working
11. ☐ Validate database schema matches expected state
12. ☐ Confirm referential integrity constraints are intact and enforced
13. ☐ Check database performance metrics
14. ☐ Verify data consistency across related tables
15. ☐ Validate that indexes are valid and planner statistics are up to date
16. ☐ Confirm transaction logs are clean
17. ☐ Check database connections and connection pooling

POST-ROLLBACK PROCEDURES
----------------------------------------
 1. Monitor system stability for 24-48 hours post-rollback
 2. Conduct thorough post-rollback testing of all critical paths
 3. Review and analyze rollback metrics and timing
 4. Document lessons learned and rollback procedure improvements
 5. Schedule post-mortem meeting with all stakeholders
 6. Update rollback procedures based on actual experience
 7. Communicate rollback completion to all stakeholders
 8. Archive rollback logs and artifacts for future reference
 9. Review and update monitoring thresholds if needed
10. Plan for next migration attempt with improved procedures
11. Conduct security review to ensure no vulnerabilities were introduced
12. Update disaster recovery procedures if affected by rollback
13. Review capacity planning based on rollback resource usage
14. Update documentation with rollback experience and timings
