This sanity check is used after an incident, failure, or emergency action on a DirectAdmin‑managed VPS to confirm the system has returned to a stable, known‑good state before declaring recovery complete.
Scope and intent
- Confirm system stability after an incident or outage
- Detect lingering issues before resuming normal operations
- Validate that emergency actions did not introduce new risk
- Provide a deliberate pause before declaring recovery complete
When to use this sanity check
- After service outages or partial failures
- After emergency fixes or manual intervention
- After restoring from backup or rolling back changes
- Any time the server behaved unexpectedly
What this check is not
- Not a troubleshooting guide
- Not a replacement for root‑cause analysis
- Not a post‑mortem process
Prerequisites
- Administrative or root access
- The incident condition is no longer actively worsening
- Emergency actions have been completed or paused
1. Confirm the original failure condition is resolved
- Verify the triggering symptom no longer exists
- Confirm users or monitoring are no longer reporting the issue
- Ensure no temporary workarounds are masking the problem
2. Verify core services are running
- Confirm critical services are active and stable
- Ensure no service is crash‑looping or repeatedly restarting
- If needed, validate using the Core Service Health Check Routine
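A minimal sketch of the crash-loop check, assuming the server uses systemd. It reads each unit's `ActiveState` and `NRestarts` properties from `systemctl show`. The unit names in `UNITS` and the helper names are assumptions for illustration; replace them with the services that matter on this server.

```python
import subprocess

# Assumed unit names; adjust to the services this server actually runs.
UNITS = ["directadmin", "httpd", "mysqld", "exim", "dovecot"]


def parse_show_output(text):
    """Turn the KEY=VALUE lines printed by `systemctl show` into a dict."""
    props = {}
    for line in text.splitlines():
        key, sep, value = line.partition("=")
        if sep:
            props[key.strip()] = value.strip()
    return props


def is_stable(props, max_restarts=0):
    """Treat a unit as stable if it is active and has not been
    automatically restarted more than max_restarts times."""
    restarts = int(props.get("NRestarts", "0") or 0)
    return props.get("ActiveState") == "active" and restarts <= max_restarts


def check_unit(unit):
    """Query one unit's state; requires systemd on the host."""
    result = subprocess.run(
        ["systemctl", "show", unit, "--property=ActiveState,SubState,NRestarts"],
        capture_output=True,
        text=True,
        check=False,
    )
    return parse_show_output(result.stdout)
```

Calling `is_stable(check_unit(unit))` for each entry in `UNITS`, then repeating a few minutes later, gives a rough signal: a restart count that keeps climbing between runs suggests a service that is still crash-looping.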
3. Check system resources
- Confirm disk space, memory, and load are within normal ranges
- Ensure the incident did not introduce sustained resource pressure
- Investigate abnormal usage before proceeding
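The resource check can be sketched as below, assuming a Linux host. The threshold values are assumptions, not recommendations; "normal" depends on this server's usual baseline. The function names are hypothetical.

```python
import os
import shutil

# Assumed thresholds; tune them to this server's normal baseline.
THRESHOLDS = {
    "disk_used_pct": 85.0,      # warn above this percentage of disk used
    "load_per_cpu": 1.0,        # warn above this 5-minute load per CPU core
    "mem_available_pct": 10.0,  # warn below this percentage of memory available
}


def read_mem_available_pct(meminfo_text):
    """Compute MemAvailable as a percentage of MemTotal from /proc/meminfo text."""
    values = {}
    for line in meminfo_text.splitlines():
        name, _, rest = line.partition(":")
        fields = rest.split()
        if fields:
            values[name.strip()] = int(fields[0])
    return 100.0 * values["MemAvailable"] / values["MemTotal"]


def evaluate(disk_used_pct, load_per_cpu, mem_available_pct, thresholds=THRESHOLDS):
    """Return a list of warnings; an empty list means all readings are in range."""
    warnings = []
    if disk_used_pct > thresholds["disk_used_pct"]:
        warnings.append(f"disk usage high: {disk_used_pct:.1f}%")
    if load_per_cpu > thresholds["load_per_cpu"]:
        warnings.append(f"load per CPU high: {load_per_cpu:.2f}")
    if mem_available_pct < thresholds["mem_available_pct"]:
        warnings.append(f"available memory low: {mem_available_pct:.1f}%")
    return warnings


def snapshot(path="/"):
    """Collect current disk, load, and memory readings on a Linux host."""
    usage = shutil.disk_usage(path)
    disk_used_pct = 100.0 * usage.used / usage.total
    load_per_cpu = os.getloadavg()[1] / (os.cpu_count() or 1)  # 5-minute average
    with open("/proc/meminfo") as handle:
        mem_available_pct = read_mem_available_pct(handle.read())
    return disk_used_pct, load_per_cpu, mem_available_pct
```

Running `evaluate(*snapshot())` several times over a period of minutes helps separate sustained pressure from a brief spike left over from the incident.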
4. Review logs for post‑incident errors
- Scan recent logs for recurring or new errors
- Confirm any remaining errors date from the incident window and stopped once it was resolved
- If patterns appear, review using the Log Review Routine
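One way to surface recurring errors is to group log lines by a normalized signature and count them. The sketch below is an assumption about how that might be done, not a prescribed method: the keyword list is illustrative and the function names are hypothetical.

```python
import re
from collections import Counter

# Illustrative keywords; extend to match the services being reviewed.
ERROR_WORDS = re.compile(r"\b(error|fail(ed|ure)?|fatal|critical|panic)\b", re.IGNORECASE)
NUMBERS = re.compile(r"\d+")


def signature(line):
    """Collapse digits so lines that differ only by PID, time, or IP group together."""
    return NUMBERS.sub("N", line).strip()


def recurring_errors(lines, min_count=2):
    """Return (signature, count) pairs for error lines seen at least min_count times,
    most frequent first."""
    counts = Counter(signature(line) for line in lines if ERROR_WORDS.search(line))
    return [(sig, n) for sig, n in counts.most_common() if n >= min_count]
```

The input lines could come from any log written after the recovery point, for example the output of `journalctl --since` or files under `/var/log`; which logs are relevant depends on the incident.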
5. Validate recent changes or emergency actions
- Confirm every configuration change made during the emergency was intentional
- Ensure temporary fixes are documented or reverted
- Check for configuration drift introduced during recovery
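Configuration drift can be checked by comparing file hashes against a pre-incident snapshot. This sketch assumes such a snapshot exists (for example, a manifest built from a backup copy of the same files); the function names are hypothetical and the list of files to track is left to the operator.

```python
import hashlib
from pathlib import Path


def build_manifest(paths):
    """Map each existing file path to the SHA-256 digest of its contents."""
    manifest = {}
    for item in paths:
        path = Path(item)
        if path.is_file():
            manifest[str(path)] = hashlib.sha256(path.read_bytes()).hexdigest()
    return manifest


def diff_manifests(before, after):
    """Report files added, removed, or changed between two manifests."""
    return {
        "added": sorted(set(after) - set(before)),
        "removed": sorted(set(before) - set(after)),
        "changed": sorted(
            key for key in before.keys() & after.keys() if before[key] != after[key]
        ),
    }
```

Each entry under `changed` or `added` should correspond to a documented emergency action; anything unexplained is drift worth reviewing before declaring recovery.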
6. Confirm backup state
- Ensure backups are still running as expected
- Confirm no backup processes were disabled or broken
- Note the last known‑good recovery point
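A rough freshness check for backups, assuming backups are written as files under a single directory. The directory is passed in by the caller, and the 26-hour default is an assumption suited to a daily schedule; both should be adjusted to the server's actual backup configuration. The function names are hypothetical.

```python
import time
from pathlib import Path


def newest_backup(backup_dir):
    """Return (path, mtime) of the most recently modified file, or None if there are none."""
    files = [p for p in Path(backup_dir).rglob("*") if p.is_file()]
    if not files:
        return None
    newest = max(files, key=lambda p: p.stat().st_mtime)
    return newest, newest.stat().st_mtime


def backup_is_fresh(mtime, max_age_hours=26, now=None):
    """True if the backup was written within max_age_hours of now."""
    now = time.time() if now is None else now
    return (now - mtime) <= max_age_hours * 3600
```

A recent modification time only shows that something was written. It does not prove the backup is restorable, so the last known-good recovery point should still be noted separately.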
7. Restore normal monitoring expectations
- Confirm monitoring and alerts are functioning
- Ensure alerts silenced or thresholds relaxed during the incident have been restored
- Watch for early warning signals after recovery
8. Record the recovery checkpoint
- Document what was changed during the incident
- Record the time recovery was declared stable
- Note any follow‑up investigation required
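The checkpoint can be kept as a small structured record. The sketch below is one possible shape, not a required format: the field names and the append-only JSON Lines file are assumptions, and the function names are hypothetical.

```python
import json
from datetime import datetime, timezone


def build_checkpoint(changes, follow_ups, declared_stable_at=None):
    """Build a record of what changed, when recovery was declared stable,
    and what still needs investigation."""
    declared = declared_stable_at or datetime.now(timezone.utc)
    return {
        "declared_stable_at": declared.isoformat(),
        "changes_made": list(changes),
        "follow_ups": list(follow_ups),
    }


def append_checkpoint(record, log_path):
    """Append one JSON object per line so earlier records are never rewritten."""
    with open(log_path, "a", encoding="utf-8") as handle:
        handle.write(json.dumps(record) + "\n")
```

Recording the time in UTC avoids ambiguity when the record is later compared against log timestamps from the incident.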
Completion criteria
- The original incident condition is resolved
- No new errors or instability are observed
- The system is operating within normal parameters
Next step, based on your current state:
- If instability remains, pause and consult When to Pause and Investigate vs Proceed.
- If recovery is stable after an update, validate using the After Server Update Verification Checklist.
- If the system is stable, return to normal operations and resume routine maintenance.

