# Productier Disaster Recovery Runbook

## Scope

This runbook covers backup and restore of:

- PostgreSQL data (`postgres` service)
- Object storage data (the `rustfs` bucket configured by `S3_BUCKET`)

It assumes the production Compose stack and environment configuration in:

- `infra/docker-compose.prod.yml`
- `.env.production`

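Several steps below depend on values in `.env.production`, such as `S3_BUCKET`. A minimal sketch of reading them without `source`-ing the file (which would execute it as shell) follows. It assumes simple `KEY=value` lines (no `export` prefix, no multi-line values); `env_value`, `require_env_keys` and the `PUBLIC_DOMAIN` key name are illustrative assumptions, not part of `check:prod-env`.

```bash
# Hypothetical helpers, not the project's check:prod-env implementation.
# Assumes .env lines of the form KEY=value, optionally wrapped in quotes.

# env_value FILE KEY: print the last value assigned to KEY in FILE.
env_value() {
  local file=$1 key=$2 line value="" found=1
  while IFS= read -r line || [ -n "$line" ]; do
    case "$line" in
      "$key="*)
        value=${line#"$key="}
        # Strip one pair of surrounding double or single quotes, if present.
        case "$value" in
          \"*\") value=${value#\"}; value=${value%\"} ;;
          \'*\') value=${value#\'}; value=${value%\'} ;;
        esac
        found=0
        ;;
    esac
  done < "$file"
  [ "$found" -eq 0 ] && printf '%s\n' "$value"
  return "$found"
}

# require_env_keys FILE KEY...: fail if any KEY is missing or empty.
require_env_keys() {
  local file=$1 key missing=0
  shift
  for key in "$@"; do
    if [ -z "$(env_value "$file" "$key")" ]; then
      echo "missing or empty: $key" >&2
      missing=1
    fi
  done
  return "$missing"
}
```

For example, `require_env_keys .env.production S3_BUCKET PUBLIC_DOMAIN` exits non-zero and names each missing key.
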
## Preconditions

1. Validate the production environment:

```bash
npm run check:prod-env
```

2. Ensure the production services are running:

```bash
npm run ops:deploy
```

If the deployment is already running and you only need a readiness check:

```bash
npm run ops:preflight
```

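Right after `ops:deploy`, containers may still be starting, so a single readiness check can fail transiently. A minimal retry wrapper, assuming `npm run ops:preflight` exits non-zero while services are not ready; `wait_until` is a hypothetical helper and the attempt count and delay are arbitrary:

```bash
# Hypothetical retry helper: run a command until it succeeds or attempts run out.
# wait_until ATTEMPTS DELAY_SECONDS COMMAND [ARGS...]
wait_until() {
  local attempts=$1 delay=$2 i
  shift 2
  for ((i = 1; i <= attempts; i++)); do
    if "$@"; then
      return 0
    fi
    echo "attempt $i/$attempts failed: $*" >&2
    # Sleep only between attempts, not after the last one.
    if ((i < attempts)); then
      sleep "$delay"
    fi
  done
  return 1
}
```

For example, `wait_until 10 15 npm run ops:preflight` retries for roughly two and a half minutes before giving up.
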
## Backup

Create a timestamped backup:

```bash
npm run ops:backup
```

Each backup is written to:

`backups/<UTC timestamp>/`

Each backup directory should contain:

- `postgres.sql.gz` (gzip-compressed PostgreSQL dump)
- `s3/` (synced object data)
- `checksums.sha256`
- `metadata.json`

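A sketch of how the directory name and the expected layout could be checked from the shell. The runbook only specifies a UTC timestamp; the `YYYYMMDDTHHMMSSZ` format below is an assumption chosen because it sorts chronologically as plain text, and both function names are hypothetical.

```bash
# Hypothetical helpers; the real backup script may use a different timestamp format.

# Print a UTC timestamp such as 20260604T042300Z (text order = time order).
backup_timestamp() {
  date -u +%Y%m%dT%H%M%SZ
}

# check_backup_layout DIR: fail if any expected backup artifact is missing.
check_backup_layout() {
  local dir=$1 status=0 f
  for f in postgres.sql.gz checksums.sha256 metadata.json; do
    if [ ! -s "$dir/$f" ]; then
      echo "missing or empty: $dir/$f" >&2
      status=1
    fi
  done
  if [ ! -d "$dir/s3" ]; then
    echo "missing directory: $dir/s3" >&2
    status=1
  fi
  return "$status"
}
```
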
Verify a backup:

```bash
bash scripts/ops/verify-backup.sh backups/<timestamp>
```

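`verify-backup.sh` is the supported check. To spot-check a backup by hand, and assuming `checksums.sha256` uses the standard `sha256sum` output format with paths relative to the backup directory (not confirmed by this runbook), something like the following works with GNU coreutils. Both functions are hypothetical.

```bash
# Hypothetical helpers; assumes the sha256sum "<hash>  <path>" line format.

# write_checksums DIR: hash every file under DIR except the checksum file itself.
write_checksums() {
  local dir=$1
  (cd "$dir" && find . -type f ! -name checksums.sha256 -print0 \
    | sort -z | xargs -0 -r sha256sum > checksums.sha256)
}

# verify_checksums DIR: non-zero exit if any listed file is missing or changed.
verify_checksums() {
  local dir=$1
  # Subshell keeps the caller's working directory unchanged.
  (cd "$dir" && sha256sum --check --quiet checksums.sha256)
}
```

With `--quiet`, `sha256sum` prints only the files that fail verification.
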
Run the full scheduled backup flow (backup, verify, prune):

```bash
npm run ops:backup:job
```

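The prune step keeps a bounded number of backups; per the systemd section, the count comes from `BACKUP_KEEP_COUNT`. A sketch of count-based pruning, assuming backup directory names are zero-padded UTC timestamps (such as `20260604T030000Z`) so that name order equals time order. `prune_backups` is hypothetical and relies on GNU `find -printf`.

```bash
# Hypothetical pruning helper, not the project's prune step.
# prune_backups ROOT KEEP: delete all but the KEEP newest subdirectories of ROOT.
prune_backups() {
  local root=$1 keep=$2 dir
  case "$keep" in
    ''|*[!0-9]*) echo "keep must be a positive integer" >&2; return 1 ;;
  esac
  if [ "$keep" -lt 1 ]; then
    echo "keep must be a positive integer" >&2
    return 1
  fi
  # Newest first; skip the first KEEP names and delete the rest.
  find "$root" -mindepth 1 -maxdepth 1 -type d -printf '%f\n' \
    | sort -r \
    | tail -n +"$((10#$keep + 1))" \
    | while IFS= read -r dir; do
        echo "removing $root/$dir"
        rm -rf -- "${root:?}/$dir"
      done
}
```

Rejecting a keep count of zero is deliberate: it would otherwise delete every backup.
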
## Restore

Restore is destructive and requires explicit safety flags.

```bash
FORCE=1 RESET_DB=1 RESTORE_S3=1 \
  bash scripts/ops/restore-prod.sh .env.production backups/<timestamp>
```

Flags:

- `FORCE=1`: required; without it, the restore script exits
- `RESET_DB=1`: recommended; drops and recreates the schema before importing the dump
- `RESTORE_S3=1`: restores object storage (enabled by default)

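How the three flags combine can be summarised as a dry-run planner. This is a hypothetical sketch of the documented behaviour (`FORCE` required, `RESET_DB` recommended, `RESTORE_S3` on by default), not the logic inside `restore-prod.sh`; treating `RESET_DB` as off unless set to `1`, and any `RESTORE_S3` value other than `1` as off, are assumptions.

```bash
# Hypothetical dry-run planner mirroring the documented restore flags.
# Prints the steps a restore would take; changes nothing.
restore_plan() {
  if [ "${FORCE:-0}" != "1" ]; then
    echo "refusing to restore: set FORCE=1" >&2
    return 1
  fi
  if [ "${RESET_DB:-0}" = "1" ]; then
    echo "drop and recreate schema"
  fi
  echo "import postgres.sql.gz"
  # Object storage restore defaults to on, matching the flag description.
  if [ "${RESTORE_S3:-1}" = "1" ]; then
    echo "sync s3/ into the S3_BUCKET bucket"
  fi
}
```

Running `FORCE=1 RESET_DB=1 restore_plan` with the same flags you intend to pass is a cheap way to double-check the combination before the real command.
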
## Restore drill (non-destructive)

Run a drill against the latest backup:

```bash
npm run ops:restore:drill
```

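`ops:restore:drill` selects the latest backup itself. To see which directory that is, or to pass it explicitly to `restore-drill.sh`, a hypothetical helper can pick the newest entry under `backups/`, assuming zero-padded UTC timestamp names that sort chronologically (GNU `find -printf` required):

```bash
# Hypothetical helper: print the newest backup directory under ROOT.
# Assumes timestamp names, so the lexicographically last name is the newest.
latest_backup() {
  local root=$1 latest
  latest=$(find "$root" -mindepth 1 -maxdepth 1 -type d -printf '%f\n' | sort | tail -n 1)
  if [ -z "$latest" ]; then
    echo "no backups found in $root" >&2
    return 1
  fi
  printf '%s/%s\n' "$root" "$latest"
}
```

For example: `bash scripts/ops/restore-drill.sh .env.production "$(latest_backup backups)"`.
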
Run a full isolated staging drill (temporary Compose project, torn down afterwards):

```bash
npm run ops:drill:staging
```

Run a drill against a specific backup:

```bash
bash scripts/ops/restore-drill.sh .env.production backups/<timestamp>
```

Drill behavior:

- Imports the DB dump into a temporary drill database
- Runs a sanity check (counts tables in the `public` schema)
- Optionally syncs backup objects into a temporary drill bucket
- Drops the temporary drill database and bucket by default

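The sanity check counts tables in the `public` schema of the drill database. A sketch of an equivalent manual check: the query uses the standard PostgreSQL `information_schema.tables` view and counts only base tables, which may differ from the drill's own query. Treating zero tables as a failure, and the names `TABLE_COUNT_SQL` and `drill_sanity_check`, are assumptions.

```bash
# Hypothetical equivalent of the drill's table-count sanity check.
TABLE_COUNT_SQL="SELECT count(*) FROM information_schema.tables
  WHERE table_schema = 'public' AND table_type = 'BASE TABLE';"

# drill_sanity_check COUNT: succeed only if the restored schema has tables.
drill_sanity_check() {
  local count=$1
  case "$count" in
    ''|*[!0-9]*) echo "not a number: '$count'" >&2; return 1 ;;
  esac
  if [ "$count" -eq 0 ]; then
    echo "drill database has no tables in the public schema" >&2
    return 1
  fi
  echo "ok: $count tables in the public schema"
}
```

With a connection string for the drill database in a hypothetical `DRILL_DATABASE_URL`, `drill_sanity_check "$(psql -tA -d "$DRILL_DATABASE_URL" -c "$TABLE_COUNT_SQL")"` runs the check end to end (`-t` drops headers, `-A` drops padding).
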
## Post-restore checks

1. API health:

```bash
curl -sS https://<PUBLIC_DOMAIN>/v1/health
```

2. Auth health:

```bash
curl -sS https://<PUBLIC_DOMAIN>/api/auth/get-session
```

3. Manual smoke test:

- sign in
- open board/calendar/notes/mail
- download at least one attachment

Automated smoke script:

```bash
npm run ops:smoke
```

The smoke script checks:

- public homepage response
- `/v1/health` payload
- security response headers
- HTTP-to-HTTPS redirect behavior

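When a smoke run fails on headers or redirects, the raw response headers from `curl -sSI` can be checked by hand. The sketch below reads such a header block on stdin. Which security headers to require (here `Strict-Transport-Security` and `X-Content-Type-Options`) and which redirect codes to accept are assumptions, not the smoke script's actual rules; both functions are hypothetical.

```bash
# Hypothetical header checks operating on `curl -sSI` output (read from stdin).

# require_headers NAME...: fail if any named header is absent (case-insensitive).
require_headers() {
  local headers name status=0
  headers=$(cat)
  for name in "$@"; do
    if ! grep -qi "^$name:" <<< "$headers"; then
      echo "missing header: $name" >&2
      status=1
    fi
  done
  return "$status"
}

# check_https_redirect: expect a redirect status and an https:// Location header.
check_https_redirect() {
  local headers code location
  headers=$(tr -d '\r')
  code=$(head -n 1 <<< "$headers" | awk '{print $2}')
  location=$(grep -i '^location:' <<< "$headers" | head -n 1 | sed 's/^[^:]*:[[:space:]]*//')
  case "$code" in
    301|302|307|308) ;;
    *) echo "expected redirect, got status '$code'" >&2; return 1 ;;
  esac
  case "$location" in
    https://*) ;;
    *) echo "redirect target is not HTTPS: '$location'" >&2; return 1 ;;
  esac
}
```

Typical use: `curl -sSI "https://$PUBLIC_DOMAIN/" | require_headers Strict-Transport-Security X-Content-Type-Options` and `curl -sSI "http://$PUBLIC_DOMAIN/" | check_https_redirect` (without `-L`, so curl reports the redirect itself).
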
## Cadence recommendations

- Daily backup (off-peak)
- Weekly restore drill in staging
- Keep at least 14 daily and 8 weekly restore points

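`BACKUP_KEEP_COUNT` is a plain count, so with one backup per day it can hold 14 daily points but not 8 weekly ones on top. One way to express the recommended policy is a selector that keeps the newest N backups plus the newest backup from each of the last M ISO weeks. This is a hypothetical sketch, not the project's prune logic; it assumes names that start with a `YYYYMMDD` UTC date and requires GNU `date`.

```bash
# Hypothetical retention selector (GNU date required); not the project's prune logic.
# select_keep DAILY WEEKLY NAME...: print backup names to keep, newest first:
#   - the DAILY newest backups, plus
#   - the newest backup from each of the WEEKLY most recent ISO weeks.
# Assumes names start with a UTC date, e.g. 20260604T030000Z.
select_keep() {
  local daily=$1 weekly=$2 sorted name week seen=" " count=0
  shift 2
  sorted=$(printf '%s\n' "$@" | sort -r)
  {
    # Daily restore points: simply the newest DAILY names.
    head -n "$daily" <<< "$sorted"
    # Weekly restore points: the first (newest) name seen in each ISO week.
    while IFS= read -r name; do
      [ "$count" -lt "$weekly" ] || break
      week=$(date -u -d "${name:0:8}" +%G-W%V)
      case "$seen" in *" $week "*) continue ;; esac
      seen="$seen$week "
      printf '%s\n' "$name"
      count=$((count + 1))
    done <<< "$sorted"
  } | sort -ru
}
```

For the recommended policy, call `select_keep 14 8` with the directory names under `backups/`; names it does not print are candidates for deletion.
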
## Automation (systemd)

Template files:

- `infra/systemd/productier-backup.service`
- `infra/systemd/productier-backup.timer`

Example install on a Linux host:

```bash
sudo cp infra/systemd/productier-backup.service /etc/systemd/system/
sudo cp infra/systemd/productier-backup.timer /etc/systemd/system/
sudo systemctl daemon-reload
sudo systemctl enable --now productier-backup.timer
sudo systemctl status productier-backup.timer
```

Run a backup manually through systemd:

```bash
sudo systemctl start productier-backup.service
sudo journalctl -u productier-backup.service -n 200 --no-pager
```

Retention is controlled by `BACKUP_KEEP_COUNT` in `productier-backup.service`.

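To change retention without editing the installed unit, a systemd drop-in can override the variable. This assumes `productier-backup.service` sets `BACKUP_KEEP_COUNT` with `Environment=`; note that values from an `EnvironmentFile=` take precedence over `Environment=`, so a drop-in has no effect if the loaded env file also sets the variable. `write_retention_dropin` and the `retention.conf` file name are hypothetical.

```bash
# Hypothetical helper: write a drop-in that overrides BACKUP_KEEP_COUNT.
# UNIT_DIR is normally /etc/systemd/system (run as root in that case).
write_retention_dropin() {
  local unit_dir=$1 keep=$2 dir
  case "$keep" in
    ''|*[!0-9]*) echo "keep count must be a whole number" >&2; return 1 ;;
  esac
  dir="$unit_dir/productier-backup.service.d"
  mkdir -p "$dir" || return 1
  # Drop-ins are read after the unit file; a later Environment= assignment
  # of the same variable overrides the earlier one.
  printf '[Service]\nEnvironment=BACKUP_KEEP_COUNT=%s\n' "$keep" > "$dir/retention.conf"
}
```

As root: `write_retention_dropin /etc/systemd/system 14 && systemctl daemon-reload`, then confirm with `systemctl show productier-backup.service -p Environment`.
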
Alerting is configured through these variables:

- `OPS_ALERT_WEBHOOK_URL`
- `OPS_ALERT_WEBHOOK_BEARER_TOKEN`
- `OPS_NOTIFY_ON_SUCCESS`
- `OPS_ALERT_TIMEOUT_SECONDS`

They can be set in `/opt/productier/.env.production`, which `productier-backup.service` loads.

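This runbook does not document how the backup job formats its webhook call. As an illustration only, a notifier built from these variables might look like the following. The JSON body shape, treating `OPS_NOTIFY_ON_SUCCESS=1` as "also notify on success", and the 10-second fallback timeout are all assumptions; `should_notify` and `send_alert` are hypothetical.

```bash
# Hypothetical notifier built from the documented OPS_ALERT_* variables.

# should_notify STATUS: always alert on failure; on success only if opted in.
should_notify() {
  case "$1" in
    failure) return 0 ;;
    success) [ "${OPS_NOTIFY_ON_SUCCESS:-0}" = "1" ] ;;
    *) return 1 ;;
  esac
}

# send_alert STATUS MESSAGE: POST a small JSON payload to the webhook, if configured.
send_alert() {
  local status=$1 message=$2
  local -a auth=()
  [ -n "${OPS_ALERT_WEBHOOK_URL:-}" ] || return 0   # alerting not configured
  should_notify "$status" || return 0
  # Minimal JSON escaping for backslashes and double quotes (single-line messages only).
  message=${message//\\/\\\\}
  message=${message//\"/\\\"}
  if [ -n "${OPS_ALERT_WEBHOOK_BEARER_TOKEN:-}" ]; then
    auth=(-H "Authorization: Bearer $OPS_ALERT_WEBHOOK_BEARER_TOKEN")
  fi
  curl -fsS --max-time "${OPS_ALERT_TIMEOUT_SECONDS:-10}" \
    -H 'Content-Type: application/json' "${auth[@]}" \
    --data "{\"status\":\"$status\",\"text\":\"$message\"}" \
    "$OPS_ALERT_WEBHOOK_URL"
}
```

`--max-time` bounds the whole request, so a hung webhook cannot stall the backup job beyond the configured timeout.
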
Restore drill automation template files:

- `infra/systemd/productier-restore-drill.service`
- `infra/systemd/productier-restore-drill.timer`

Example install:

```bash
sudo cp infra/systemd/productier-restore-drill.service /etc/systemd/system/
sudo cp infra/systemd/productier-restore-drill.timer /etc/systemd/system/
sudo systemctl daemon-reload
sudo systemctl enable --now productier-restore-drill.timer
sudo systemctl status productier-restore-drill.timer
```

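Both install sequences copy a `.service`/`.timer` pair. A hypothetical helper that installs a pair only when both templates exist, so a timer is never enabled without its service; `install_unit_pair` is not part of the repository and the `0644` mode is an assumption.

```bash
# Hypothetical helper: copy NAME.service and NAME.timer from SRC_DIR to DEST_DIR.
# install_unit_pair SRC_DIR DEST_DIR NAME
install_unit_pair() {
  local src_dir=$1 dest_dir=$2 name=$3 ext
  # Check both files first so a missing template never leaves a half-installed pair.
  for ext in service timer; do
    if [ ! -f "$src_dir/$name.$ext" ]; then
      echo "missing $src_dir/$name.$ext" >&2
      return 1
    fi
  done
  for ext in service timer; do
    install -m 0644 "$src_dir/$name.$ext" "$dest_dir/$name.$ext" || return 1
  done
}
```

As root from the repository root: `install_unit_pair infra/systemd /etc/systemd/system productier-restore-drill`, then `systemctl daemon-reload` and `systemctl enable --now productier-restore-drill.timer`. `systemctl list-timers 'productier-*'` shows when each timer fires next.
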