Jobs Stuck in ENCODING (Orphaned Jobs)
Symptom
Jobs remain in ENCODING status after a system restart or crash. The progress bar is frozen at the last checkpoint.
Cause
The worker process terminated unexpectedly before updating the job status to COMPLETED or FAILED.
Fix
- Automatic (On Startup)
- Manual Reset (UI)
- Database Query (Advanced)
Automatic (On Startup): BitBonsai automatically recovers orphaned jobs on backend startup. No action is needed; jobs restart from the beginning on the next queue cycle. Check the backend logs to verify recovery.
The Stuck Job Watchdog (if enabled) automatically detects jobs with no progress updates for 30+ minutes and resets them.
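The watchdog's staleness rule amounts to a simple time comparison. The sketch below is illustrative only and is not BitBonsai's implementation: the function name `is_stale` and its arguments are hypothetical, and only the 30-minute threshold comes from the description above.

```shell
# Hypothetical helper: succeeds (exit 0) when a job's last progress
# update is at least THRESHOLD seconds old. Not part of BitBonsai.
is_stale() {
  last_update=$1              # epoch seconds of the last progress update
  now=$2                      # current epoch seconds, e.g. $(date +%s)
  threshold=${3:-1800}        # default: 30 minutes
  [ $((now - last_update)) -ge "$threshold" ]
}

# Example: a job last updated 45 minutes ago counts as stuck.
now=$(date +%s)
if is_stale $((now - 2700)) "$now"; then
  echo "job is stale and would be reset"
fi
```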
NFS Mount Not Found (Child Nodes)
Symptom
Child node logs report that the NFS mount was not found.
Cause
The worker node cannot access shared storage via the NFS mount. Common reasons:
- NFS server is down
- Export path not configured correctly
- Network connectivity issue
- Mount point not created
Fix
Check NFS Exports
Verify that the shared directories are exported. If any are missing, add them to /etc/exports, then reload the export table.
Check Firewall Rules
NFS requires these ports open on the main node:
- TCP/UDP 2049 (NFS)
- TCP/UDP 111 (portmapper)
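The export check described under Check NFS Exports can be scripted from a child node. This is a minimal sketch, assuming the standard `showmount` and `exportfs` tools are installed; the helper name `export_listed`, the `MAIN_NODE` variable, and the `/media` path are placeholders rather than names from BitBonsai.

```shell
# Hypothetical helper: reads `showmount -e` output on stdin and succeeds
# if the given path appears in the export list.
export_listed() {
  awk -v path="$1" '
    NR > 1 && $1 == path { found = 1 }   # line 1 is the "Export list for ..." header
    END { exit !found }
  '
}

# Usage on a child node (MAIN_NODE and /media are placeholders):
#   showmount -e "$MAIN_NODE" | export_listed /media || echo "/media is not exported"
# After editing /etc/exports on the main node, reload the export table:
#   sudo exportfs -ra
```

Keeping the parsing separate from the `showmount` call makes it possible to test the check against saved output when the NFS server itself is unreachable.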
Health Check Failures (CORRUPTED Jobs)
Symptom
Jobs complete encoding but are marked as CORRUPTED instead of COMPLETED, and an error appears in the logs.
Cause
Post-encoding validation detected an issue. Possible reasons:
- Output file corrupted during encoding
- File moved/deleted during health check
- NFS network interruption during validation
- FFprobe timeout or crash
Fix
Verify File Integrity
Check that the output file actually exists and is playable. If the file is valid, this is a false-positive health check failure.
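One way to run this check from a shell is sketched below, assuming `ffprobe` is installed on the node. The helper name `check_output_file` and the example path are hypothetical, and a readable duration is only a rough proxy for playability, not a full decode test.

```shell
# Hypothetical helper: succeeds only if the file exists, is non-empty,
# and ffprobe can read a duration from its container metadata.
check_output_file() {
  file=$1
  [ -s "$file" ] || { echo "missing or empty: $file" >&2; return 1; }
  duration=$(ffprobe -v error -show_entries format=duration \
    -of default=noprint_wrappers=1:nokey=1 "$file") || return 1
  [ -n "$duration" ] && [ "$duration" != "N/A" ]
}

# Usage (the path is a placeholder):
#   check_output_file /media/movies/example.mkv && echo "file looks valid"
```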
Manual Re-validation
Trigger health check retry:
- Navigate to Jobs page → Filter by CORRUPTED
- Select job(s)
- Click Actions → Re-validate Health
Automatic Hourly Retry
BitBonsai automatically re-checks CORRUPTED jobs every hour. Check the logs to confirm auto-recovery.
Child Node Disconnected (Network Issues)
Symptom
The child node shows as OFFLINE or DISCONNECTED in the UI, and the worker logs show connection errors.
Cause
The worker node cannot reach the main node API. Common reasons:
- Network connectivity issue
- Firewall blocking port 3100
- Main node backend service down
- Invalid API key configuration
Fix
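A quick way to separate network and firewall problems from the other causes listed above is to probe the API port from the child node. This is a rough sketch, assuming `curl` is available and that the main node serves HTTP on port 3100; the helper name and the example address are placeholders, and the probe does not validate the API key.

```shell
# Hypothetical helper: succeeds if an HTTP server answers on host:port
# within 5 seconds. Any HTTP status counts; only connection failures fail.
main_node_reachable() {
  host=$1
  port=${2:-3100}             # 3100 is the API port named above
  curl -sS --max-time 5 -o /dev/null "http://$host:$port/"
}

# Usage from the child node (192.168.1.100 is a placeholder address):
#   main_node_reachable 192.168.1.100 || echo "cannot reach main node on port 3100"
```

If the probe succeeds but the node still shows as disconnected, the API key configuration is the more likely culprit.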
Frontend Can't Connect to Backend
Symptom
The web UI shows a loading spinner indefinitely or displays a connection error.
Cause
The frontend cannot reach the backend API. Possible reasons:
- Backend container not running
- Port 3100 not exposed
- Incorrect API_URL environment variable
- CORS configuration issue (if accessing from a different origin)
Fix
Check API_URL Configuration
Verify that the frontend knows where to find the backend. Update docker-compose.yml if the value is incorrect, then restart.
Verify Browser Network Tab
Open browser DevTools (F12) → Network tab:
- Refresh BitBonsai UI
- Look for failed API requests
- Check request URL matches backend address
- Check for CORS errors in console
- Wrong URL: Update the API_URL environment variable
- CORS error: Add your frontend origin to the backend CORS config
- ERR_CONNECTION_REFUSED: Backend not accessible from browser’s network
Database Connection Refused
Symptom
Backend logs show a database connection refused error.
Cause
The backend cannot connect to the PostgreSQL database. Possible reasons:
- PostgreSQL container not running
- Incorrect DATABASE_URL connection string
- Database initialization not complete
- Network issue between containers
Fix
Verify DATABASE_URL Correct
Check the backend environment. Common mistakes:
- Hostname: postgres (Docker service name), NOT localhost
- Password: Must match POSTGRES_PASSWORD in the postgres service
- Port: 5432 (internal Docker network port)
Update docker-compose.yml if incorrect, then restart.
Test Database Connection Manually
- Check credentials match POSTGRES_USER and POSTGRES_PASSWORD
- Verify database bitbonsai exists: \l in psql
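The hostname and port mistakes listed under Verify DATABASE_URL Correct can be caught mechanically. The sketch below assumes a URL of the usual `scheme://user:password@host:port/database` shape; all function names are hypothetical, and the parsing handles only simple cases (for example, it does not cope with an `@` inside the password).

```shell
# Hypothetical helpers: extract host and port from a URL of the form
# scheme://user:password@host:port/database (simple cases only).
db_url_hostport() {
  rest=${1#*://}              # drop the scheme
  rest=${rest#*@}             # drop user:password@ if present
  printf '%s\n' "${rest%%/*}" # drop /database and anything after it
}
db_url_host() {
  hostport=$(db_url_hostport "$1")
  printf '%s\n' "${hostport%%:*}"
}
db_url_port() {
  hostport=$(db_url_hostport "$1")
  case $hostport in
    *:*) printf '%s\n' "${hostport##*:}" ;;
    *)   printf '5432\n' ;;   # PostgreSQL default when no port is given
  esac
}

# Flag the hostname and port mistakes from the list above.
check_db_url() {
  status=0
  case $(db_url_host "$1") in
    localhost|127.0.0.1)
      echo "host should be the Docker service name (postgres), not localhost" >&2
      status=1 ;;
  esac
  if [ "$(db_url_port "$1")" != 5432 ]; then
    echo "port should be 5432 inside the Docker network" >&2
    status=1
  fi
  return "$status"
}
```

Running `check_db_url "$DATABASE_URL"` inside the backend container prints a warning for each problem it finds and returns a non-zero status if there is at least one. It does not verify the password.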
Out of Disk Space During Encoding
Symptom
Encoding fails with an out-of-disk-space error.
Cause
The temporary directory ran out of space during encoding. FFmpeg creates temporary files that can be 1-2× the original video size before final compression.
Fix
Reduce Concurrent Jobs
Lower parallel job limit to reduce temp space usage:
- Navigate to Settings → Encoding
- Set Max Concurrent Jobs to lower value (e.g., 1-2 instead of 4)
- Save settings
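To see why lowering the job limit helps, a back-of-the-envelope estimate is enough. The sketch below assumes the worst case from the Cause section (temp files at 2× the source size) and that every concurrent job is processing a file as large as the largest source; the function name and the 20 GB figure are illustrative only.

```shell
# Hypothetical helper: worst-case temp space (GB) if every concurrent job
# needs 2x the size of the largest source file, the upper bound given above.
estimate_temp_gb() {
  largest_source_gb=$1
  concurrent_jobs=$2
  echo $((largest_source_gb * 2 * concurrent_jobs))
}

estimate_temp_gb 20 4   # prints 160
estimate_temp_gb 20 1   # prints 40
```

With 20 GB sources, dropping from 4 concurrent jobs to 1 cuts the worst-case temp requirement from 160 GB to 40 GB.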
Increase Temp Directory Size
Expand temp storage capacity:
Option 1: Move to a larger partition
Option 2: Add more disk space to the existing partition
- Expand virtual disk (if VM/LXC)
- Add physical disk and extend volume group
- Clean up other files on same partition
Temp File Detection Failed (10 Retries)
Symptom
Encoding starts but fails immediately with a temp file detection error (after 10 retries).
Cause
FFmpeg couldn’t create the temporary file, or BitBonsai couldn’t detect it. Possible reasons:
- NFS mount delay (file created but not visible yet)
- Insufficient disk space
- Permission issues on temp directory
- Slow storage (HDD instead of SSD)
Fix
Check Disk Space
Verify that the temp directory has sufficient free space. If low on space:
- Clear old temp files: rm -rf /tmp/bitbonsai/*
- Reduce concurrent jobs (fewer jobs = less temp space used)
- Move temp directory to larger partition
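The free-space check itself can be done with `df`. This is a minimal sketch using the POSIX output format: the helper names are hypothetical, the `/tmp/bitbonsai` path is the one used in the cleanup command above, and the 50 GiB threshold is only an example.

```shell
# Hypothetical helper: print free space, in KiB, on the filesystem
# that holds the given directory (POSIX df output format).
free_kb() {
  df -Pk "$1" | awk 'NR == 2 { print $4 }'
}

# Succeed only if at least REQUIRED_KB KiB are free.
has_free_space() {
  dir=$1
  required_kb=$2
  [ "$(free_kb "$dir")" -ge "$required_kb" ]
}

# Usage: require 50 GiB free in the temp directory (threshold is an example):
#   has_free_space /tmp/bitbonsai $((50 * 1024 * 1024)) || echo "low on temp space"
```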