in-midst-my-life

Troubleshooting Guide

Common errors, debugging strategies, and solutions for the In Midst My Life platform.


Quick Diagnostics

Health Check Script

Run this first to diagnose system health:

#!/bin/bash
# save as: scripts/health-check.sh

echo "🔍 Running system diagnostics..."

# printf interprets \n; plain echo in bash would print it literally
printf '\n1. Checking services...\n'
docker-compose ps

printf '\n2. Testing API health...\n'
# -f makes curl fail on HTTP error responses, not only on connection errors
curl -sf http://localhost:3001/health || echo "❌ API unreachable"
curl -sf http://localhost:3001/ready || echo "❌ API not ready"

printf '\n3. Testing database connection...\n'
docker-compose exec -T postgres pg_isready -U midstsvc || echo "❌ PostgreSQL not ready"

printf '\n4. Testing Redis...\n'
docker-compose exec -T redis redis-cli ping || echo "❌ Redis not responding"

printf '\n5. Checking logs for errors...\n'
docker-compose logs --tail=50 api | grep -i error
docker-compose logs --tail=50 orchestrator | grep -i error

printf '\n✅ Diagnostics complete\n'
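
Services can take a few seconds to become ready after startup, so a single failed probe does not always indicate a real problem. The following is a minimal sketch of a retry helper that could wrap any of the probes above. The name `wait_for` and its argument order are hypothetical and not part of the project's scripts.

```shell
# Retry a command until it succeeds or the attempts run out.
# Usage: wait_for <attempts> <delay-seconds> <command> [args...]
# Hypothetical helper; not part of the project's scripts.
wait_for() {
  attempts=$1
  delay=$2
  shift 2
  i=1
  while [ "$i" -le "$attempts" ]; do
    # Run the probe quietly; success ends the loop immediately
    if "$@" >/dev/null 2>&1; then
      return 0
    fi
    # Sleep between attempts, but not after the final one
    if [ "$i" -lt "$attempts" ]; then
      sleep "$delay"
    fi
    i=$((i + 1))
  done
  return 1
}
```

For example, `wait_for 10 2 curl -sf http://localhost:3001/ready || echo "❌ API not ready"` would poll the readiness endpoint up to ten times, two seconds apart, before reporting a failure.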

Common Commands

# View all container status
docker-compose ps

# Check service logs
docker-compose logs -f api
docker-compose logs -f orchestrator
docker-compose logs --tail=100 api

# Restart specific service
docker-compose restart api

# Full restart
docker-compose down && docker-compose up -d

# Check port usage
lsof -i :3001
lsof -i :5432

Database Issues

Error: ECONNREFUSED - Cannot Connect to Database

Symptoms:

Error: connect ECONNREFUSED 127.0.0.1:5432

Diagnosis:

# Check if PostgreSQL is running
docker-compose ps postgres

# Check PostgreSQL logs
docker-compose logs postgres

Solutions:

  1. PostgreSQL not started:
    ./scripts/dev-up.sh
    # or
    docker-compose up -d postgres
    
  2. Wrong connection string:
    # Check environment variable
    echo $DATABASE_URL
       
    # Should be:
    postgresql://midstsvc:password@localhost:5432/midst_dev
       
    # For Docker services:
    postgresql://midstsvc:password@postgres:5432/midst
    
  3. Port conflict:
    # Check what's using port 5432
    lsof -i :5432
       
    # Change port in .env
    POSTGRES_PORT=5433
       
    # Update DATABASE_URL accordingly
    
  4. Container networking issue:
    # Verify network
    docker network ls
    docker network inspect in-midst-my-life_default
       
    # Recreate network
    docker-compose down
    docker-compose up -d
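
A frequent cause of the connection-string problem above is using `localhost` inside a container or `postgres` on the host machine. The sketch below extracts the host and port from a connection URL so the value can be checked at a glance. The function name `db_url_hostport` is hypothetical, and the parsing assumes the simple `scheme://user:password@host:port/database` shape shown above.

```shell
# Print the host:port part of a connection URL.
# Hypothetical helper; assumes scheme://[user[:password]@]host[:port]/database
db_url_hostport() {
  rest=${1#*://}          # drop the scheme
  authority=${rest%%/*}   # drop the /database path and anything after it
  printf '%s\n' "${authority##*@}"   # drop credentials, if any
}
```

Running `db_url_hostport "$DATABASE_URL"` on the host should print `localhost:5432` (or the port set in `.env`), while the same check inside a Docker service should print `postgres:5432`.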
    

Error: relation "profiles" does not exist

Symptoms:

ERROR: relation "profiles" does not exist

Diagnosis:

# Connect to database
./scripts/dev-shell.sh postgres

# In psql:
\dt                    # List all tables

Solution:

# Run migrations
pnpm --filter @in-midst-my-life/api migrate
pnpm --filter @in-midst-my-life/orchestrator migrate

# Verify tables exist
./scripts/dev-shell.sh postgres
# In psql:
\dt
SELECT COUNT(*) FROM profiles;

Error: Migration Failed

Symptoms:

Migration failed: duplicate column name

Diagnosis:

# Check migration status
./scripts/dev-shell.sh postgres
# In psql:
SELECT * FROM migrations ORDER BY applied_at DESC;

Solutions:

  1. Migration already applied:
    • Migrations are idempotent by design
    • Check if table/column already exists
  2. Partial migration:
    # Rollback (if DOWN statements exist)
    # Edit migration file with proper DOWN
       
    # Re-run migration
    pnpm --filter @in-midst-my-life/api migrate
    
  3. Database in bad state:
    # DANGER: Nuclear option (dev only)
    docker-compose down -v  # Removes volumes
    docker-compose up -d postgres
    pnpm --filter @in-midst-my-life/api migrate
    pnpm --filter @in-midst-my-life/api seed
    

Error: too many connections

Symptoms:

FATAL: sorry, too many clients already

Solutions:

  1. Close unused connections:
    ./scripts/dev-shell.sh postgres
    # In psql:
    SELECT COUNT(*) FROM pg_stat_activity;
       
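    # Kill connections that have been idle for more than 5 minutes
    # (state_change marks when the connection became idle; skip this session)
    SELECT pg_terminate_backend(pid)
    FROM pg_stat_activity
    WHERE state = 'idle'
    AND state_change < NOW() - INTERVAL '5 minutes'
    AND pid <> pg_backend_pid();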
    
  2. Increase max_connections:
    # docker-compose.yml
    postgres:
      command: postgres -c max_connections=200
    
  3. Connection pooling:
    • Check API uses connection pooling (default in our setup)
    • Verify pool size is appropriate (default: 10)
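
Whether a pool size is appropriate depends on how many processes share the database: each one can hold up to its pool size in connections. The arithmetic can be sketched as below. The function name and the instance counts are hypothetical; 100 and 3 are PostgreSQL's stock defaults for `max_connections` and `superuser_reserved_connections`.

```shell
# Remaining connection headroom after every process fills its pool.
# Args: max_connections, reserved connections, process count, pool size
# Hypothetical helper for a back-of-the-envelope check.
pool_headroom() {
  echo $(( $1 - $2 - $3 * $4 ))
}
```

With the default pool size of 10, `pool_headroom 100 3 4 10` leaves room to spare, while `pool_headroom 100 3 12 10` goes negative, which is the situation that produces "too many clients already".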

Redis & Caching Issues

Error: Redis connection failed

Symptoms:

Error: Redis connection to redis:6379 failed

Diagnosis:

# Check Redis status
docker-compose ps redis

# Test connection
docker-compose exec redis redis-cli ping

Solutions:

  1. Redis not started:
    ./scripts/dev-up.sh
    # or
    docker-compose up -d redis
    
  2. Wrong Redis URL:
    echo $REDIS_URL
       
    # Should be:
    redis://redis:6379          # From Docker containers
    redis://localhost:6379      # From host machine
    
  3. Redis auth required but not provided:
    # If Redis has password
    redis://:<password>@redis:6379
    

Error: Cache Inconsistency

Symptoms:

Solutions:

  1. Clear cache manually:
    ./scripts/dev-shell.sh redis
    # In redis-cli:
    FLUSHDB         # Clear current database
    FLUSHALL        # Clear all databases
    
  2. Clear specific keys:
    ./scripts/dev-shell.sh redis
    # In redis-cli:
    KEYS taxonomy:*  # Find taxonomy keys
    DEL taxonomy:masks taxonomy:epochs taxonomy:stages
    
  3. Disable caching (dev only):
    # Set in .env (empty = use in-memory fallback)
    REDIS_URL=
    

API Errors

Error: 401 Unauthorized

Symptoms:

{
  "error": {
    "code": "UNAUTHORIZED",
    "message": "Missing or invalid authentication token"
  }
}

Solutions:

  1. Missing Authorization header:
    # Include JWT token
    curl -H "Authorization: Bearer <your-jwt-token>" \
      http://localhost:3001/profiles
    
  2. Expired token:
    • Obtain new token from auth provider
    • Check token expiry: https://jwt.io
  3. Invalid token signature:
    • Verify JWT_SECRET matches between services
    • Check token issuer/audience claims
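
Token expiry can also be checked locally, which avoids pasting a real token into a website. The sketch below decodes the payload segment of a JWT and extracts the `exp` claim. The function names are hypothetical, the signature is not verified, and `base64 -d` is assumed to be available (older macOS releases spell the flag `-D`).

```shell
# Print the decoded JSON payload (second segment) of a JWT.
# Does NOT verify the signature; use it only to inspect claims.
jwt_payload() {
  payload=$(printf '%s' "$1" | cut -d. -f2 | tr '_-' '/+')
  # base64url drops padding; restore it to a multiple of 4 characters
  while [ $(( ${#payload} % 4 )) -ne 0 ]; do
    payload="${payload}="
  done
  printf '%s' "$payload" | base64 -d
}

# Print the numeric "exp" claim (seconds since the Unix epoch), if present.
jwt_exp() {
  jwt_payload "$1" |
    sed -n 's/.*"exp"[[:space:]]*:[[:space:]]*\([0-9][0-9]*\).*/\1/p'
}
```

A token is expired when `jwt_exp "$TOKEN"` prints a value smaller than `date +%s`.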

Error: 404 Not Found

Symptoms:

{
  "error": {
    "code": "NOT_FOUND",
    "message": "Profile not found"
  }
}

Diagnosis:

# Check if resource exists in database
./scripts/dev-shell.sh postgres
# In psql:
SELECT * FROM profiles WHERE id = '<uuid>';

Solutions:

  1. Resource doesn’t exist:
    • Create the resource first
    • Check if ID is correct (UUID format)
  2. Soft-deleted:
    # A row that exists but is missing from this result is soft-deleted
    SELECT * FROM profiles WHERE id = '<uuid>' AND is_active = true;
    

Error: 429 Too Many Requests / QUOTA_EXCEEDED

Symptoms:

{
  "error": {
    "code": "QUOTA_EXCEEDED",
    "message": "Monthly quota exceeded for this feature",
    "details": {
      "feature": "resume_tailoring",
      "limit": 5,
      "used": 5,
      "resetDate": "2025-02-01T00:00:00Z"
    }
  }
}

Solutions:

  1. Upgrade tier:
    • FREE → PRO → ENTERPRISE
  2. Wait for quota reset:
    • Check resetDate in error response
  3. Bypass in development:
    # Set env var (dev only)
    DISABLE_RATE_LIMITING=true
    

Error: 500 Internal Server Error

Symptoms:

{
  "error": {
    "code": "INTERNAL_ERROR",
    "message": "An unexpected error occurred"
  }
}

Diagnosis:

# Check API logs
docker-compose logs api | tail -100

# Look for stack traces
docker-compose logs api | grep -A 20 "Error:"

Solutions:

  1. Database connection issue:
    • See Database Issues above
  2. Uncaught exception:
    • Check logs for stack trace
    • File bug report with reproduction steps
  3. Resource exhaustion:
    • Check memory/CPU usage
    • Restart service: docker-compose restart api

Orchestrator & Job Queue Issues

Error: Jobs Not Processing

Symptoms:

Diagnosis:

# Check orchestrator is running
docker-compose ps orchestrator

# Check worker is enabled
docker-compose exec orchestrator printenv | grep ORCH_WORKER_ENABLED

# Check Redis queue lengths
./scripts/dev-shell.sh redis
# In redis-cli (assuming Bull's standard key layout: wait and active
# are lists, failed is a sorted set):
LLEN "bull:task-queue:wait"
LLEN "bull:task-queue:active"
ZCARD "bull:task-queue:failed"

Solutions:

  1. Worker not enabled:
    # Set in .env
    ORCH_WORKER_ENABLED=true
       
    # Restart orchestrator
    docker-compose restart orchestrator
    
  2. Redis connection issue:
    # Check REDIS_URL or ORCH_REDIS_URL
    echo $ORCH_REDIS_URL
       
    # Should be:
    redis://redis:6379
    
  3. Job handler not registered:
    # Check orchestrator logs
    docker-compose logs orchestrator | grep "Registering handler"
       
    # Ensure task type matches registered handlers
    
  4. Job failing silently:
    # Check failed queue (a sorted set in Bull, so use ZRANGE rather than LRANGE)
    ./scripts/dev-shell.sh redis
    # In redis-cli:
    ZRANGE "bull:task-queue:failed" 0 -1
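
Bull stores job states in different Redis data types, so the counting command depends on the state being inspected. The sketch below maps a state to the matching command. It assumes Bull's standard key layout (`bull:<queue>:<state>`, with `wait`, `active`, and `paused` as lists and `delayed`, `completed`, and `failed` as sorted sets); the function name is hypothetical.

```shell
# Print the redis-cli command that counts jobs in a given Bull state.
# Args: queue name, state. Hypothetical helper; assumes Bull's default keys.
bull_count_cmd() {
  case "$2" in
    wait|active|paused)
      # These states are Redis lists
      printf 'LLEN bull:%s:%s\n' "$1" "$2" ;;
    delayed|completed|failed)
      # These states are Redis sorted sets
      printf 'ZCARD bull:%s:%s\n' "$1" "$2" ;;
    *)
      echo "unknown state: $2" >&2
      return 1 ;;
  esac
}
```

For a non-interactive check, the output can be passed straight to redis-cli, for example `docker-compose exec -T redis redis-cli $(bull_count_cmd task-queue failed)`.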
    

Error: LLM Agent Timeout

Symptoms:

Error: LLM request timeout after 30s

Solutions:

  1. Local LLM not running:
    # Check Ollama is running
    curl http://localhost:11434/api/tags
       
    # Start Ollama
    ollama serve
       
    # Pull model if needed
    ollama pull llama3.1:8b
    
  2. Wrong LLM URL:
    # For local development
    LOCAL_LLM_URL=http://localhost:11434
       
    # For Docker containers
    LOCAL_LLM_URL=http://host.docker.internal:11434
    LOCAL_LLM_ALLOWED_HOSTS=host.docker.internal
    
  3. Use stub executor (no LLM):
    # Set in .env
    ORCH_AGENT_EXECUTOR=stub
    

Frontend Issues

Error: hydration failed in Next.js

Symptoms:

Error: Hydration failed because the initial UI does not match what was rendered on the server.

Solutions:

  1. Client-only rendering:
    'use client';
       
    import dynamic from 'next/dynamic';
       
    const ClientComponent = dynamic(() => import('./ClientComponent'), {
      ssr: false,
    });
    
  2. Date/time formatting:
    // Use consistent formatting
    const date = new Date(dateString).toISOString();
    
  3. Browser extensions:
    • Disable browser extensions
    • Test in incognito mode

Error: API calls failing from frontend

Symptoms:

Failed to fetch: net::ERR_CONNECTION_REFUSED

Diagnosis:

# Check API is running
curl http://localhost:3001/health

# Check NEXT_PUBLIC_API_BASE_URL
echo $NEXT_PUBLIC_API_BASE_URL

Solutions:

  1. API not running:
    pnpm --filter @in-midst-my-life/api dev
    
  2. Wrong API URL:
    # Set in .env
    NEXT_PUBLIC_API_BASE_URL=http://localhost:3001
    
  3. CORS issue:
    • Check API CORS configuration
    • Ensure origin is allowed

Error: D3 Graph Not Rendering

Symptoms:

Solutions:

  1. Container ref not attached:
    const containerRef = useRef<HTMLDivElement>(null);
       
    useEffect(() => {
      if (!containerRef.current) return;
      // ... D3 code
    }, []);
       
    return <div ref={containerRef} />;
    
  2. Dynamic import for D3:
    const D3Graph = dynamic(() => import('./D3Graph'), {
      ssr: false,
    });
    
  3. Fallback to radial layout:
    # Set in .env
    NEXT_PUBLIC_GRAPH_LAYOUT=radial
    

Development Environment

Error: pnpm install fails

Symptoms:

ERR_PNPM_LOCKFILE_BROKEN_NODE_MODULES

Solutions:

# Clean install (removing pnpm-lock.yaml re-resolves dependency versions)
rm -rf node_modules
rm pnpm-lock.yaml
pnpm install

# If still failing, clear pnpm cache
pnpm store prune
pnpm install

Error: TypeScript errors after update

Symptoms:

Type 'X' is not assignable to type 'Y'

Solutions:

# Rebuild all packages
pnpm build

# Clear TypeScript cache (-f ignores files that do not exist)
rm -f tsconfig.tsbuildinfo
rm -f apps/*/tsconfig.tsbuildinfo
rm -f packages/*/tsconfig.tsbuildinfo

# Re-run typecheck
pnpm typecheck

Error: Port already in use

Symptoms:

Error: listen EADDRINUSE: address already in use :::3001

Solutions:

# Find process using port
lsof -i :3001

# Kill process (escalate to kill -9 only if it does not exit)
kill <PID>

# Or change port
# In .env:
API_PORT=3011

Deployment Issues

Error: Kubernetes Pod CrashLoopBackOff

Diagnosis:

# Check pod status
kubectl get pods -n inmidst

# View pod logs
kubectl logs <pod-name> -n inmidst

# Describe pod for events
kubectl describe pod <pod-name> -n inmidst

Common Causes:

  1. Missing environment variables:
    # Check pod environment
    kubectl exec <pod-name> -n inmidst -- printenv
    
  2. Database connection failure:
    • Check DATABASE_URL secret
    • Verify database is accessible from cluster
  3. Image pull error:
    # Check image exists
    docker pull <image-name>:<tag>
       
    # Check image pull secrets
    kubectl get secrets -n inmidst
    

Error: Helm Install Failed

Diagnosis:

# Check Helm release status
helm status inmidst -n inmidst

# View release history
helm history inmidst -n inmidst

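# View the chart notes for the release
helm get notes inmidst -n inmidst

# Get error details from recent cluster events
kubectl get events -n inmidst --sort-by=.lastTimestamp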

Solutions:

  1. Rollback to previous version:
    helm rollback inmidst -n inmidst
    
  2. Uninstall and reinstall:
    helm uninstall inmidst -n inmidst
    helm install inmidst . -n inmidst -f values.yaml
    
  3. Debug with dry-run:
    helm install inmidst . --dry-run --debug -n inmidst
    

Performance Issues

Issue: Slow API Response Times

Diagnosis:

# Check API metrics
curl http://localhost:3001/metrics | grep http_request_duration

# Test specific endpoint
time curl http://localhost:3001/profiles/<id>

Solutions:

  1. Database query optimization:
    # Enable query logging
    # In postgresql.conf:
    log_statement = 'all'
    log_duration = on
       
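    # Check slow queries (requires the pg_stat_statements extension;
    # on PostgreSQL 13+ use total_exec_time instead of total_time)
    SELECT * FROM pg_stat_statements
    ORDER BY total_time DESC
    LIMIT 10;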
    
  2. Add database indexes:
    CREATE INDEX idx_profiles_slug ON profiles(slug);
    CREATE INDEX idx_experiences_profile_id ON experiences(profile_id);
    
  3. Enable Redis caching:
    # Ensure REDIS_URL is set
    echo $REDIS_URL
       
    # Test Redis connection
    ./scripts/dev-shell.sh redis
    

Issue: High Memory Usage

Diagnosis:

# Check memory usage
docker stats

# Kubernetes
kubectl top pods -n inmidst

Solutions:

  1. Increase memory limits:
    # docker-compose.yml
    api:
      deploy:
        resources:
          limits:
            memory: 2G
    
  2. Connection pool tuning:
    // Reduce pool size
    const pool = new Pool({
      max: 5, // instead of 10
    });
    

LLM & Agent Issues

Error: LLM returns invalid JSON

Symptoms:

Error: Unexpected token in JSON at position 0

Solutions:

  1. Use simpler model:
    # Switch to smaller model
    LOCAL_LLM_MODEL=gemma3:4b
    
  2. Disable structured output:
    # Use text mode
    ORCH_LLM_RESPONSE_FORMAT=text
    
  3. Retry with exponential backoff:
    • Already implemented in agent executor

Error: Agent tool not found

Symptoms:

Error: Tool 'rg' not in allowlist

Solutions:

# Enable tool in allowlist
ORCH_TOOL_ALLOWLIST=rg,ls,cat

# Or disable tools entirely (empty = no tools)
ORCH_TOOL_ALLOWLIST=
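
To check by hand whether a tool name would pass a given allowlist value, the comma-separated matching can be sketched as below. This is an assumption about how the list is interpreted (exact names, no surrounding whitespace); the orchestrator's actual matching logic is not shown in this guide, and the function name is hypothetical.

```shell
# Succeed if a tool name appears in a comma-separated allowlist.
# Args: tool name, allowlist. An empty allowlist permits nothing.
# Hypothetical helper; assumes exact-name matching without whitespace.
tool_allowed() {
  # An empty tool name never matches
  [ -n "$1" ] || return 1
  # Wrap both sides in commas so "rg" does not match "rgx"
  case ",$2," in
    *",$1,"*) return 0 ;;
    *) return 1 ;;
  esac
}
```

For example, `tool_allowed rg "$ORCH_TOOL_ALLOWLIST" || echo "rg is not allowed"` reproduces the check behind the error above.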

Getting More Help

Enable Debug Logging

# API
LOG_LEVEL=debug pnpm --filter @in-midst-my-life/api dev

# Orchestrator
LOG_LEVEL=debug pnpm --filter @in-midst-my-life/orchestrator dev

# Docker Compose
docker-compose --verbose up

Collect Diagnostic Information

# Create bug report bundle
./scripts/collect-diagnostics.sh > diagnostics.txt

# Includes:
# - Service status
# - Recent logs
# - Environment config (secrets redacted)
# - Database table counts
# - Redis stats
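
Before sharing any output by hand, it is worth confirming that credentials embedded in connection URLs are masked. The sketch below shows one way to do that with `sed`. It is an illustration only: the contents of `collect-diagnostics.sh` are not shown in this guide, and the function name `redact_urls` is hypothetical.

```shell
# Mask the password in any scheme://user:password@host URL read from stdin.
# Hypothetical helper; URLs without credentials pass through unchanged.
redact_urls() {
  sed -E 's#(://[^:/@[:space:]]*):[^@/[:space:]]*@#\1:***@#g'
}
```

For example, `printenv | redact_urls` prints the environment with `DATABASE_URL` and `REDIS_URL` passwords replaced by `***`. Secrets stored in other variables, such as `JWT_SECRET`, are not covered by this pattern.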

Contact Support

When reporting issues, include:

  1. Error message (full stack trace)
  2. Steps to reproduce
  3. Environment (Docker/Kubernetes, OS, Node version)
  4. Relevant logs
  5. Diagnostic output
