/ installation

Self-hosted in under 15 minutes.

13 Docker services pulled from Docker Hub — application core + observability stack. One compose file. Your data stays on your infrastructure.

0

System requirements

Testhide runs 13 containers — backend × 2 replicas, AI inference + training workers, MongoDB, Redis, sidecar, plus a full observability stack (Prometheus + Grafana + Tempo + Loki + Alloy). Size your server accordingly.

Minimum
CPU: 8 cores
RAM: 32 GB
Disk: 100 GB SSD
OS: Linux (Ubuntu 22.04+)
Docker: 24.0+ · Compose v2

AI Worker is capped at 12 GB and AI API at 6 GB. On 32 GB hosts, reduce AI_ASSIST_INFER_WORKERS and set AI_YOLO_MIN_ANNOTATIONS=50 so YOLO training is skipped on small datasets.

Recommended
CPU: 16 cores
RAM: 64 GB
Disk: 500 GB SSD
OS: Linux (Ubuntu 22.04+)
Docker: 24.0+ · Compose v2

Comfortable headroom for concurrent builds, AI training, observability retention, and log streaming via Alloy/Loki.

Memory per service
frontend: 256 MB
backend ×2: 2.5 GB each
ai-api: 1.5–6 GB elastic
ai-worker: 5–12 GB elastic
mongo + redis: 3.7 GB
observability: ~1.8 GB total

AI API and AI Worker share an elastic memory contract via Redis. Their workloads are anti-correlated, so the sum of their mem_limit values can safely exceed physical RAM.
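As a rough sizing check, the per-container mem_limit values from the compose file in step 1 can be added up. The Python sketch below does only that arithmetic. It ignores the host OS, page cache, and the one-shot config-init container (which has no limit in the file), so treat the totals as a lower bound on what the host needs.

```python
# mem_limit values copied from the docker-compose.yaml shown in step 1, in MB.
MEM_LIMIT_MB = {
    "frontend": 256,
    "backend": 2560,          # per replica
    "ai-api": 6144,
    "ai-worker": 12288,
    "mongo": 3072,
    "redis": 768,
    "redisinsight": 256,
    "sidecar-docker": 128,
    "prometheus": 512,
    "grafana": 256,
    "tempo": 512,
    "loki": 256,
    "alloy": 256,
}
REPLICAS = {"backend": 2}

# mem_reservation values of the two elastic services, in MB.
ELASTIC_FLOOR_MB = {"ai-api": 1500, "ai-worker": 5120}


def total_mb(limits, overrides=None):
    """Sum per-service memory, honouring replica counts and optional overrides."""
    overrides = overrides or {}
    return sum(
        overrides.get(name, mb) * REPLICAS.get(name, 1)
        for name, mb in limits.items()
    )


worst_case = total_mb(MEM_LIMIT_MB)               # every container at its limit
idle = total_mb(MEM_LIMIT_MB, ELASTIC_FLOOR_MB)   # AI services at their reservation

print(f"all limits hit at once:     {worst_case} MB ({worst_case / 1024:.1f} GB)")
print(f"AI services at reservation: {idle} MB ({idle / 1024:.1f} GB)")
```

With the shipped limits the worst case is 29,824 MB (about 29.1 GB), which leaves roughly 2.9 GB of a 32 GB host for everything else.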

/ 13-service architecture
Edge
Nginx · Angular :80 · :443 · :7771
Application
Backend ×2 REST · WebSocket
AI API LLM · CLIP · FAISS
AI Worker Train · YOLO · Edge
Sidecar Docker socket proxy
Data
MongoDB 8.2 :27017
Redis 7 cache · pub/sub
Redis Insight admin UI (optional)
Observability
Prometheus metrics · :9090 (loopback)
Grafana dashboards · :3000 (HTTPS)
Tempo OTel traces · :4317-4318
Loki log storage · :3100
Alloy log/metric collector

C# Agents (.NET 6) run on your build hosts and connect to Backend via WebSocket — Windows, Linux, or macOS. A one-shot config-init container syncs baked observability configs into named volumes on every --force-recreate deploy.

1

Download docker-compose.yaml

All images are pre-built on Docker Hub — no source code needed. Create a deployment directory and save this file.

bash — prepare deployment directory
$ mkdir -p /opt/testhide/{ssl,data/mongo,ssh_keys,sandbox_data,releases,monitoring_scripts}
$ cd /opt/testhide
# Download docker-compose.yaml and .env template
$ curl -O https://testhide.com/static/landing/docker-compose.yaml
$ curl -O https://testhide.com/static/landing/.env.example && cp .env.example .env
docker-compose.yaml
# ==========================================
# Testhide — Production Docker Compose (13 services)
# Images: hub.docker.com/u/thuesdays
# This is an excerpt; download the full file with the curl command above.
# ==========================================

name: testhide

services:

  # ── Config-init (one-shot, syncs baked configs to named volumes) ─
  config-init:
    image: thuesdays/testhide-backend:latest
    container_name: testhide-config-init
    entrypoint: []
    command: ["bash", "/app/scripts/sync-configs-to-volumes.sh"]
    environment:
      - SKIP_LLM_CHECK=true
      - LOAD_AI_MODELS=false
      - RUN_AI_WORKER=false
      - DISABLE_CRON=true
    volumes:
      - grafana_dashboards:/mnt/grafana_dashboards
      - grafana_provisioning:/mnt/grafana_provisioning
      - loki_config:/mnt/loki_config
      - prometheus_config:/mnt/prometheus_config
      - tempo_config:/mnt/tempo_config
      - alloy_config:/mnt/alloy_config
    restart: "no"

  # ── Frontend (Nginx + Angular SPA, TLS termination) ─────────────
  frontend:
    container_name: testhide_frontend
    image: thuesdays/testhide-frontend:latest
    restart: unless-stopped
    ports: [ "80:80", "443:443", "7771:7771" ]
    env_file: [ .env ]
    environment:
      - CERT_FILE=${CERT_FILE}
      - CERT_KEY=${CERT_KEY}
    volumes:
      - ./ssl:/etc/ssl:ro
      - testhide_static_data:/usr/share/nginx/html/static:ro
    depends_on: [ backend, ai-api ]
    networks: [ testhide-net ]
    mem_limit: 256m

  # ── Backend (Python API + WebSocket, 2 replicas) ─────────────────
  backend:
    image: thuesdays/testhide-backend:latest
    restart: unless-stopped
    env_file: [ .env ]
    environment:
      - LOAD_AI_MODELS=false
      - RUN_AI_WORKER=false
      - ENVIRONMENT=production
      - SERVICE_NAME=testhide
      - BUILD_RCA_RETRAIN_THRESHOLD=9999999   # MUST NEVER train
      - LOG_LEVEL=${LOG_LEVEL:-INFO}
      - SIDECAR_URL=http://sidecar-docker:8081
      - CORS_ALLOWED_ORIGINS=${CORS_ALLOWED_ORIGINS:-${PUBLIC_URL}}
    volumes:
      - testhide_static_data:/app/static
      - ./releases:/app/releases
      - ./monitoring_scripts:/app/monitoring_scripts
      - ./sandbox_data:/app/sandbox_data
      - ./ssh_keys:/app/ssh_keys     # persists across deploys
      - ./docker-compose.yaml:/app/docker-compose.yaml:ro
      - testhide_hf_cache:/app/hf_cache
    depends_on:
      mongo: { condition: service_healthy }
      redis: { condition: service_healthy }
      sidecar-docker: { condition: service_healthy }
    networks: [ testhide-net, testhide-internal ]
    mem_limit: 2.5g
    deploy:
      replicas: 2
      resources:
        limits: { cpus: "2.0", memory: 2.5g }

  # ── AI API (LLM / CLIP / FAISS inference — HTTP only) ────────────
  # Elastic memory: 1.5 GB reservation, 6 GB limit. Anti-correlated
  # with ai-worker via Redis (ai:training:in_progress).
  ai-api:
    container_name: testhide_ai_api
    image: thuesdays/testhide-backend:latest
    env_file: [ .env ]
    environment:
      - LOAD_AI_MODELS=true
      - RUN_AI_WORKER=false
      - DISABLE_CRON=true
      - HF_HOME=/app/hf_cache
      - HF_HUB_OFFLINE=${HF_HUB_OFFLINE:-0}
      - RC_MODEL_NAME=${RC_MODEL_NAME:-distilbert-base-uncased}
      - CORS_ALLOWED_ORIGINS=${CORS_ALLOWED_ORIGINS:-${PUBLIC_URL}}
    volumes:
      - testhide_static_data:/app/static
      - testhide_hf_cache:/app/hf_cache
      - ./sandbox_data:/app/sandbox_data
      - ./releases:/app/releases
    depends_on:
      mongo: { condition: service_healthy }
      redis: { condition: service_healthy }
    networks: [ testhide-net ]
    mem_limit: 6g
    mem_reservation: 1500m

  # ── AI Worker (training, vectorization, Edge AI) ─────────────────
  # Elastic memory: 5 GB reservation, 12 GB limit.
  ai-worker:
    container_name: testhide_ai_worker
    image: thuesdays/testhide-backend:latest
    env_file: [ .env ]
    environment:
      - LOAD_AI_MODELS=true
      - RUN_AI_WORKER=true
      - DISABLE_CRON=true
      - AI_MEMORY_LIMIT_MB=${AI_MEMORY_LIMIT_MB:-7500}
      - AI_HARD_EXIT_PERCENT=${AI_HARD_EXIT_PERCENT:-88}
      - AI_LLM_CTX=${AI_LLM_CTX:-8192}
      - AI_MERGE_BATCH_SIZE=${AI_MERGE_BATCH_SIZE:-150}
      # AI Pipeline streaming (Phase 1 + Phase 2 §3.7)
      - AI_DATASET_SHARD_ROWS=${AI_DATASET_SHARD_ROWS:-50000}
      - AI_COMPACT_AFTER_SHARDS=${AI_COMPACT_AFTER_SHARDS:-32}
      - AI_BUDGET_SAFETY_FACTOR=${AI_BUDGET_SAFETY_FACTOR:-0.5}
      - AI_TRAINING_LOCK_TTL_SEC=${AI_TRAINING_LOCK_TTL_SEC:-3600}
      - AI_VECTORS_SIDECAR_ENABLED=${AI_VECTORS_SIDECAR_ENABLED:-1}
      - AI_YOLO_FORCE_CPU=${AI_YOLO_FORCE_CPU:-true}
      - HF_HOME=/app/hf_cache
      - RC_MODEL_NAME=${RC_MODEL_NAME:-distilbert-base-uncased}
    volumes:
      - testhide_static_data:/app/static
      - testhide_hf_cache:/app/hf_cache
      - ./sandbox_data:/app/sandbox_data
      - ./releases:/app/releases
    depends_on:
      mongo: { condition: service_healthy }
      redis: { condition: service_healthy }
    networks: [ testhide-net ]
    mem_limit: 12g
    mem_reservation: 5g

  # ── MongoDB 8 ──────────────────────────────────────────────────
  mongo:
    container_name: testhide_mongo
    image: mongo:8.2.3
    restart: unless-stopped
    command: ["mongod","--auth","--wiredTigerCacheSizeGB","2"]
    environment:
      - MONGO_INITDB_ROOT_USERNAME=${MONGO_USER}
      - MONGO_INITDB_ROOT_PASSWORD=${MONGO_PASS}
    ports: [ "27017:27017" ]
    volumes: [ "${MONGO_DATA_PATH}:/data/db" ]
    networks: [ testhide-net ]
    healthcheck:
      test: ["CMD","mongosh","--quiet","-u","${MONGO_USER}","-p","${MONGO_PASS}","--authenticationDatabase","admin","--eval","db.adminCommand('ping')"]
      interval: 30s
      timeout: 5s
      retries: 5
    mem_limit: 3g

  # ── Redis 7 ───────────────────────────────────────────────────
  redis:
    container_name: testhide_redis
    image: redis:7-alpine
    command: ["redis-server","--maxmemory","512mb","--maxmemory-policy","allkeys-lru","--appendonly","no","--requirepass","${REDIS_PASSWORD}"]
    networks: [ testhide-net ]
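    mem_limit: 768m
    # Assumed healthcheck: the service_healthy conditions on redis require one.
    # The exact command in the full file may differ.
    healthcheck:
      test: ["CMD","redis-cli","-a","${REDIS_PASSWORD}","ping"]
      interval: 10s
      timeout: 5s
      retries: 5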

  # ── Redis Insight (optional admin UI, proxied via nginx) ───────
  redisinsight:
    container_name: testhide_redisinsight
    image: redis/redisinsight:latest
    depends_on:
      redis: { condition: service_healthy }
    networks: [ testhide-net ]
    mem_limit: 256m

  # ── Docker Socket Sidecar (SEC-004) ────────────────────────────
  # Sidecar token auto-derived from JWT_SECRET via HMAC-SHA256.
  sidecar-docker:
    container_name: testhide_sidecar_docker
    image: thuesdays/testhide-sidecar:latest
    environment:
      - JWT_SECRET=${JWT_SECRET}
      - SIDECAR_ALLOWED_IMAGES=${SIDECAR_ALLOWED_IMAGES}
      - SIDECAR_ALLOWED_EXEC_PATTERNS=${SIDECAR_ALLOWED_EXEC_PATTERNS}
      - SIDECAR_ALLOWED_NETWORKS=${SIDECAR_ALLOWED_NETWORKS}
      - SIDECAR_PORT=8081
    volumes: [ "/var/run/docker.sock:/var/run/docker.sock" ]
    networks: [ testhide-internal ]
    mem_limit: 128m
    healthcheck:
      test: ["CMD","wget","-qO-","http://localhost:8081/health"]

  # ── Observability: Prometheus (metrics) ────────────────────────
  # Loopback bind by default. Override via PROMETHEUS_PORT_BIND.
  prometheus:
    image: prom/prometheus:v2.51.0
    container_name: testhide-prometheus
    depends_on:
      config-init: { condition: service_completed_successfully }
    volumes:
      - prometheus_config:/etc/prometheus:ro
      - prometheus_data:/prometheus
    command:
      - --config.file=/etc/prometheus/prometheus.yml
      - --storage.tsdb.retention.time=30d
      - --web.enable-lifecycle
      - --web.enable-remote-write-receiver
    ports: [ "${PROMETHEUS_PORT_BIND:-127.0.0.1:9090}:9090" ]
    networks: [ testhide-net ]
    mem_limit: 512m

  # ── Observability: Grafana (HTTPS, dashboards) ─────────────────
  grafana:
    image: grafana/grafana:10.4.2
    container_name: testhide-grafana
    environment:
      GF_SECURITY_ADMIN_PASSWORD: ${GRAFANA_ADMIN_PASSWORD:?Set in .env}
      GF_SERVER_ROOT_URL: ${GRAFANA_ROOT_URL:-${PUBLIC_URL}:3000}
      GF_SERVER_PROTOCOL: https
      GF_SERVER_CERT_FILE: /etc/grafana/ssl/server.crt
      GF_SERVER_CERT_KEY: /etc/grafana/ssl/server.key
      GF_FEATURE_TOGGLES_ENABLE: traceqlEditor
      GF_AUTH_ANONYMOUS_ENABLED: "false"
    volumes:
      - grafana_provisioning:/etc/grafana/provisioning:ro
      - grafana_dashboards:/var/lib/grafana/dashboards:ro
      - grafana_data:/var/lib/grafana
      - ${GRAFANA_CERT_HOST_CRT:-./ssl/testhide.crt}:/etc/grafana/ssl/server.crt:ro
      - ${GRAFANA_CERT_HOST_KEY:-./ssl/testhide.key}:/etc/grafana/ssl/server.key:ro
    ports: [ "3000:3000" ]
    depends_on:
      config-init: { condition: service_completed_successfully }
      prometheus:  { condition: service_started }
      tempo:       { condition: service_started }
      loki:        { condition: service_started }
    networks: [ testhide-net ]
    mem_limit: 256m

  # ── Observability: Tempo (distributed traces, OTel) ────────────
  tempo:
    image: grafana/tempo:2.4.1
    container_name: testhide-tempo
    command: ["-config.file=/etc/tempo/tempo.yml"]
    depends_on:
      config-init: { condition: service_completed_successfully }
    volumes:
      - tempo_config:/etc/tempo:ro
      - tempo_data:/tmp/tempo
    ports: [ "4317:4317", "4318:4318", "3200:3200" ]
    networks: [ testhide-net ]
    mem_limit: 512m

  # ── Observability: Loki (log storage) ──────────────────────────
  loki:
    image: grafana/loki:2.9.4
    container_name: testhide-loki
    command: -config.file=/etc/loki/loki.yml
    depends_on:
      config-init: { condition: service_completed_successfully }
    volumes:
      - loki_config:/etc/loki:ro
      - loki_data:/loki
    ports: [ "3100:3100" ]
    networks: [ testhide-net ]
    mem_limit: 256m

  # ── Observability: Grafana Alloy (log/metric collector) ────────
  alloy:
    image: grafana/alloy:v1.5.1
    container_name: testhide-alloy
    volumes:
      - alloy_config:/etc/alloy:ro
      - /var/run/docker.sock:/var/run/docker.sock:ro
      - testhide_static_data:/var/log/testhide:ro
    ports: [ "12345:12345" ]
    depends_on:
      config-init: { condition: service_completed_successfully }
      loki:       { condition: service_started }
      prometheus: { condition: service_started }
    networks: [ testhide-net ]
    mem_limit: 256m

volumes:
  testhide_static_data:    { name: testhide_static_data }
  testhide_hf_cache:       { name: testhide_hf_cache }
  prometheus_data:         { name: testhide_prometheus_data }
  grafana_data:            { name: testhide_grafana_data }
  tempo_data:              { name: testhide_tempo_data }
  loki_data:               { name: testhide_loki_data }
  # Config volumes — populated by config-init from the image.
  grafana_dashboards:      { name: testhide_grafana_dashboards }
  grafana_provisioning:    { name: testhide_grafana_provisioning }
  loki_config:             { name: testhide_loki_config }
  prometheus_config:       { name: testhide_prometheus_config }
  tempo_config:            { name: testhide_tempo_config }
  alloy_config:            { name: testhide_alloy_config }

networks:
  testhide-net:      { name: testhide-net }
  testhide-internal: { driver: bridge, internal: true }
2

Configure .env

Edit .env. Every variable is listed below: those marked REQUIRED must be set before first boot, the rest have working defaults.

Domain & Frontend
PUBLIC_URL
REQUIRED
PUBLIC_URL=https://testhide.yourcompany.com
Your public-facing URL. Must match SSL cert domain exactly. Used by CORS_ALLOWED_ORIGINS, GRAFANA_ROOT_URL, Angular bundles.
API_URL
REQUIRED
API_URL=https://testhide.yourcompany.com:7771
Angular embeds this at build time. Port 7771 is the API endpoint published by the frontend (Nginx) container.
WS_URL
REQUIRED
WS_URL=wss://testhide.yourcompany.com:7771
WebSocket endpoint for agent connections and real-time build logs.
FRONTEND_PUBLIC_URL
REQUIRED
FRONTEND_PUBLIC_URL=https://testhide.yourcompany.com
Used for generating absolute links in notifications and emails.
CORS_ALLOWED_ORIGINS
default: PUBLIC_URL
CORS_ALLOWED_ORIGINS=https://testhide.yourcompany.com,https://staging.example.com
Comma-separated allow-list. Defaults to PUBLIC_URL. Add staging/dev origins if needed. Previous wildcard default * raised a startup security warning.
ENVIRONMENT
PRODUCTION
DEBUG
default: production / true / false
ENVIRONMENT=production
PRODUCTION=true
DEBUG=false
Set ENVIRONMENT=production in all prod/staging deployments — SEC-002 refuses startup with a default JWT_SECRET in those modes. Keep DEBUG=false for security headers and to disable stacktrace responses.
TESTHIDE_TRUST_PROXY
default: 0
TESTHIDE_TRUST_PROXY=1
Backend sits behind nginx — set to 1 so client IPs are read from X-Forwarded-For (rate limits, ban store, IP allowlists work correctly). Default 0 uses the TCP peer.
INTERNAL_API_URL
INTERNAL_WS_URL
PORT
do not change
INTERNAL_API_URL=http://backend:8080
INTERNAL_WS_URL=ws://backend:8080
PORT=8080
Internal Docker service names. Only change if you rename services in the compose file.
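For TESTHIDE_TRUST_PROXY above, the difference between the two settings can be sketched in a few lines of Python. This is an illustration, not Testhide's code: the function name is hypothetical, and it assumes a single trusted Nginx hop that appends the address it saw to X-Forwarded-For, so the right-most entry is the one to trust.

```python
from typing import Optional


def resolve_client_ip(peer_ip: str,
                      x_forwarded_for: Optional[str],
                      trust_proxy: bool) -> str:
    """Return the address that rate limits and IP allowlists should see."""
    if not trust_proxy or not x_forwarded_for:
        # TESTHIDE_TRUST_PROXY=0: use the TCP peer. Behind Nginx that is
        # the proxy container itself, so every client looks the same.
        return peer_ip
    hops = [hop.strip() for hop in x_forwarded_for.split(",") if hop.strip()]
    # Entries further left are supplied by the client and can be spoofed;
    # the last one is what the trusted proxy appended.
    return hops[-1] if hops else peer_ip


# Nginx container address as peer, real client in the header.
print(resolve_client_ip("172.18.0.5", "203.0.113.7", trust_proxy=True))
print(resolve_client_ip("172.18.0.5", "203.0.113.7", trust_proxy=False))
```

With more than one proxy in front of the backend, the number of trusted hops has to be counted from the right instead of taking the last entry.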
Security
JWT_SECRET
REQUIRED
JWT_SECRET=your_generated_secret
Master secret for JWT tokens and for Docker sidecar auth (HMAC-SHA256). Generate once; rotating it requires a full redeploy. Generate:
python3 -c "import secrets; print(secrets.token_hex(32))"
SIDECAR_AUTH_TOKEN
auto-derived
# SIDECAR_AUTH_TOKEN= ← leave commented
Auto-derived from JWT_SECRET using HMAC-SHA256. Only override if you need an explicit token separate from JWT_SECRET.
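To make "auto-derived via HMAC-SHA256" concrete, here is a minimal Python sketch of deriving a second token from a master secret. The function names and the "sidecar-auth" label are assumptions for illustration; the message Testhide actually signs is not documented here, so this will not reproduce the real token.

```python
import hashlib
import hmac


def derive_sidecar_token(jwt_secret: str, label: bytes = b"sidecar-auth") -> str:
    """Derive a hex token from the master secret.

    The label is hypothetical. Using a fixed label as the HMAC message means
    the derived token cannot be used to recover JWT_SECRET.
    """
    return hmac.new(jwt_secret.encode("utf-8"), label, hashlib.sha256).hexdigest()


def tokens_match(presented: str, expected: str) -> bool:
    """Constant-time comparison, so timing does not leak how many characters matched."""
    return hmac.compare_digest(presented, expected)


secret = "0" * 64  # placeholder; use the output of secrets.token_hex(32)
token = derive_sidecar_token(secret)
print(len(token), tokens_match(token, derive_sidecar_token(secret)))
```

Because the token is a pure function of JWT_SECRET, changing the secret changes the sidecar token too, which is consistent with the advice to redeploy the whole stack after a rotation.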
SSL / TLS
CERT_FILE
CERT_KEY
REQUIRED
CERT_FILE=/etc/ssl/testhide.crt
CERT_KEY=/etc/ssl/testhide.key
Paths inside the container. Place your files in ./ssl/ on the host — that directory is bind-mounted to /etc/ssl.
USE_SSL
default: false
USE_SSL=false
Nginx handles SSL termination — keep this false. Only set true if you bypass Nginx and run the Python backend directly on HTTPS.
MongoDB
MONGO_USER
MONGO_PASS
REQUIRED
MONGO_USER=testhide_user
MONGO_PASS=your_strong_password
MongoDB root credentials. Set once — changing after first boot requires manual DB user update.
MONGO_DATA_PATH
REQUIRED
MONGO_DATA_PATH=/opt/testhide/data/mongo
Absolute path on the HOST where MongoDB stores data files. Must exist and be writable before first launch.
MONGO_HOST
MONGO_PORT
MONGO_DB_NAME
MONGO_AUTH_SOURCE
defaults shown
MONGO_HOST=mongo
MONGO_PORT=27017
MONGO_DB_NAME=testhide_database
MONGO_AUTH_SOURCE=testhide_database
Only change if using an external MongoDB instance outside the compose network.
Redis
REDIS_PASSWORD
REQUIRED
REDIS_PASSWORD=your_strong_redis_password
Redis publishes no host ports and is reachable only from containers on the Docker network, but always set a password in production.
REDIS_HOST
REDIS_PORT
REDIS_DB
defaults shown
REDIS_HOST=redis
REDIS_PORT=6379
REDIS_DB=0
Only change if using an external Redis instance.
Gunicorn (Python WSGI)
GUNICORN_WORKERS
default: 3
GUNICORN_WORKERS=3
Rule of thumb: (2 × CPU cores) + 1. Backend runs 2 replicas — total processes = replicas × workers.
GUNICORN_TIMEOUT
GUNICORN_GRACEFUL_TIMEOUT
GUNICORN_KEEPALIVE
GUNICORN_LOG_LEVEL
defaults shown
GUNICORN_TIMEOUT=120
GUNICORN_GRACEFUL_TIMEOUT=30
GUNICORN_KEEPALIVE=5
GUNICORN_LOG_LEVEL=info
Increase GUNICORN_TIMEOUT if you see 502 errors during large report uploads or long AI inference calls.
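The GUNICORN_WORKERS rule of thumb above is plain arithmetic, worked through here in Python. The helper names are illustrative only. Note that the compose file limits each backend replica to cpus: "2.0", so the core count that matters is the container's, not the host's.

```python
def recommended_workers(cpu_cores: int) -> int:
    """Rule of thumb from the text: (2 x CPU cores) + 1."""
    return 2 * cpu_cores + 1


def total_processes(replicas: int, workers_per_replica: int) -> int:
    """Every backend replica starts its own set of Gunicorn workers."""
    return replicas * workers_per_replica


print(recommended_workers(2))   # -> 5 for a replica limited to 2 CPUs
print(total_processes(2, 3))    # -> 6 with the shipped default of 3 workers
```

All workers of one replica share that replica's 2.5 GB mem_limit, so raising GUNICORN_WORKERS trades memory headroom for concurrency.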
AI & Local LLM
AI_LLM_REPO
AI_LLM_FILE
defaults shown
AI_LLM_REPO=bartowski/Phi-3.5-mini-instruct-GGUF
AI_LLM_FILE=Phi-3.5-mini-instruct-Q5_K_M.gguf
Local LLM for diagnostic summaries. Q5_K_M ≈ 3.5 GB on disk. Switch to Q4_K_M to save ~800 MB on 32 GB servers.
AI_LLM_CTX
AI_LLM_THREADS
AI_LLM_GPU_LAYERS
defaults shown
AI_LLM_CTX=32768
AI_LLM_THREADS=8
AI_LLM_GPU_LAYERS=0
Set AI_LLM_THREADS to the physical CPU core count. Reduce AI_LLM_CTX to 8192 on 32 GB servers to cut peak RAM by ~4 GB. AI_LLM_GPU_LAYERS=0 means CPU-only inference.
RC_MODEL_NAME
default shown
RC_MODEL_NAME=distilbert-base-uncased
DistilBERT backbone for the Root-Cause Classifier. Must be identical in ai-api and ai-worker — they share the testhide_hf_cache volume.
HF_HUB_OFFLINE
TRANSFORMERS_OFFLINE
default: 0
HF_HUB_OFFLINE=0
TRANSFORMERS_OFFLINE=0
Set both to 1 after a successful first boot for air-gapped mode. Models persist in the testhide_hf_cache Docker volume between restarts.
AI Worker — Memory & Concurrency
AI_MEMORY_LIMIT_MB
AI_HARD_EXIT_PERCENT
defaults: 7500 / 88
AI_MEMORY_LIMIT_MB=7500
AI_HARD_EXIT_PERCENT=88
Two-layer OOM protection. Soft limit (MB): worker refuses new tasks above this RSS. Must stay BELOW Docker mem_limit=12g (12288 MB) so it fires first. Hard exit (% of container limit) triggers graceful shutdown if soft limit is missed.
AI_MERGE_BATCH_SIZE
AI_LLM_CTX
defaults: 150 / 8192
AI_MERGE_BATCH_SIZE=150
AI_LLM_CTX=8192
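Pandas merge peak is ~24 MB per batch at 150 rows. LLM context window: 8192 saves ~1.5 GB of KV-cache versus 32768 and is recommended on 32 GB hosts. This is the same AI_LLM_CTX variable listed under AI & Local LLM; the compose file falls back to 8192 for ai-worker when it is unset.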
AI_RETRAIN_QUEUE_BUSY_THRESHOLD
AI_LOCK_STALE_SECS
defaults: 200 / 300
AI_RETRAIN_QUEUE_BUSY_THRESHOLD=200
AI_LOCK_STALE_SECS=300
Skip retraining when the build queue is severely backlogged (200+ pending). Stale lock eviction prevents zombie training holds.
AI_ASSIST_INFER_WORKERS
AI_ASSIST_IO_WORKERS
AI_ASSIST_LLM_WORKERS
defaults: 4 / 4 / 1
AI_ASSIST_INFER_WORKERS=4
AI_ASSIST_IO_WORKERS=4
AI_ASSIST_LLM_WORKERS=1
Worker parallelism tuned for 6 CPU cores. Reduce infer/io to 2 on 32 GB servers if you see OOM events during heavy build load.
EDGE_AI_ENABLED
EDGE_AI_TIMEOUT_SEC
EDGE_LLM_MODEL
EDGE_EMBEDDER_MODEL
defaults shown
EDGE_AI_ENABLED=true
EDGE_AI_TIMEOUT_SEC=600
EDGE_LLM_MODEL=Phi-3.5-mini-instruct-Q5_K_M.gguf
EDGE_EMBEDDER_MODEL=minilm-l6-v2.onnx
Edge AI runs inference directly on agent nodes. EDGE_LLM_MODEL must match the LLM file name set in AI_LLM_FILE.
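For AI_MEMORY_LIMIT_MB and AI_HARD_EXIT_PERCENT earlier in this group, the ordering of the two OOM layers can be checked numerically. The Python sketch below is illustrative (the function name is hypothetical): it converts the hard-exit percentage into megabytes against the ai-worker mem_limit of 12288 MB and verifies that the soft limit fires first.

```python
def oom_thresholds(container_limit_mb: int,
                   soft_limit_mb: int,
                   hard_exit_percent: float):
    """Return (soft_mb, hard_exit_mb) or raise if the layers are out of order."""
    hard_exit_mb = container_limit_mb * hard_exit_percent / 100
    if not (soft_limit_mb < hard_exit_mb < container_limit_mb):
        raise ValueError("expected soft limit < hard exit < container mem_limit")
    return soft_limit_mb, hard_exit_mb


# Shipped defaults: mem_limit 12g, AI_MEMORY_LIMIT_MB=7500, AI_HARD_EXIT_PERCENT=88
soft_mb, hard_mb = oom_thresholds(12288, 7500, 88)
print(f"refuse new tasks above {soft_mb} MB RSS, exit above {hard_mb:.0f} MB")
```

With the defaults the hard exit lands at about 10,813 MB, so there is roughly 3.3 GB between the soft limit and the hard exit, and another 1.4 GB before Docker's own limit.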
AI Pipeline — Streaming & Vector Sidecar (Phase 1 + §3.7)
AI_DATASET_SHARD_ROWS
AI_DATASET_MAX_ROWS
AI_COMPACT_AFTER_SHARDS
defaults: 50000 / 2000000 / 32
AI_DATASET_SHARD_ROWS=50000
AI_DATASET_MAX_ROWS=2000000
AI_COMPACT_AFTER_SHARDS=32
Training data is stored as Parquet shards (50k rows each). Old shards rotate out at AI_DATASET_MAX_ROWS. Compaction merges small shards every 32 new shards to keep file count bounded.
AI_BUDGET_SAFETY_FACTOR
AI_BUDGET_MIN_CHUNK_ROWS
AI_BUDGET_MAX_CHUNK_ROWS
defaults: 0.5 / 16 / 100000
AI_BUDGET_SAFETY_FACTOR=0.5
AI_BUDGET_MIN_CHUNK_ROWS=16
AI_BUDGET_MAX_CHUNK_ROWS=100000
Adaptive memory-budget controller: chunk sizes grow with headroom and shrink under pressure. safety_factor=0.5 means use up to 50% of available headroom per batch. Lower on tight hosts.
AI_TRAINING_LOCK_TTL_SEC
default: 3600
AI_TRAINING_LOCK_TTL_SEC=3600
Redis-coordinated training mutex TTL. Only one training session at a time. Set higher only if individual training runs exceed 1 hour.
AI_VECTORS_SIDECAR_ENABLED
default: 1
AI_VECTORS_SIDECAR_ENABLED=1
Vector sidecar memmap (Phase 2 §3.7). Extracts text_vector/image_vector from Parquet rows into raw float32 sidecar files, cutting dataset size by about 60% and speeding up training 3-10×. On first deploy, migrate_vectors_to_sidecar.py runs automatically (one-shot, idempotent). A safety invariant (dataset_signature hash) blocks reads if shards diverge from the sidecar. Set to 0 to disable.
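The shard and budget settings in this group interact in a way that is easier to see with numbers. The Python sketch below is a simplified model, not the worker's implementation: the function names are hypothetical, rotation is assumed to drop whole shards oldest-first, and bytes_per_row is a made-up figure that would have to be measured on real data.

```python
import math


def max_live_shards(max_rows: int, shard_rows: int) -> int:
    """How many full shards fit under AI_DATASET_MAX_ROWS."""
    return math.ceil(max_rows / shard_rows)


def rotate_out(shard_sizes_oldest_first, max_rows):
    """Drop the oldest shards until the total row count fits the cap."""
    kept = list(shard_sizes_oldest_first)
    while kept and sum(kept) > max_rows:
        kept.pop(0)
    return kept


def chunk_rows(headroom_bytes, bytes_per_row,
               safety_factor=0.5, min_rows=16, max_rows=100_000):
    """Rows per batch: a fraction of free memory, clamped to the configured bounds."""
    budget_bytes = headroom_bytes * safety_factor
    rows = int(budget_bytes // bytes_per_row)
    return max(min_rows, min(max_rows, rows))


print(max_live_shards(2_000_000, 50_000))            # shards kept at the defaults
print(len(rotate_out([50_000] * 41, 2_000_000)))     # a 41st shard evicts the oldest
print(chunk_rows(1024**3, bytes_per_row=16_384))     # 1 GiB headroom, hypothetical row size
```

At the defaults the dataset tops out at 40 shards. Lowering AI_BUDGET_SAFETY_FACTOR shrinks every batch proportionally until AI_BUDGET_MIN_CHUNK_ROWS takes over.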
Build Investigator — Root-Cause ML
BUILD_RCA_ML_MODE
default: shadow
BUILD_RCA_ML_MODE=shadow
shadow: predictions are persisted to db_ai_diagnostics but not shown (safe default).
active: the ML label is used when confidence ≥ 0.6, replacing the regex baseline.
disabled: the model is not loaded.
The mode auto-promotes from shadow to active when the shadow cron observes ≥ 70% agreement with the regex baseline over 7 days and n ≥ 50 samples.
BUILD_RCA_RETRAIN_THRESHOLD
default: 200
BUILD_RCA_RETRAIN_THRESHOLD=200
Threshold on new RCA samples in db_rca_corpus before auto-retraining fires. Backend service overrides this to 9999999 in compose — training MUST run only in ai-worker (RUN_AI_WORKER=true).
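The BUILD_RCA_ML_MODE rules above reduce to two small decisions, sketched here in Python. The function names and argument shapes are assumptions for illustration; only the thresholds (0.6 confidence, 70% agreement, 7 days, 50 samples) come from the text.

```python
from typing import Optional


def should_promote_to_active(agreement: float, samples: int, window_days: int) -> bool:
    """Shadow-to-active promotion: all three conditions must hold."""
    return window_days >= 7 and samples >= 50 and agreement >= 0.70


def displayed_label(mode: str,
                    regex_label: str,
                    ml_label: Optional[str],
                    ml_confidence: float) -> str:
    """Pick the label shown to users for one build failure."""
    if mode == "active" and ml_label is not None and ml_confidence >= 0.6:
        return ml_label
    # shadow and disabled modes, or a low-confidence prediction,
    # fall back to the regex baseline.
    return regex_label


print(should_promote_to_active(agreement=0.74, samples=120, window_days=7))
print(displayed_label("active", "timeout", "flaky-network", 0.82))
print(displayed_label("shadow", "timeout", "flaky-network", 0.82))
```

In shadow mode the ML prediction is still computed and stored, which is what gives the promotion check its agreement statistics.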
Observability — Logging, Metrics, Traces
LOG_FORMAT
LOG_LEVEL
defaults: json / INFO
LOG_FORMAT=json
LOG_LEVEL=INFO
Structured JSON logs include trace_id/span_id so Loki + Tempo derive trace links automatically. Set LOG_LEVEL=DEBUG for verbose output (testhide loggers only — does not affect external libs).
METRICS_ENABLED
default: true
METRICS_ENABLED=true
Exposes /api/v3/metrics in Prometheus exposition format. 12 baseline metrics (HTTP rps + latency, build events, WS connections, LLM cost/tokens, AI precompute duration). Scraped every 30s by Prometheus.
OTEL_TRACES_ENABLED
OTEL_EXPORTER_OTLP_ENDPOINT
OTEL_SAMPLE_RATE
SERVICE_NAME
defaults shown
OTEL_TRACES_ENABLED=true
OTEL_EXPORTER_OTLP_ENDPOINT=http://tempo:4318
OTEL_SAMPLE_RATE=0.1
SERVICE_NAME=testhide
OpenTelemetry traces. Services reach Tempo by its Docker network name (tempo:4318) rather than through a host port. Sampling: 1.0 = 100% (dev), 0.1 = 10% (recommended for production).
GRAFANA_ADMIN_PASSWORD
REQUIRED
GRAFANA_ADMIN_PASSWORD=your_generated_password
Grafana admin password. Change before first deploy — compose refuses to start if unset. Generate: python3 -c "import secrets; print(secrets.token_urlsafe(32))"
GRAFANA_ROOT_URL
default: ${PUBLIC_URL}:3000
GRAFANA_ROOT_URL=https://testhide.yourcompany.com:3000
Grafana public URL — used in UI links and OAuth redirects. Defaults to ${PUBLIC_URL}:3000 if unset.
GRAFANA_CERT_HOST_CRT
GRAFANA_CERT_HOST_KEY
defaults: ./ssl/testhide.crt / ./ssl/testhide.key
GRAFANA_CERT_HOST_CRT=./ssl/testhide.crt
GRAFANA_CERT_HOST_KEY=./ssl/testhide.key
Grafana SSL cert files (host paths relative to docker-compose.yaml). By default reuses the main testhide cert. Override only if Grafana needs a separate certificate.
PROMETHEUS_PORT_BIND
default: 127.0.0.1:9090
PROMETHEUS_PORT_BIND=127.0.0.1:9090
Loopback binding by default — Prometheus has no auth, so the UI must not be exposed externally. Reach it via SSH port-forward, or query through Grafana (uses the docker network). NEVER use 0.0.0.0:9090 in production.
Security — Extra Hardening
DB_DUMP_ALLOWED_IPS
fail-closed default
DB_DUMP_ALLOWED_IPS=10.0.0.0/8,127.0.0.1
SEC-014: CSV of IPs/CIDRs allowed to call /api/v3/db/dump. Fail-closed — leave empty and the endpoint denies every request with HTTP 403.
TESTHIDE_ALLOW_INSECURE_WEBHOOKS
default: 0
TESTHIDE_ALLOW_INSECURE_WEBHOOKS=0
SEC-011: only set to 1 if you have a documented reason (self-signed internal webhooks, on-prem MS Teams behind a private CA). Even then, the individual webhook row must also set allow_insecure_tls=True — both gates must agree before TLS verification is skipped.
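Both hardening settings in this group are fail-closed checks, which a short Python sketch can make explicit. The function names are hypothetical and this is not the backend's code; it only models the documented behavior: an empty DB_DUMP_ALLOWED_IPS denies everyone, and TLS verification is skipped only when the environment flag and the per-webhook flag are both set.

```python
import ipaddress


def dump_request_allowed(client_ip: str, allowed_csv: str) -> bool:
    """True only if client_ip falls inside one of the configured IPs or CIDRs."""
    networks = [
        ipaddress.ip_network(item.strip(), strict=False)
        for item in allowed_csv.split(",")
        if item.strip()
    ]
    if not networks:
        return False  # fail-closed: nothing configured means HTTP 403 for all
    address = ipaddress.ip_address(client_ip)
    return any(address in network for network in networks)


def skip_tls_verification(env_allows_insecure: bool, row_allows_insecure: bool) -> bool:
    """Both gates must agree before certificate checks are skipped."""
    return env_allows_insecure and row_allows_insecure


allowlist = "10.0.0.0/8,127.0.0.1"
print(dump_request_allowed("10.1.2.3", allowlist))
print(dump_request_allowed("192.168.1.1", allowlist))
print(dump_request_allowed("10.1.2.3", ""))
```

The client address passed in should be the one resolved after TESTHIDE_TRUST_PROXY handling, otherwise every request appears to come from the Nginx container.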
YOLO Visual Regression
AI_YOLO_MODEL
AI_YOLO_FORCE_CPU
defaults shown
AI_YOLO_MODEL=yolov8n.pt
AI_YOLO_FORCE_CPU=true
CPU training is the default — no GPU required. yolov8n.pt is the nano model: fast, low memory. Use yolov8s.pt for better accuracy at the cost of ~2× RAM.
AI_YOLO_EPOCHS
AI_YOLO_BATCH
AI_YOLO_FREEZE
AI_YOLO_MIN_ANNOTATIONS
AI_YOLO_WEIGHT_CORRECTION
AI_YOLO_WEIGHT_MANUAL
defaults shown
AI_YOLO_EPOCHS=5
AI_YOLO_BATCH=8
AI_YOLO_FREEZE=10
AI_YOLO_MIN_ANNOTATIONS=10
AI_YOLO_WEIGHT_CORRECTION=10
AI_YOLO_WEIGHT_MANUAL=5
Increase AI_YOLO_MIN_ANNOTATIONS to skip training on small datasets. On 32 GB servers set to 50 to avoid OOM during training runs.
License
LICENSE_API_URL
default shown
LICENSE_API_URL=https://service.testhide.com/api/v1/
Leave as-is for cloud license validation. For air-gapped deployments, contact support for an offline key.
LICENSE_API_CA_BUNDLE
optional
# LICENSE_API_CA_BUNDLE=/etc/ssl/ldap-ca.pem
Only needed if your corporate egress proxy intercepts HTTPS (TLS inspection). Point at the same CA bundle as LDAP_CA_BUNDLE.
Docker Sidecar (SEC-004)
SIDECAR_ALLOWED_IMAGES
defaults shown
SIDECAR_ALLOWED_IMAGES=thuesdays/testhide-agent:latest,thuesdays/testhide-backend:latest
Comma-separated allowlist — only these images can be started via the API. Add your custom agent image if using a custom build.
SIDECAR_ALLOWED_EXEC_PATTERNS
SIDECAR_ALLOWED_NETWORKS
defaults shown
SIDECAR_ALLOWED_EXEC_PATTERNS=^python\s,^bash\s,^sh\s
SIDECAR_ALLOWED_NETWORKS=testhide-internal,bridge,testhide-net
Restrict which exec commands and networks are allowed through the sidecar proxy. Tighten in high-security environments.
Integrations (all optional)
JIRA_URL
JIRA_USERNAME
JIRA_PASSWORD
BITBUCKET_API_VERSION
optional
JIRA_URL=https://jira.yourcompany.com/
JIRA_USERNAME=testhide-service
JIRA_PASSWORD=api_token
BITBUCKET_API_VERSION=1.0
Enables the Jira Auto-Linker ML model. Use a Jira API token (not password). Leave empty to disable.
LDAP_USE_SSL
LDAP_PORT
LDAP_CA_BUNDLE
LDAP_TLS_INSECURE
optional
LDAP_USE_SSL=1
LDAP_PORT=636
LDAP_CA_BUNDLE=/etc/ssl/ldap-ca.pem
LDAP_TLS_INSECURE=0
LDAPS on port 636 by default. Drop your CA PEM at ./ssl/ldap-ca.pem on the host. Set LDAP_TLS_INSECURE=1 only for local dev — never in production.
GOOGLE_AI_API_KEY
optional
GOOGLE_AI_API_KEY=your_gemini_api_key
Optional quality upgrade: Gemini is used as a fallback when local Phi-3.5 inference is busy. The local LLM runs without this key.
3

Place SSL certificates

The ./ssl/ directory next to docker-compose.yaml (created in step 1) is bind-mounted read-only into Nginx. Place your certificate and private key there.

bash
# Expected layout
$ ls -la ssl/
-rw-r--r-- ssl/testhide.crt # certificate (PEM)
-rw------- ssl/testhide.key # private key (chmod 600)
-rw-r--r-- ssl/ldap-ca.pem # optional: LDAP / proxy CA bundle
$ chmod 600 ssl/testhide.key
# Quick self-signed cert (dev only)
$ openssl req -x509 -nodes -newkey rsa:4096 -days 365 \
  -keyout ssl/testhide.key -out ssl/testhide.crt \
  -subj "/CN=testhide.local" && chmod 600 ssl/testhide.key
4

Launch the stack

First run downloads ~4–6 GB of images and AI models. Allow 10–15 minutes.

bash
# First deploy: --force-recreate forces config-init to refresh observability configs
$ docker compose up -d --force-recreate
[+] Running 14/14
Container testhide-config-init Exited (0) — configs synced
Container testhide_mongo Healthy
Container testhide_redis Healthy
Container testhide_redisinsight Started
Container testhide_sidecar_docker Healthy
Container testhide_backend Started (×2 replicas)
Container testhide_ai_api Started
Container testhide_ai_worker Started
Container testhide_frontend Started
Container testhide-prometheus Started
Container testhide-tempo Started
Container testhide-loki Started
Container testhide-grafana Started
Container testhide-alloy Started
# Watch startup (Ctrl+C to stop following)
$ docker compose logs -f backend ai-api ai-worker
/ expected startup sequence
0:00 config-init copies baked observability configs to named volumes, exits
0:05 MongoDB + Redis healthchecks pass
0:15 Sidecar healthy, backend replicas accept traffic
0:20 Prometheus, Tempo, Loki, Alloy start scraping/collecting
0:30 Nginx + Grafana ready: UI on :443, dashboards on :3000
1:00 AI API loads DistilBERT + FAISS index
5:00+ AI Worker downloads HuggingFace models on first boot (~3.5 GB)
10:00+ Phi-3.5-mini LLM loaded, all AI diagnostics active
5

Connect build agents

A lightweight launcher wraps the agent binary, managing service lifecycle and auto-updates. Agents connect via WebSocket. Available for Windows, Linux, macOS, and Docker.

App mode — interactive
Runs in the current user session. Auto-starts on login. Console window visible. Good for developer workstations where someone is always logged in.
./testhide
Service mode — CI servers
Runs as a system service (SYSTEM on Windows, root on Linux). Starts at boot with no user login. Recommended for dedicated build machines and containers.
./testhide --daemon
bash — universal installer (auto-detects Linux · macOS)
# Detects OS — installs via apt · yum · brew · or binary download
$ curl -fsSL https://dl.testhide.com/install.sh | sudo bash
PowerShell — run as Administrator
# 1. Download launcher
PS> Invoke-WebRequest https://dl.testhide.com/stable/testhide-win-x64.exe -OutFile testhide.exe
# 2. Configure: WebSocket URL + license key
PS> .\testhide.exe config --url wss://YOUR_DOMAIN:7771 --license-key YOUR_LICENSE_KEY
# 3a. Install as Windows Service (recommended for CI servers)
PS> .\testhide.exe install-service
Service "TesthideAgent" installed and started
Auto-start on boot · runs as SYSTEM · no login required
# 3b. Or run interactively (App mode, current user session)
PS> .\testhide.exe
Service name: TesthideAgent. Manage via services.msc or net stop TesthideAgent / sc.exe delete TesthideAgent.
Config stored at C:\ProgramData\Testhide\config.json. Verify with .\testhide.exe config --show.
bash — Ubuntu / Debian
# 1. Add APT repository
$ curl -fsSL https://dl.testhide.com/apt/gpg.key | sudo apt-key add -
$ echo "deb https://dl.testhide.com/apt stable main" | \
  sudo tee /etc/apt/sources.list.d/testhide.list
$ sudo apt-get update && sudo apt-get install -y testhide
# 2. Configure
$ sudo testhide config --url wss://YOUR_DOMAIN:7771 --license-key YOUR_LICENSE_KEY
# 3. Enable systemd service
$ sudo systemctl enable --now testhide
testhide.service enabled and started
Restarts automatically on crash · logs via journalctl
Installs launcher + agent + systemd unit. Update later via sudo apt-get upgrade testhide.
Logs: journalctl -u testhide -f · Status: systemctl status testhide
bash — CentOS / RHEL / Fedora
# 1. Add YUM repository
$ sudo tee /etc/yum.repos.d/testhide.repo <<'EOF'
[testhide]
name=Testhide Agent
baseurl=https://dl.testhide.com/rpm
enabled=1
gpgcheck=0
EOF
$ sudo dnf install -y testhide
# 2. Configure
$ sudo testhide config --url wss://YOUR_DOMAIN:7771 --license-key YOUR_LICENSE_KEY
# 3. Enable systemd service
$ sudo systemctl enable --now testhide
testhide.service enabled and started
Installs launcher + agent + systemd unit. Update via sudo dnf upgrade testhide.
Logs: journalctl -u testhide -f · Status: systemctl status testhide
bash — any Linux x64 (manual / air-gapped)
# 1. Download launcher binary
$ sudo mkdir -p /opt/testhide
$ sudo curl -fsSL https://dl.testhide.com/stable/testhide-linux-x64 \
  -o /opt/testhide/testhide && sudo chmod +x /opt/testhide/testhide
# 2. Configure
$ sudo /opt/testhide/testhide config \
  --url wss://YOUR_DOMAIN:7771 \
  --license-key YOUR_LICENSE_KEY
# 3. Create systemd service
$ sudo tee /etc/systemd/system/testhide.service <<'EOF'
[Unit]
Description=Testhide CI/CD Agent
After=network-online.target
Wants=network-online.target

[Service]
Type=simple
Environment=TESTHIDE_SERVICE_MODE=1
ExecStart=/opt/testhide/testhide --daemon
WorkingDirectory=/opt/testhide
Restart=always
RestartSec=10
StandardOutput=journal
StandardError=journal

[Install]
WantedBy=multi-user.target
EOF
$sudo systemctl daemon-reload && sudo systemctl enable --now testhide
Launcher downloads the agent binary on first run to ~/.testhide/client/{version}/ from dl.testhide.com.
Config stored at /root/.testhide/config.json when running as root.
bash — macOS Intel x64 · Apple Silicon arm64
# Auto-detect architecture and download
$ARCH=$(uname -m | sed 's/x86_64/x64/') \
  && curl -fsSL https://dl.testhide.com/stable/testhide-macos-$ARCH \
  -o testhide && chmod +x testhide
# Configure
$./testhide config --url wss://YOUR_DOMAIN:7771 --license-key YOUR_LICENSE_KEY
# Run (App mode)
$./testhide
If macOS blocks the binary on first run: xattr -d com.apple.quarantine ./testhide or allow via System Settings → Privacy & Security.
Downloads: testhide-macos-x64 (Intel) · testhide-macos-arm64 (M1/M2/M3)
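The launcher file names used on this page (testhide-linux-x64, testhide-macos-x64, testhide-macos-arm64) can be derived from uname output. A minimal sketch, assuming these three are the only artifacts; the function name testhide_artifact is hypothetical and not part of the Testhide CLI:

```shell
# Hypothetical helper: map OS and CPU architecture to a launcher file name.
# Only the three names documented on this page are produced.
testhide_artifact() {
  local os arch
  os="${1:-$(uname -s)}"     # e.g. Linux, Darwin
  arch="${2:-$(uname -m)}"   # e.g. x86_64, arm64
  case "$os/$arch" in
    Linux/x86_64)  echo "testhide-linux-x64" ;;
    Darwin/x86_64) echo "testhide-macos-x64" ;;
    Darwin/arm64)  echo "testhide-macos-arm64" ;;
    *) echo "unsupported platform: $os/$arch" >&2; return 1 ;;
  esac
}

# Assumed usage:
#   curl -fsSL "https://dl.testhide.com/stable/$(testhide_artifact)" -o testhide && chmod +x testhide
```

Failing on unknown platforms avoids requesting a download URL that this page does not list.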
bash — Docker ephemeral agent
# Run agent container
$docker run -d \
  --name testhide-agent \
  --restart unless-stopped \
  -e TESTHIDE_SERVICE_MODE=1 \
  -e TESTHIDE_NODE_TYPE=dynamic \
  thuesdays/testhide-agent:latest
# Configure inside the container
$docker exec testhide-agent testhide config \
  --url wss://YOUR_DOMAIN:7771 \
  --license-key YOUR_LICENSE_KEY
$docker restart testhide-agent
# Logs
$docker logs -f testhide-agent
# Co-locate in your server docker-compose.yaml:
# image: thuesdays/testhide-agent:latest
# environment: [TESTHIDE_SERVICE_MODE=1, TESTHIDE_NODE_TYPE=dynamic]
# depends_on: [backend]
# networks: [testhide-net]
Image: thuesdays/testhide-agent:latest · Base: Python 3.12 slim · Includes SSH server for backend remote-terminal access during test runs.
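Add -v /var/run/docker.sock:/var/run/docker.sock only if your tests need to spawn Docker containers. This shares the host's Docker daemon with the agent (often loosely called Docker-in-Docker): containers started by tests run as siblings on the host, and the agent gains root-equivalent control of that daemon.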
testhide config — flag reference
--url / -u
REQUIRED
--url wss://testhide.yourcompany.com:7771
WebSocket server URL. Must start with ws:// or wss://. Must match WS_URL in your server .env.
--license-key / -k
REQUIRED
--license-key YOUR_LICENSE_KEY
License key from your Testhide dashboard. Stored as a SHA-256 hash — never written to disk in plain text.
--instance-id / -i
optional
--instance-id build-server-01
Human-readable node name shown in the Agents dashboard. Defaults to machine hostname if not set.
--show / -s
read-only
testhide config --show
Print current configuration to stdout without making any changes. Useful for verifying before first run.
--license-url / -l
optional
--license-url https://service.testhide.com/api/v1/
Override the license validation endpoint. Only needed for air-gapped deployments — contact support for an offline license key.
install-service
uninstall-service
Windows only
.\testhide.exe install-service
.\testhide.exe uninstall-service
Install or remove the TesthideAgent Windows Service. Requires Administrator. The launcher installs itself as a service and auto-starts on boot.
--daemon
Linux / macOS
testhide --daemon
Run in daemon (service) mode. Sets TESTHIDE_SERVICE_MODE=1 internally. Used in the systemd ExecStart line — do not use for interactive sessions.
Find your license key: Dashboard → Settings → License after first login.
Agents auto-update hourly from dl.testhide.com. The launcher rolls back to the previous version on repeated crashes and re-downloads if needed.
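The flag reference states that --url must start with ws:// or wss://. That rule can be checked in a provisioning script before calling testhide config. A minimal sketch: the function name valid_ws_url is hypothetical, and the CLI's own validation may be stricter.

```shell
# Hypothetical pre-flight check, not part of the Testhide CLI.
# Succeeds only when the argument uses the ws:// or wss:// scheme
# and has at least one character after it.
valid_ws_url() {
  case "$1" in
    ws://?*|wss://?*) return 0 ;;
    *) return 1 ;;
  esac
}

# Assumed usage (URL and KEY are placeholder variables):
#   valid_ws_url "$URL" && sudo testhide config --url "$URL" --license-key "$KEY"
```

The check only covers the scheme. It does not confirm that the URL matches WS_URL in the server .env.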
6

Verify & health-check

Run these before connecting your first pipeline.

bash
$docker compose ps
# Backend health
$curl -sk https://localhost/api/v3/health | python3 -m json.tool
{"status": "ok", "version": "6.1.6", "mongo": "connected", "redis": "connected"}
# AI API health
$curl -sk https://localhost:7771/api/v3/ai/health | python3 -m json.tool
{"status": "ok", "models_loaded": true, "llm": "phi-3.5-mini"}
# Prometheus metrics endpoint
$curl -sk https://localhost/api/v3/metrics | head -20
# HELP http_requests_total Total HTTP requests by route
http_requests_total{method="GET",route="/api/v3/health",status="200"} 12
...
# Open Grafana in your browser — admin / GRAFANA_ADMIN_PASSWORD from .env
$xdg-open https://localhost:3000 # macOS: open; or browse to your domain
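For scripted checks, such as a deploy job that should stop when the stack is unhealthy, the JSON responses above can be tested rather than read by eye. A minimal sketch, assuming the response contains a "status": "ok" field as in the sample output; health_ok is a hypothetical helper that uses a plain pattern match, not a JSON parser.

```shell
# Hypothetical helper: read a health response on stdin and succeed
# only if it contains "status": "ok" (whitespace around the colon is allowed).
health_ok() {
  grep -Eq '"status"[[:space:]]*:[[:space:]]*"ok"'
}

# Assumed usage against the backend endpoint shown above:
#   curl -sk https://localhost/api/v3/health | health_ok && echo "backend healthy"
```

Because the function exits non-zero for any other status, it can gate later steps with && or an if statement.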

Upgrades & operations

bash
# Pull latest images + restart. --force-recreate re-runs config-init so
# observability configs (Grafana dashboards etc.) refresh from the image.
$docker compose pull && docker compose up -d --force-recreate
# Backup MongoDB (MONGO_USER and MONGO_PASS must be set in this shell; they expand on the host)
$docker exec testhide_mongo mongodump \
  -u "$MONGO_USER" -p "$MONGO_PASS" \
  --authenticationDatabase admin \
  --out /data/db/backup_$(date +%Y%m%d)
# Follow AI worker logs (or use Grafana → Loki for searchable history)
$docker logs -f testhide_ai_worker
# Restart only one service (no full stack downtime)
$docker compose up -d --force-recreate --no-deps ai-worker
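The mongodump command above writes the dump inside the container's data directory. A variant that streams a compressed archive to the host is sketched below, using mongodump's --archive and --gzip options. The function names, the ./backups default, and the file naming scheme are assumptions; the container name and credentials variables are taken from the command above.

```shell
# Hypothetical helpers for host-side MongoDB backups.
backup_name() {
  # $1 = date stamp, defaults to today in YYYYMMDD form
  echo "testhide_mongo_${1:-$(date +%Y%m%d)}.archive.gz"
}

backup_mongo() {
  # $1 = host directory for backups (assumed default: ./backups)
  # MONGO_USER and MONGO_PASS must be set in the calling shell.
  local dir
  dir="${1:-./backups}"
  mkdir -p "$dir" || return 1
  # --archive with no file name writes the dump to stdout; --gzip compresses it.
  docker exec testhide_mongo mongodump \
    -u "$MONGO_USER" -p "$MONGO_PASS" \
    --authenticationDatabase admin \
    --archive --gzip > "$dir/$(backup_name)"
}
```

An archive produced this way can be restored by piping it to mongorestore --archive --gzip through docker exec -i. Test a restore on a non-production instance before relying on it.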

Rather skip the infrastructure? We run it for you.

Cloud Starter ($49/mo) — managed instance, 3 concurrent builds, 30-day retention, full 8-model dashboard.