/ features

Not just CI/CD. AI that diagnoses your failures.

Every other CI tool tells you a test failed. Testhide tells you why — and opens the Jira ticket automatically.

8 AI/ML models

246+ REST API endpoints

5 platforms supported

50+ build & pipeline features

/ ai intelligence

8 trained models, zero cloud required.

All AI runs on your infrastructure. No data leaves your server. Every model was purpose-built for test failure analysis — not a general-purpose LLM wrapper.

DistilBERT

🎯

Root-Cause Classifier

Classifies every test failure as ENV_FAILURE, PRODUCT_BUG, or TEST_ISSUE with a confidence score. Stop debating what broke — know instantly.

NLP · Real-time

HistGradient

🎲

Flakiness Predictor

Gradient-boosted model with 41 engineered features scores the probability a test is flaky. Quarantine bad tests automatically. Stop rerunning false alarms.

ML · 41 features

FAISS + MLP

🔍

Failure Retriever

Contrastive MLP embeds failures, FAISS indexes them. When something breaks, instantly surface the 5 most similar past failures and their resolutions.

Vector search · Semantic
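As a rough illustration of the retrieval step, here is a minimal sketch of top-k similarity search. It swaps FAISS for a brute-force cosine scan over plain Python lists, and the embeddings and failure IDs are hypothetical placeholders, not output of the actual model.

```python
import math

def cosine(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def top_k_similar(query, index, k=5):
    """Return the k (failure_id, score) pairs most similar to the query embedding."""
    scored = [(fid, cosine(query, vec)) for fid, vec in index.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)[:k]

# Hypothetical 3-dimensional embeddings; real ones would come from the trained encoder.
past_failures = {
    "build-4790/test_db_connect": [0.9, 0.1, 0.0],
    "build-4802/test_login_ui": [0.0, 0.2, 0.9],
    "build-4815/test_db_pool": [0.8, 0.3, 0.1],
}
matches = top_k_similar([1.0, 0.2, 0.0], past_failures, k=2)
```

A brute-force scan like this is linear in the number of stored failures; an approximate index such as FAISS exists to avoid that cost at scale.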

Autoencoder + IF

🚨

Out-of-Distribution Detector

Autoencoder + IsolationForest detects when a failure pattern is genuinely novel — not a known recurring issue. Escalate the right things.

Anomaly detection

YOLOv8 + SSIM

🖼

Visual Diff Analyzer

YOLOv8 detects UI elements, SSIM measures structural similarity. Catch pixel regressions, layout shifts, and missing components automatically across every UI test.

Computer vision · Screenshot diff
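For readers unfamiliar with SSIM, the sketch below computes the standard SSIM formula over a single global window of grayscale pixels. This is a simplification for illustration: production implementations typically slide a local window across the image, and nothing here reflects how Testhide combines SSIM with YOLOv8 detections.

```python
def global_ssim(x, y, dynamic_range=255):
    """Single-window SSIM over two equal-length grayscale pixel sequences."""
    n = len(x)
    mu_x, mu_y = sum(x) / n, sum(y) / n
    var_x = sum((p - mu_x) ** 2 for p in x) / n
    var_y = sum((q - mu_y) ** 2 for q in y) / n
    cov = sum((p - mu_x) * (q - mu_y) for p, q in zip(x, y)) / n
    # Stabilizing constants from the usual SSIM definition (K1 = 0.01, K2 = 0.03).
    c1, c2 = (0.01 * dynamic_range) ** 2, (0.03 * dynamic_range) ** 2
    return ((2 * mu_x * mu_y + c1) * (2 * cov + c2)) / (
        (mu_x ** 2 + mu_y ** 2 + c1) * (var_x + var_y + c2)
    )

baseline = [10, 10, 200, 200]   # hypothetical flattened 2x2 screenshot
shifted = [10, 200, 200, 10]    # same pixel values, different layout

same_score = global_ssim(baseline, baseline)
shift_score = global_ssim(baseline, shifted)
```

Identical images score 1.0, while the rearranged layout scores far lower even though its mean brightness and contrast are unchanged, which is why a structural measure is useful for layout regressions.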

Drain3

📋

Log Signature Miner

Drain3 log parsing extracts structured templates from unstructured log streams. Groups similar error patterns across thousands of build logs into actionable signatures.

Log analysis · Pattern mining
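To make the idea of a log signature concrete, here is a deliberately simplified sketch. It is not the Drain algorithm that Drain3 implements (Drain uses a fixed-depth parse tree); it only masks variable tokens with regular expressions and counts the resulting templates. The log lines are invented examples.

```python
import re
from collections import Counter

# Order matters: mask the most specific patterns first.
MASKS = [
    (re.compile(r"\b\d{1,3}(?:\.\d{1,3}){3}(?::\d+)?\b"), "<IP>"),
    (re.compile(r"\b0x[0-9a-fA-F]+\b"), "<HEX>"),
    (re.compile(r"\b\d+\b"), "<NUM>"),
]

def to_template(line):
    """Replace variable parts of a log line with placeholder tokens."""
    for pattern, token in MASKS:
        line = pattern.sub(token, line)
    return line

def mine_signatures(lines):
    """Count how many non-empty lines collapse to each template."""
    return Counter(to_template(line.strip()) for line in lines if line.strip())

logs = [
    "Connection to 10.0.0.12:5432 timed out after 30 s",
    "Connection to 10.0.0.47:5432 timed out after 45 s",
    "Worker 7 crashed at 0x7ffd12ab",
]
signatures = mine_signatures(logs)
```

The two connection timeouts collapse into one signature, which is the grouping effect the card describes.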

MiniLM + FAISS

🐛

Bug Linker

MiniLM embeds failure messages and searches Jira using embedding similarity. Links test failures to existing tickets automatically — no manual triage required.

Semantic search · Jira

Clustering

📈

Emerging Issues Detector

Continuously clusters failure signatures over time. Surfaces new patterns before they become P1 incidents — catch the trend when it's 3 failures, not 300.

Proactive · Trend detection

/ investigator agent

An LLM that debugs your CI failures for you.

The Investigator is a ReAct-loop agent with 14 specialized tools. Give it a failed build and it autonomously queries logs, diffs, test history, and Jira — then writes a root-cause summary.

→ 14 investigation tools: log fetch, diff compare, similar-failures search, Jira lookup
→ Pluggable LLM backend: cloud (Gemini, GPT-4) or local (Phi-3.5 via LLamaSharp GGUF)
→ Smart Context Builder prunes context to fit the token budget, so stack traces are not truncated
→ Edge AI mode: runs entirely on-device, fully air-gapped
→ Code-Impact Analyzer links failures to the exact commit that caused them
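The ReAct pattern mentioned above alternates between choosing an action and observing its result. The following is a minimal sketch of that loop, not Testhide's implementation: the tool bodies are stubs, and `scripted_model` is a hypothetical stand-in for the LLM so the example runs offline. Only the tool names `fetch_logs` and `search_similar_failures` are taken from the page.

```python
def fetch_logs(build):
    return f"(stub) log lines for build {build}"

def search_similar_failures(top_k):
    return f"(stub) up to {top_k} similar failures"

TOOLS = {"fetch_logs": fetch_logs, "search_similar_failures": search_similar_failures}

def scripted_model(history):
    """Stand-in for the LLM: picks the next action from how many steps have run."""
    plan = [
        ("fetch_logs", {"build": 4821}),
        ("search_similar_failures", {"top_k": 5}),
        ("finish", {"summary": "stub root-cause summary"}),
    ]
    return plan[len(history)]

def react_loop(model, tools, max_steps=10):
    """Run act/observe steps until the model finishes or the step budget runs out."""
    history = []  # (action, args, observation) triples fed back to the model
    for _ in range(max_steps):
        action, args = model(history)
        if action == "finish":
            return args["summary"], history
        observation = tools[action](**args)
        history.append((action, args, observation))
    return "step budget exhausted", history

summary, trace = react_loop(scripted_model, TOOLS)
```

The `max_steps` cap is a common safeguard in agent loops so that a model that never emits a final answer cannot run indefinitely.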

Get started →

Investigator Agent · build #4821 · ReAct

✓ fetch_logs(build=4821) → 3 942 lines

✓ search_similar_failures(top_k=5) → 3 matches

✓ get_jira_tickets(query="DB connection") → TH-2847

✓ code_impact_analysis(commit=a3f92c1) → 2 files

◉ generate_summary…

Root cause

DB connection pool exhausted after db_utils.py change in commit a3f92c1. Matches TH-2847 (ENV_FAILURE, 97% confidence). Fix: increase pool size or revert connection timeout patch.

/ build platform

A real CI/CD engine. Not a GitHub Actions wrapper.

Testhide is a self-hosted build platform with its own scheduler, agent pool, and pipeline runtime. You own the infrastructure — and the queue.

📋

8 step types

Script, Docker, dotnet test, pytest, unittest, Jest, JUnit, and custom — all natively parsed. Define every step in YAML or the visual job editor.

🔁

Matrix builds

Fan out the same job across Python versions, OS targets, or LLM models. Results aggregated into a single matrix timeline heatmap.
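Conceptually, a matrix build is the Cartesian product of its axes. A small sketch of that expansion follows; the axis names and values are illustrative and do not reflect Testhide's YAML schema.

```python
from itertools import product

def expand_matrix(axes):
    """Expand {axis: [values]} into one dict per matrix cell."""
    names = list(axes)
    return [dict(zip(names, combo)) for combo in product(*(axes[n] for n in names))]

cells = expand_matrix({
    "python": ["3.10", "3.11", "3.12"],
    "os": ["linux", "windows"],
})
```

Three Python versions across two OS targets yield six cells, each of which would run as its own job.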

⚡

TPS parallel sharding

Tests-per-second aware sharding splits your suite across N agents so every shard finishes in the same wall-clock time. No more one slow shard blocking everything.
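One common way to balance shards is greedy longest-first assignment: sort tests by expected duration and always give the next test to the lightest shard. The sketch below assumes per-test durations are known from history; it illustrates the balancing idea and is not necessarily the algorithm Testhide uses.

```python
import heapq

def shard_by_duration(durations, n_shards):
    """Greedy longest-first assignment: each test goes to the currently lightest shard."""
    heap = [(0.0, i) for i in range(n_shards)]  # (total seconds, shard index)
    shards = [[] for _ in range(n_shards)]
    for test, secs in sorted(durations.items(), key=lambda kv: kv[1], reverse=True):
        total, i = heapq.heappop(heap)
        shards[i].append(test)
        heapq.heappush(heap, (total + secs, i))
    totals = [0.0] * n_shards
    for total, i in heap:
        totals[i] = total
    return shards, totals

# Hypothetical historical durations in seconds.
history = {"test_checkout": 90, "test_search": 60, "test_login": 40,
           "test_cart": 30, "test_api": 20}
shards, totals = shard_by_duration(history, n_shards=2)
```

With these numbers both shards end up with 120 seconds of work. The greedy heuristic does not always find a perfect split, but it tends to keep shards close.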

🕐

Cron scheduling

Schedule builds with full cron syntax. Nightly regression suites, weekly full-stack runs, pre-release sweeps — all managed in the platform.

🏷

Label & pool dispatch

Tag agents with labels (gpu, windows, staging). Route jobs to the right agent pool. "Fastest available node" dispatch keeps queues drained.
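Label-based routing can be pictured as a subset check followed by a tie-break. In this sketch the agent records and the `queued` field are hypothetical, and queue depth stands in for whatever "fastest available" metric the scheduler actually uses.

```python
def pick_agent(job_labels, agents):
    """Return the name of the least-loaded agent carrying every required label, or None."""
    eligible = [a for a in agents if set(job_labels) <= set(a["labels"])]
    if not eligible:
        return None
    return min(eligible, key=lambda a: a["queued"])["name"]

agents = [
    {"name": "win-gpu-1", "labels": {"windows", "gpu"}, "queued": 3},
    {"name": "linux-gpu-1", "labels": {"linux", "gpu"}, "queued": 1},
    {"name": "linux-cpu-1", "labels": {"linux"}, "queued": 0},
]
chosen = pick_agent({"gpu"}, agents)
```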

📡

Live log streaming

Server-sent events (SSE) push ANSI-rendered log lines to the browser in real time. Full-text search and collapsible step groups included.

📦

Build artifacts

Publish and download artifacts per build. Attach JUnit XML, coverage reports, screenshots, or binaries — retained according to your retention policy.

📥

Jenkins import

Import existing Jenkinsfile / Groovy pipelines. Automated conversion maps stages and steps to Testhide's job format — no manual rewrite.

🔮

ETA prediction

ML-backed build duration prediction shown while the build is still running. Know when it finishes before it does.

🔄

Disconnect recovery

Agent disconnects mid-build? The scheduler automatically reassigns the job to another available node without losing progress already streamed.

🖥

vCenter / ESXi control

Spin up VM agents on VMware vCenter or ESXi. Tear them down after the build. Fully elastic compute with no idle VMs.

🔐

Secrets management

Encrypted secrets vault per project. Inject into any step as environment variables. Never exposed in logs or build output.

/ test intelligence

Smart test execution. Not just a test runner.

Parse, analyze, and optimize your test suite — across frameworks, automatically.

🔬

Multi-framework discovery

Automatic test discovery for pytest, unittest, JUnit, Jest, and Go. Results from each supported framework are parsed without configuration.

🚫

Flaky test quarantine

The flakiness predictor auto-quarantines high-score tests. They still run but their failure doesn't block the build. Re-graduate once stable.

🎯

Smart test selection

Code-impact analysis maps which tests are affected by a commit. Run only the relevant subset — 10× faster feedback on small PRs.
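At its simplest, test selection intersects the files a commit touched with the files each test is known to exercise. The sketch below assumes such a per-test file map already exists (for example from coverage data); the map and test IDs are hypothetical.

```python
def select_tests(changed_files, coverage_map):
    """Return tests whose covered files overlap the changed files, in sorted order."""
    changed = set(changed_files)
    return sorted(test for test, files in coverage_map.items() if changed & set(files))

coverage_map = {
    "tests/test_db.py::test_pool": ["db_utils.py", "config.py"],
    "tests/test_ui.py::test_login": ["views/login.py"],
    "tests/test_api.py::test_health": ["api.py", "db_utils.py"],
}
selected = select_tests(["db_utils.py"], coverage_map)
```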

🔁

Intelligent flaky retry

Retry only tests predicted to be flaky — not the whole suite. Configurable retry budget per job, per test class, or globally.

🏗

Build pre-flight checks

Build guard runs pre-checks before starting: repo access, secret availability, agent capacity, disk space. Fail fast with a clear message.

📊

Test reports & history

Paginated test report list with per-test failure history, trend charts, and drill-down into individual assertions — going back as far as your data allows.

AI triage badges

Every test result is annotated with its root-cause class, flakiness score, OOD flag, and similar-failure count — visible directly in the test list, no click required.

Matrix timeline heatmap

See every matrix cell (env × config) on a single heatmap. Instantly spot which combination is failing without opening individual builds.

Dataset warehouse

All failure embeddings, log signatures, and predictions stored as Parquet. Query your own build history for custom analytics — no vendor lock-in.

YOLO fine-tuning

Bring your own UI component screenshots. Fine-tune the YOLOv8 visual diff model on your design system so it catches your specific regressions.

/ build agents

Lightweight agents. Every platform.

A single self-updating binary that runs as a background service or interactive app — on Windows, Linux, macOS, or Docker.

🪟

Windows

x64

🐧

Linux

x64 · deb · rpm

🍎

macOS

x64 · arm64

🐳

Docker

Compose · K8s

☁

Cloud VMs

ESXi · vCenter

🔄

Automatic self-update

The launcher binary wraps the client. When a new version ships to the CDN, the launcher downloads, verifies, swaps, and restarts — with automatic rollback on failure.
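The verify, swap, and rollback steps can be sketched with the standard library alone. This is an assumption-laden illustration: download and restart are omitted, SHA-256 is assumed as the verification method, and `health_check` is a hypothetical callback standing in for however the launcher decides the new version works.

```python
import hashlib
import os
import shutil
import tempfile

def apply_update(current, candidate, expected_sha256, health_check):
    """Swap candidate in for current; restore the backup if the health check fails."""
    with open(candidate, "rb") as f:
        if hashlib.sha256(f.read()).hexdigest() != expected_sha256:
            return "rejected"        # checksum mismatch: keep the current version
    backup = current + ".bak"
    shutil.copy2(current, backup)    # keep a copy for rollback
    os.replace(candidate, current)   # atomic swap when both paths share a filesystem
    if health_check(current):
        os.remove(backup)
        return "updated"
    os.replace(backup, current)      # rollback to the previous version
    return "rolled_back"

with tempfile.TemporaryDirectory() as d:
    cur, new = os.path.join(d, "agent"), os.path.join(d, "agent.new")
    with open(cur, "wb") as f:
        f.write(b"v1")
    with open(new, "wb") as f:
        f.write(b"v2")
    digest = hashlib.sha256(b"v2").hexdigest()
    result = apply_update(cur, new, digest, health_check=lambda path: True)
    with open(cur, "rb") as f:
        installed = f.read()
```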

⚙

Service & app modes

Run as a daemon (Windows Service / systemd) for CI servers, or in interactive app mode for developer workstations. Switch at runtime — no reinstall.

📡

Resilient WebSocket

Jittered exponential backoff with configurable max retry. Agents survive network hiccups and server restarts without losing their build session.
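For reference, a minimal sketch of jittered exponential backoff in its "full jitter" form: each delay is drawn uniformly between zero and an exponentially growing, capped ceiling. The parameter names and defaults are illustrative, not the agent's actual configuration keys.

```python
import random

def backoff_delays(base=1.0, cap=60.0, max_retries=6, rng=random):
    """Yield full-jitter delays: uniform in [0, min(cap, base * 2**attempt)]."""
    for attempt in range(max_retries):
        yield rng.uniform(0, min(cap, base * 2 ** attempt))

# A seeded generator keeps the example reproducible.
delays = list(backoff_delays(base=1.0, cap=10.0, max_retries=6, rng=random.Random(0)))
```

The randomness spreads reconnect attempts out, so a fleet of agents does not hit the server in lockstep after a restart.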

🧠

On-device AI inference

LLamaSharp GGUF runtime runs local LLM inference on the agent machine. Air-gapped environments get full Investigator Agent capability without internet access.

🔒

Session isolation

On Windows, agents in service mode run in Session 0 (the non-interactive service context), isolated from interactive desktop sessions. This is the appropriate setup for production CI.

🖱

Remote control

Remote mouse, keyboard, and screenshot capabilities built into the agent — useful for UI test orchestration without a separate VNC or RDP session.

🔎

Auto tool detection

Agent scans the host for pytest, Node, .NET, Java, Go, Docker — and makes them available to build steps automatically. No PATH fiddling in YAML.

🧹

Artifact cleanup

Configurable retention policy per agent. Old build artifacts pruned automatically so agents don't fill their disks between builds.

/ security & enterprise

Self-hosted. Compliant. Yours to control.

Every byte of your test data — logs, traces, failure embeddings — stays on your infrastructure. SOC 2 aligned, GDPR ready, LDAP/SSO supported.

🔐

Authentication

JWT session tokens (configurable TTL)
Personal Access Tokens with 15 permission scopes
LDAP / Active Directory integration
SSO (SAML 2.0 compatible)
IP allowlist and rate limiting per endpoint
Automatic IP ban on repeated auth failures

🛡

Authorization

RBAC with fine-grained permission matrix
Project-scoped roles: Viewer, Tester, Developer, Admin
PAT scopes: build:read, build:write, agent:register, report:read…
Resource-level ACLs on jobs, nodes, and secrets
Audit log for all permission-sensitive actions

🏢

Infrastructure

Docker sidecar architecture — each service isolated
Nuitka-compiled protected server image (no source exposure)
Multi-replica horizontal scaling with Redis pub/sub
SOC 2 aligned data handling practices
GDPR compliant — no external analytics or telemetry
Air-gap compatible — runs fully offline with local AI models

/ integrations

Plugs into your entire toolchain.

🐛

Jira

Semantic bug linking — failures matched to existing issues via MiniLM embeddings. Auto-open tickets for novel failures. Bi-directional status sync.

💬

Microsoft Teams

Build status cards with failure summaries and root-cause labels pushed to any Teams channel. Smart deduplication — one notification per unique failure cluster.

⚡

Slack

Rich Slack blocks with build status, test counts, AI triage labels, and a direct link to the Investigator summary. Configurable per-project channels.

🔗

Webhooks

Custom HTTP webhooks on build start, completion, and failure. Full build payload in JSON — integrate with any tool that accepts webhooks.

📡

REST API v3

246+ endpoints covering builds, jobs, agents, test results, AI models, users, and reporting. WebSocket (56 message types) + SSE for real-time streaming.

🖥

Jenkins

Import existing Jenkinsfiles. Automated Groovy→YAML conversion so your existing pipeline knowledge isn't wasted.

🤖

LLM Providers

Gemini, GPT-4, and local Phi-3.5 (via LLamaSharp) for the Investigator Agent. Switch providers in config — no code changes.

📊

JUnit / xUnit Output

Standard JUnit XML output for every test run. Compatible with Jenkins, GitHub Actions summaries, Allure, ReportPortal, and any CI dashboard.
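Because JUnit XML is a widely used convention, downstream tools can read it with a stock XML parser. The sketch below summarizes a small hand-written sample; the suite and test names are invented, and the exact attributes Testhide emits are not specified on this page.

```python
import xml.etree.ElementTree as ET

SAMPLE = """<testsuite name="checkout" tests="3" failures="1" errors="0" skipped="1">
  <testcase classname="tests.test_cart" name="test_add_item" time="0.012"/>
  <testcase classname="tests.test_cart" name="test_remove_item" time="0.020">
    <failure message="AssertionError: expected 0 items">assert cart.count == 0</failure>
  </testcase>
  <testcase classname="tests.test_cart" name="test_coupon" time="0.000">
    <skipped/>
  </testcase>
</testsuite>"""

def summarize(xml_text):
    """Map each 'classname.name' test ID to passed, failed, or skipped."""
    root = ET.fromstring(xml_text)
    results = {}
    for case in root.iter("testcase"):
        if case.find("failure") is not None or case.find("error") is not None:
            status = "failed"
        elif case.find("skipped") is not None:
            status = "skipped"
        else:
            status = "passed"
        results[f'{case.get("classname")}.{case.get("name")}'] = status
    return results

report = summarize(SAMPLE)
```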

Full programmatic control via REST API v3

Every action available in the UI is also available via API. Trigger builds, query test history, manage agents, retrieve AI predictions, and pull reports — all from scripts or your own tooling.

246+ endpoints · WebSocket · SSE streaming · OpenAPI schema

# Trigger a build via API
curl -X POST https://your-server/api/v3/builds/ \
  -H "Authorization: Bearer <PAT>" \
  -H "Content-Type: application/json" \
  -d '{"job_id": 42, "branch": "main"}'

# Stream live logs via SSE (-N disables curl's output buffering)
curl -N https://your-server/api/v3/builds/4821/logs/stream/ \
  -H "Authorization: Bearer <PAT>"
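The same build trigger can be issued from a script. This sketch mirrors the curl call above using only the Python standard library; it builds the request without sending it, so it runs offline. The server URL and token are the same placeholders used in the curl example.

```python
import json
import urllib.request

def build_trigger_request(server, pat, job_id, branch):
    """Build (but do not send) the POST request that triggers a build."""
    body = json.dumps({"job_id": job_id, "branch": branch}).encode("utf-8")
    return urllib.request.Request(
        f"{server}/api/v3/builds/",
        data=body,
        headers={"Authorization": f"Bearer {pat}", "Content-Type": "application/json"},
        method="POST",
    )

req = build_trigger_request("https://your-server", "<PAT>", 42, "main")
# urllib.request.urlopen(req) would send it; left out so the sketch runs offline.
```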

/ comparison

How Testhide stacks up against alternatives.

Capability	Testhide	GitHub Actions	Jenkins	Braintrust / Langfuse
Self-hosted, data stays on-prem	✓	—	✓	Partial
AI root-cause classification	✓ (DistilBERT)	—	—	—
Flakiness prediction & quarantine	✓ (41 features)	—	—	—
Visual diff / screenshot regression	✓ (YOLOv8)	—	—	—
LLM Investigator Agent	✓ (14 tools)	—	—	—
Jira auto-linking (semantic)	✓	Plugin only	Plugin only	—
Matrix builds	✓	✓	Plugin	—
TPS parallel sharding	✓	—	—	—
LDAP / SSO / RBAC	✓	Org level only	✓	—
Air-gapped / offline AI	✓ (LLamaSharp)	—	—	—
vCenter / ESXi VM control	✓	—	Plugin	—
Free self-hosted tier	✓ unlimited	✓ (limited)	✓	—

/ get started

Everything ships in one Docker Compose file. Up and running in five minutes.

No cloud account. No credit card. Self-host Testhide on any Linux server and connect your first agent in minutes.

Install with Docker → See pricing