/ features
Not just CI/CD. AI that diagnoses your failures.
Every other CI tool tells you a test failed. Testhide tells you why — and opens the Jira ticket automatically.
/ ai intelligence
8 trained models, zero cloud required.
All eight models run on your infrastructure, so no data leaves your server. Every model was purpose-built for test failure analysis, not a general-purpose LLM wrapper.
Classifies every test failure as ENV_FAILURE, PRODUCT_BUG, or TEST_ISSUE with a confidence score. Stop debating what broke — know instantly.
Gradient-boosted model with 41 engineered features scores the probability a test is flaky. Quarantine bad tests automatically. Stop rerunning false alarms.
Contrastive MLP embeds failures, FAISS indexes them. When something breaks, instantly surface the 5 most similar past failures and their resolutions.
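A minimal sketch of the lookup step, assuming each failure has already been embedded as a fixed-length vector. It uses a brute-force cosine-similarity scan in pure Python as a stand-in for a FAISS index; `top_k_similar` and the sample IDs and vectors are hypothetical.

```python
import math

def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0

def top_k_similar(query: list[float], index: dict[str, list[float]], k: int = 5):
    """Return the k (failure_id, score) pairs most similar to `query`.

    This linear scan is O(n); a FAISS index answers the same query at scale.
    """
    scored = [(fid, cosine(query, vec)) for fid, vec in index.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]

# Hypothetical embeddings of past failures.
past_failures = {
    "build-101/test_db": [0.9, 0.1, 0.0],
    "build-207/test_api": [0.1, 0.9, 0.2],
    "build-311/test_db_pool": [0.8, 0.2, 0.1],
}
print(top_k_similar([1.0, 0.1, 0.0], past_failures, k=2))
```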
Autoencoder + IsolationForest detects when a failure pattern is genuinely novel — not a known recurring issue. Escalate the right things.
YOLOv8 detects UI elements, SSIM measures structural similarity. Catch pixel regressions, layout shifts, and missing components automatically across every UI test.
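The SSIM part of the comparison can be sketched with the standard formula from Wang et al. (2004). This is a simplification: real SSIM averages the statistic over sliding local windows, while this version uses one global window over a flattened grayscale image. The function name and sample pixels are illustrative only.

```python
def global_ssim(x: list[float], y: list[float], data_range: float = 255.0) -> float:
    """Single-window SSIM between two equal-size grayscale images (flattened)."""
    if len(x) != len(y) or not x:
        raise ValueError("images must be non-empty and the same size")
    n = len(x)
    c1 = (0.01 * data_range) ** 2  # stabilizes the luminance term
    c2 = (0.03 * data_range) ** 2  # stabilizes the contrast/structure term
    mx, my = sum(x) / n, sum(y) / n
    var_x = sum((a - mx) ** 2 for a in x) / n
    var_y = sum((b - my) ** 2 for b in y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y)) / n
    return ((2 * mx * my + c1) * (2 * cov + c2)) / (
        (mx ** 2 + my ** 2 + c1) * (var_x + var_y + c2)
    )

baseline = [10, 10, 200, 200]  # tiny 2x2 screenshot, flattened
shifted = [200, 200, 10, 10]   # same content, layout flipped
print(global_ssim(baseline, baseline), global_ssim(baseline, shifted))
```

A score near 1.0 means structurally identical; the flipped layout scores far lower even though its pixel histogram is unchanged, which is why a structural measure catches layout shifts that a histogram diff would miss.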
Drain3 log parsing extracts structured templates from unstructured log streams. Groups similar error patterns across thousands of build logs into actionable signatures.
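The core idea of turning log lines into templates can be shown with regex masking plus grouping. This is not Drain3's algorithm (Drain3 also clusters lines through a fixed-depth parse tree); it only illustrates how variable tokens collapse so that similar errors share one signature. The masking rules are assumptions.

```python
import re
from collections import Counter

# Ordered masking rules: the most specific patterns run first.
MASKS = [
    (re.compile(r"\b\d{1,3}(?:\.\d{1,3}){3}\b"), "<IP>"),
    (re.compile(r"\b0x[0-9a-fA-F]+\b"), "<HEX>"),
    (re.compile(r"\b\d+(?:\.\d+)?"), "<NUM>"),  # no trailing \b, so "30s" -> "<NUM>s"
]

def to_template(line: str) -> str:
    """Replace variable tokens so similar log lines map to one template."""
    for pattern, token in MASKS:
        line = pattern.sub(token, line)
    return line

logs = [
    "Connection to 10.0.0.12 timed out after 30s",
    "Connection to 10.0.0.47 timed out after 45s",
    "Pool exhausted: 20 of 20 connections in use",
]
signatures = Counter(to_template(line) for line in logs)
for template, count in signatures.most_common():
    print(count, template)
```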
MiniLM embeds failure messages and searches Jira using embedding similarity. Links test failures to existing tickets automatically — no manual triage required.
Continuously clusters failure signatures over time. Surfaces new patterns before they become P1 incidents — catch the trend when it's 3 failures, not 300.
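A simple windowed-count heuristic illustrates the "3 failures, not 300" idea. It stands in for the continuous clustering described above and assumes signatures already come from a template step such as the one above; the window and threshold values are hypothetical.

```python
from collections import Counter
from datetime import datetime, timedelta

def emerging_signatures(events, now, window=timedelta(days=1), threshold=3):
    """Return signatures seen at least `threshold` times inside the recent
    window that never appeared before it.

    `events` is an iterable of (timestamp, signature) pairs.
    """
    start = now - window
    recent, history = Counter(), set()
    for ts, sig in events:
        if ts >= start:
            recent[sig] += 1
        else:
            history.add(sig)
    return sorted(s for s, n in recent.items() if n >= threshold and s not in history)

now = datetime(2024, 5, 1, 12)
events = [
    (now - timedelta(days=3), "db pool exhausted"),
    (now - timedelta(hours=5), "db pool exhausted"),
    (now - timedelta(hours=4), "TLS handshake timeout"),
    (now - timedelta(hours=2), "TLS handshake timeout"),
    (now - timedelta(hours=1), "TLS handshake timeout"),
]
print(emerging_signatures(events, now))
```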
/ investigator agent
An LLM that debugs your CI failures for you.
The Investigator is a ReAct-loop agent with 14 specialized tools. Give it a failed build and it autonomously queries logs, diffs, test history, and Jira — then writes a root-cause summary.
- 14 investigation tools, including log fetch, diff compare, similar-failures search, and Jira lookup
- Pluggable LLM backend: cloud (Gemini, GPT-4) or local (Phi-3.5 via LLamaSharp GGUF)
- Smart Context Builder prunes context to fit the model's token budget, so stack traces are never truncated
- Edge AI mode: runs entirely on-device, fully air-gapped
- Code-Impact Analyzer links failures to the exact commit that caused them
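The ReAct pattern behind the Investigator can be sketched as a loop that alternates model "Action" turns with tool "Observation" results until the model produces a final answer. This is a minimal sketch, not the Investigator's implementation: the action syntax, the tool names `fetch_log` and `search_similar`, and the scripted stand-in for the LLM are all hypothetical, and "Thought" turns are omitted for brevity.

```python
from typing import Callable

Tool = Callable[[str], str]

def react_loop(llm: Callable[[str], str], tools: dict[str, Tool], task: str, max_steps: int = 8) -> str:
    """Minimal ReAct-style loop: each model turn is either
    'Action: tool_name[argument]' or 'Final Answer: ...'."""
    transcript = f"Task: {task}\n"
    for _ in range(max_steps):
        reply = llm(transcript).strip()
        transcript += reply + "\n"
        if reply.startswith("Final Answer:"):
            return reply.removeprefix("Final Answer:").strip()
        if reply.startswith("Action:"):
            name, _, arg = reply.removeprefix("Action:").strip().partition("[")
            tool = tools.get(name.strip())
            observation = tool(arg.rstrip("]")) if tool else f"unknown tool {name!r}"
            # Feed the tool result back so the next turn can reason over it.
            transcript += f"Observation: {observation}\n"
    return "No conclusion within step budget"

# Scripted stand-in for an LLM, plus two hypothetical tools.
script = iter([
    "Action: fetch_log[4821]",
    "Action: search_similar[pool exhausted]",
    "Final Answer: Connection pool exhaustion; matches a known ENV_FAILURE.",
])
tools = {
    "fetch_log": lambda build_id: f"build {build_id}: ERROR pool exhausted (20/20)",
    "search_similar": lambda query: "1 similar past failure found",
}
answer = react_loop(lambda _prompt: next(script), tools, "Why did build 4821 fail?")
print(answer)
```

The step budget matters: without `max_steps`, a model that never commits to an answer would loop forever.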
db_utils.py change in commit a3f92c1. Matches TH-2847 (ENV_FAILURE, 97% confidence). Fix: increase pool size or revert connection timeout patch.
/ build platform
A real CI/CD engine. Not a GitHub Actions wrapper.
Testhide is a self-hosted build platform with its own scheduler, agent pool, and pipeline runtime. You own the infrastructure — and the queue.
Script, Docker, dotnet test, pytest, unittest, Jest, JUnit, and custom step types, all natively parsed. Define every step in YAML or in the visual job editor.
Fan out the same job across Python versions, OS targets, or LLM models. Results aggregated into a single matrix timeline heatmap.
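Conceptually, a matrix fan-out is the cartesian product of its axes: each combination becomes one concrete job. A minimal sketch, with hypothetical axis names and values (Testhide's own matrix syntax is not shown here):

```python
from itertools import product

def expand_matrix(axes: dict[str, list[str]]) -> list[dict[str, str]]:
    """Expand a job matrix into one config per combination of axis values."""
    names = list(axes)
    return [dict(zip(names, combo)) for combo in product(*(axes[n] for n in names))]

cells = expand_matrix({"python": ["3.10", "3.11", "3.12"], "os": ["linux", "windows"]})
for cell in cells:
    print(cell)  # 3 Python versions x 2 OS targets = 6 cells
```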
Tests-per-second-aware sharding splits your suite across N agents so every shard finishes at roughly the same wall-clock time. No more waiting on one slow shard.
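Duration-balanced sharding is commonly done with the greedy longest-processing-time-first (LPT) heuristic: sort tests by expected duration and always hand the next one to the least-loaded shard. This sketch assumes per-test durations are already derived from historical tests-per-second data; it illustrates the technique and is not necessarily Testhide's scheduler.

```python
import heapq

def shard_by_duration(durations: dict[str, float], n_shards: int) -> list[list[str]]:
    """Greedy LPT: assign each test, longest first, to the least-loaded shard."""
    heap = [(0.0, i) for i in range(n_shards)]  # (total seconds, shard index)
    shards: list[list[str]] = [[] for _ in range(n_shards)]
    for test, secs in sorted(durations.items(), key=lambda kv: kv[1], reverse=True):
        load, idx = heapq.heappop(heap)
        shards[idx].append(test)
        heapq.heappush(heap, (load + secs, idx))
    return shards

durations = {"test_a": 30.0, "test_b": 20.0, "test_c": 20.0,
             "test_d": 10.0, "test_e": 10.0, "test_f": 10.0}
shards = shard_by_duration(durations, 2)
print(shards, [sum(durations[t] for t in s) for s in shards])
```

LPT is not always optimal, but it is simple and guarantees the slowest shard is within a small constant factor of the best possible split.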
Schedule builds with full cron syntax. Nightly regression suites, weekly full-stack runs, pre-release sweeps — all managed in the platform.
Tag agents with labels (gpu, windows, staging). Route jobs to the right agent pool. "Fastest available node" dispatch keeps queues drained.
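Label routing reduces to a subset check followed by a "best idle candidate" choice. A minimal sketch, where the `Agent` fields and the `speed` score used for "fastest available node" are hypothetical:

```python
from dataclasses import dataclass, field
from typing import Optional

@dataclass
class Agent:
    name: str
    labels: set[str] = field(default_factory=set)
    busy: bool = False
    speed: float = 1.0  # hypothetical relative speed score; higher is faster

def pick_agent(agents: list[Agent], required: set[str]) -> Optional[Agent]:
    """Return the fastest idle agent that has every required label, or None."""
    eligible = [a for a in agents if not a.busy and required <= a.labels]
    return max(eligible, key=lambda a: a.speed, default=None)

pool = [
    Agent("win-01", {"windows"}),
    Agent("gpu-01", {"linux", "gpu"}, speed=2.5),
    Agent("gpu-02", {"linux", "gpu"}, busy=True, speed=3.0),
]
print(pick_agent(pool, {"gpu"}))
```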
Server-sent events (SSE) push ANSI-rendered log lines to the browser in real time. Full-text search and collapsible step groups included.
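On the client side, an SSE stream is plain text: `data:` lines accumulate, a blank line dispatches the event, and lines starting with `:` are comments (often keep-alives). A minimal parser sketch follows; it ignores the `id` and `retry` fields, and the sample stream and the `log` event name are hypothetical, since the event names Testhide sends are not documented here.

```python
def parse_sse(lines):
    """Yield (event, data) pairs from an iterable of decoded SSE lines."""
    event, data = "message", []
    for raw in lines:
        line = raw.rstrip("\r\n")
        if not line:  # blank line: dispatch the buffered event, if any
            if data:
                yield event, "\n".join(data)
            event, data = "message", []
            continue
        if line.startswith(":"):  # comment / keep-alive
            continue
        field, _, value = line.partition(":")
        if value.startswith(" "):
            value = value[1:]
        if field == "data":
            data.append(value)
        elif field == "event":
            event = value

stream = [
    "event: log\n",
    "data: \x1b[32mPASSED\x1b[0m test_login\n",
    "\n",
    ": keep-alive\n",
    "data: step 2/5\n",
    "\n",
]
for name, payload in parse_sse(stream):
    print(name, repr(payload))
```

The ANSI escape codes pass through untouched; rendering them as color is left to the browser-side log viewer.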
Publish and download artifacts per build. Attach JUnit XML, coverage reports, screenshots, or binaries — retained according to your retention policy.
Import existing Jenkinsfile / Groovy pipelines. Automated conversion maps stages and steps to Testhide's job format — no manual rewrite.
ML-backed build duration prediction shown while the build is still running. Know when it finishes before it does.
Agent disconnects mid-build? The scheduler automatically reassigns the job to another available node without losing progress already streamed.
Spin up VM agents on VMware vCenter or ESXi. Tear them down after the build. Full elastic compute — no idle VMs.
Encrypted secrets vault per project. Inject into any step as environment variables. Never exposed in logs or build output.
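Keeping secrets out of logs usually means scrubbing known secret values from every line before it is stored or streamed. A minimal sketch, assuming plain literal matching (encoded or transformed values would slip through); the secret names and values are hypothetical.

```python
def mask_secrets(line: str, secrets: dict[str, str], mask: str = "***") -> str:
    """Replace every secret value that appears in a log line with a mask."""
    # Longest values first, so a secret that contains another is masked whole.
    for value in sorted(secrets.values(), key=len, reverse=True):
        if value:
            line = line.replace(value, mask)
    return line

secrets = {"DB_PASSWORD": "s3cr3t-pw", "API_TOKEN": "tok_abc123"}
print(mask_secrets("login with s3cr3t-pw using tok_abc123", secrets))
```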
/ test intelligence
Smart test execution. Not just a test runner.
Parse, analyze, and optimize your test suite — across frameworks, automatically.
Automatic test discovery for pytest, unittest, JUnit, Jest, and Go. Results from each framework are parsed without extra configuration.
The flakiness predictor auto-quarantines high-scoring tests. They still run, but their failures don't block the build. Tests re-graduate once they are stable again.
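A quarantine policy works best with hysteresis: a high score to enter, a lower score plus a passing streak to leave, so tests don't flap in and out. A minimal sketch; the thresholds and streak length are hypothetical.

```python
QUARANTINE_AT = 0.7      # hypothetical: enter quarantine at or above this score
RELEASE_BELOW = 0.3      # hypothetical: score must drop below this to leave
STABLE_RUNS_NEEDED = 20  # hypothetical: required consecutive passes

def update_quarantine(quarantined: bool, flaky_score: float, consecutive_passes: int) -> bool:
    """Return the new quarantine state for one test."""
    if not quarantined:
        return flaky_score >= QUARANTINE_AT
    # Re-graduate only when the score has dropped AND the test has a stable streak.
    return not (flaky_score < RELEASE_BELOW and consecutive_passes >= STABLE_RUNS_NEEDED)

def build_blocked(results: dict[str, bool], quarantined: set[str]) -> bool:
    """A build is blocked only by failures of non-quarantined tests."""
    return any(not passed for test, passed in results.items() if test not in quarantined)
```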
Code-impact analysis maps which tests are affected by a commit. Run only the relevant subset — 10× faster feedback on small PRs.
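One common way to build such a map is from per-test coverage data: a test is selected when any file it executed has changed. Whether Testhide derives its map from coverage is an assumption here, and the test IDs and paths are hypothetical.

```python
def select_tests(changed_files: list[str], coverage_map: dict[str, set[str]]) -> list[str]:
    """Pick tests whose recorded coverage touches any changed file.

    `coverage_map` maps a test ID to the set of source files it executed.
    """
    changed = set(changed_files)
    return sorted(test for test, files in coverage_map.items() if files & changed)

coverage_map = {
    "tests/test_db.py::test_pool": {"app/db_utils.py"},
    "tests/test_api.py::test_login": {"app/auth.py", "app/api.py"},
    "tests/test_ui.py::test_nav": {"app/ui.py"},
}
print(select_tests(["app/db_utils.py"], coverage_map))
```

Files that no test covers (new modules, config) select nothing, so a cautious setup falls back to the full suite in that case.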
Retry only tests predicted to be flaky — not the whole suite. Configurable retry budget per job, per test class, or globally.
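Selective retry combines the flakiness score with a budget: retry only failures predicted to be flaky, most likely first, up to the cap. A minimal sketch; the threshold and the scores are hypothetical.

```python
def plan_retries(failed: list[str], flaky_scores: dict[str, float],
                 budget: int, threshold: float = 0.5) -> list[str]:
    """Choose failed tests to retry: only predicted-flaky ones, highest score first."""
    candidates = [t for t in failed if flaky_scores.get(t, 0.0) >= threshold]
    candidates.sort(key=lambda t: flaky_scores[t], reverse=True)
    return candidates[:budget]

failed = ["test_a", "test_b", "test_c", "test_d"]
scores = {"test_a": 0.9, "test_b": 0.2, "test_c": 0.6, "test_d": 0.75}
print(plan_retries(failed, scores, budget=2))
```

`test_b` fails with a low flakiness score, so it is treated as a real failure and never retried, however much budget remains.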
Build guard runs pre-checks before starting: repo access, secret availability, agent capacity, disk space. Fail fast with a clear message.
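Two of the listed pre-checks can be sketched with the standard library alone: free disk space and secret availability (assuming secrets are exposed as environment variables, as described for the vault). Repo access and agent capacity need server calls and are left out. `preflight`, `CheckResult`, and the thresholds are hypothetical.

```python
import os
import shutil
from dataclasses import dataclass

@dataclass
class CheckResult:
    name: str
    ok: bool
    message: str = ""

def preflight(workdir: str, required_secrets: list[str], min_free_gb: float = 5.0) -> list[CheckResult]:
    """Run cheap local checks before a build starts."""
    free_gb = shutil.disk_usage(workdir).free / 1e9
    results = [CheckResult("disk_space", free_gb >= min_free_gb,
                           f"{free_gb:.1f} GB free, need {min_free_gb} GB")]
    missing = [s for s in required_secrets if s not in os.environ]
    results.append(CheckResult("secrets", not missing,
                               f"missing: {', '.join(missing)}" if missing else ""))
    return results

def first_failure(results: list[CheckResult]):
    """Fail fast: return the first failing check, or None."""
    return next((r for r in results if not r.ok), None)

failure = first_failure(preflight(".", ["DB_PASSWORD"]))
print(f"Build aborted: {failure.name}: {failure.message}" if failure else "All checks passed")
```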
Paginated test report list with per-test failure history, trend charts, and drill-down into individual assertions — going back as far as your data allows.
Every test result is annotated with its root-cause class, flakiness score, out-of-distribution (OOD) flag, and similar-failure count, visible directly in the test list with no click required.
See every matrix cell (env × config) on a single heatmap. Instantly spot which combination is failing without opening individual builds.
All failure embeddings, log signatures, and predictions stored as Parquet. Query your own build history for custom analytics — no vendor lock-in.
Bring your own UI component screenshots. Fine-tune the YOLOv8 visual diff model on your design system so it catches your specific regressions.
/ build agents
Lightweight agents. Every platform.
A single self-updating binary that runs as a background service or interactive app — on Windows, Linux, macOS, or Docker.
The launcher binary wraps the client. When a new version ships to the CDN, the launcher downloads, verifies, swaps, and restarts — with automatic rollback on failure.
Run as a daemon (Windows Service / systemd) for CI servers, or in interactive app mode for developer workstations. Switch at runtime — no reinstall.
Jittered exponential backoff with a configurable maximum retry count. Agents survive network hiccups and server restarts without losing their build session.
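"Full jitter" backoff draws each delay uniformly from zero up to an exponentially growing, capped ceiling, which keeps a fleet of agents from reconnecting in lockstep after a server restart. A minimal sketch; `reconnect`, the defaults, and the use of `ConnectionError` are assumptions, and session resumption is not shown.

```python
import random
import time

def backoff_delays(base=1.0, cap=60.0, max_attempts=8, rng=random.random):
    """Yield one full-jitter delay per attempt: uniform in [0, min(cap, base * 2**attempt))."""
    for attempt in range(max_attempts):
        yield rng() * min(cap, base * 2 ** attempt)

def reconnect(connect, **backoff):
    """Call connect() until it succeeds, sleeping a jittered delay after each failure."""
    last_error = None
    for delay in backoff_delays(**backoff):
        try:
            return connect()
        except ConnectionError as exc:
            last_error = exc
            time.sleep(delay)
    raise ConnectionError("giving up after max_attempts") from last_error
```

Passing `rng=lambda: 1.0` turns off the randomness and exposes the underlying ceilings (1, 2, 4, 8, then the cap), which is handy in tests.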
LLamaSharp GGUF runtime runs local LLM inference on the agent machine. Air-gapped environments get full Investigator Agent capability without internet access.
On Windows, agents run in Session 0 (the non-interactive service context), fully isolated from interactive desktop sessions, which is the correct setup for production CI.
Remote mouse, keyboard, and screenshot capabilities built into the agent — useful for UI test orchestration without a separate VNC or RDP session.
The agent scans the host for pytest, Node.js, .NET, Java, Go, and Docker and makes them available to build steps automatically. No PATH fiddling in YAML.
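Tool discovery of this kind can be sketched with `shutil.which`, which resolves an executable name against `PATH` the same way a shell would. The capability names and probed executables below are hypothetical.

```python
import shutil

# Hypothetical capability name -> executable to look up on PATH.
PROBES = {
    "pytest": "pytest",
    "node": "node",
    "dotnet": "dotnet",
    "java": "java",
    "go": "go",
    "docker": "docker",
}

def discover_tools(probes: dict[str, str] = PROBES) -> dict[str, str]:
    """Map each capability to the absolute path of its executable, if found."""
    found = {}
    for capability, exe in probes.items():
        path = shutil.which(exe)
        if path:
            found[capability] = path
    return found

print(discover_tools())
```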
Configurable retention policy per agent. Old build artifacts pruned automatically so agents don't fill their disks between builds.
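A retention policy is typically "delete anything older than N days, but always keep the latest few". A minimal sketch, assuming one workspace directory per build; the defaults are hypothetical. The selection logic is kept pure so it can be tested without touching disk.

```python
import shutil
import time
from pathlib import Path

DAY = 86400.0

def builds_to_prune(build_mtimes: dict[str, float], now: float,
                    max_age_days: float, keep_latest: int) -> list[str]:
    """Pick builds older than max_age_days, always sparing the keep_latest newest."""
    newest_first = sorted(build_mtimes, key=build_mtimes.get, reverse=True)
    protected = set(newest_first[:keep_latest])
    cutoff = now - max_age_days * DAY
    return [b for b in newest_first if b not in protected and build_mtimes[b] < cutoff]

def prune_workspace(root: Path, max_age_days: float = 14, keep_latest: int = 5) -> list[str]:
    """Delete stale per-build directories under root and return their names."""
    mtimes = {p.name: p.stat().st_mtime for p in root.iterdir() if p.is_dir()}
    doomed = builds_to_prune(mtimes, time.time(), max_age_days, keep_latest)
    for name in doomed:
        shutil.rmtree(root / name)
    return doomed
```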
/ security & enterprise
Self-hosted. Compliant. Yours to control.
Every byte of your test data — logs, traces, failure embeddings — stays on your infrastructure. SOC 2 aligned, GDPR ready, LDAP/SSO supported.
- JWT session tokens (configurable TTL)
- Personal Access Tokens with 15 permission scopes
- LDAP / Active Directory integration
- SSO (SAML 2.0 compatible)
- IP allowlist and rate limiting per endpoint
- Automatic IP ban on repeated auth failures
- RBAC with fine-grained permission matrix
- Project-scoped roles: Viewer, Tester, Developer, Admin
- PAT scopes: build:read, build:write, agent:register, report:read…
- Resource-level ACLs on jobs, nodes, and secrets
- Audit log for all permission-sensitive actions
- Docker sidecar architecture — each service isolated
- Nuitka-compiled protected server image (no source exposure)
- Multi-replica horizontal scaling with Redis pub/sub
- SOC 2 aligned data handling practices
- GDPR compliant — no external analytics or telemetry
- Air-gap compatible — runs fully offline with local AI models
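Scoped tokens are usually enforced at the handler boundary: each endpoint declares the scope it needs, and requests whose token lacks it are rejected before any work happens. A minimal decorator sketch using the `build:write` and `build:read` scopes listed above; `require_scope`, `PermissionDenied`, and `trigger_build` are hypothetical names, and token parsing and validation are out of scope.

```python
from functools import wraps

class PermissionDenied(Exception):
    pass

def require_scope(scope: str):
    """Decorator: reject calls whose token does not carry `scope`."""
    def decorator(handler):
        @wraps(handler)
        def wrapper(token_scopes: set[str], *args, **kwargs):
            if scope not in token_scopes:
                raise PermissionDenied(f"token lacks scope {scope!r}")
            return handler(token_scopes, *args, **kwargs)
        return wrapper
    return decorator

@require_scope("build:write")
def trigger_build(token_scopes: set[str], job_id: int, branch: str) -> str:
    return f"queued job {job_id} on {branch}"

print(trigger_build({"build:read", "build:write"}, 42, "main"))
```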
/ integrations
Plugs into your entire toolchain.
Semantic bug linking — failures matched to existing issues via MiniLM embeddings. Auto-open tickets for novel failures. Bi-directional status sync.
Build status cards with failure summaries and root-cause labels pushed to any Teams channel. Smart deduplication — one notification per unique failure cluster.
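Deduplication by failure cluster can be sketched as "remember when each cluster was last announced and stay quiet during a cooldown". The text does not say whether Testhide scopes deduplication by time window or by build; the cooldown here is an assumption, and cluster IDs are hypothetical.

```python
class NotificationDeduper:
    """Allow at most one notification per failure cluster within a cooldown."""

    def __init__(self, cooldown_s: float = 3600.0):
        self.cooldown_s = cooldown_s
        self._last_sent: dict[str, float] = {}

    def should_notify(self, cluster_id: str, now: float) -> bool:
        last = self._last_sent.get(cluster_id)
        if last is not None and now - last < self.cooldown_s:
            return False  # same cluster announced recently: suppress
        self._last_sent[cluster_id] = now
        return True

dedupe = NotificationDeduper(cooldown_s=3600)
print([dedupe.should_notify(c, t) for c, t in [("c1", 0), ("c1", 100), ("c2", 100)]])
```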
Rich Slack blocks with build status, test counts, AI triage labels, and a direct link to the Investigator summary. Configurable per-project channels.
Custom HTTP webhooks on build start, completion, and failure. Full build payload in JSON — integrate with any tool that accepts webhooks.
246+ endpoints covering builds, jobs, agents, test results, AI models, users, and reporting. WebSocket (56 message types) + SSE for real-time streaming.
Import existing Jenkinsfiles. Automated Groovy→YAML conversion so your existing pipeline knowledge isn't wasted.
Gemini, GPT-4, and local Phi-3.5 (via LLamaSharp) for the Investigator Agent. Switch providers in config — no code changes.
Standard JUnit XML output for every test run. Compatible with Jenkins, GitHub Actions summaries, Allure, ReportPortal, and any CI dashboard.
Every action available in the UI is also available via API. Trigger builds, query test history, manage agents, retrieve AI predictions, and pull reports — all from scripts or your own tooling.
```bash
# Trigger a build via API
curl -X POST https://your-server/api/v3/builds/ \
  -H "Authorization: Bearer <PAT>" \
  -H "Content-Type: application/json" \
  -d '{"job_id": 42, "branch": "main"}'

# Stream live logs via SSE (-N disables curl's output buffering)
curl -N https://your-server/api/v3/builds/4821/logs/stream/ \
  -H "Authorization: Bearer <PAT>"
```
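The same build trigger can be issued from a script with only the Python standard library. This sketch builds the request that mirrors the curl call above (same path, headers, and JSON body) without sending it; the helper name is hypothetical, and the response format is not documented here, so the send step is left commented out.

```python
import json
import urllib.request

def trigger_build_request(server: str, pat: str, job_id: int, branch: str) -> urllib.request.Request:
    """Build (but do not send) a POST to /api/v3/builds/ with a JSON body."""
    body = json.dumps({"job_id": job_id, "branch": branch}).encode("utf-8")
    return urllib.request.Request(
        f"{server.rstrip('/')}/api/v3/builds/",
        data=body,
        headers={"Authorization": f"Bearer {pat}", "Content-Type": "application/json"},
        method="POST",
    )

req = trigger_build_request("https://your-server", "<PAT>", 42, "main")
# To actually send it:
# with urllib.request.urlopen(req) as resp:
#     print(resp.status, resp.read().decode("utf-8"))
```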
/ comparison
How Testhide stacks up against alternatives.
| Capability | Testhide | GitHub Actions | Jenkins | Braintrust / Langfuse |
|---|---|---|---|---|
| Self-hosted, data stays on-prem | ✓ | — | ✓ | Partial |
| AI root-cause classification | ✓ (DistilBERT) | — | — | — |
| Flakiness prediction & quarantine | ✓ (41 features) | — | — | — |
| Visual diff / screenshot regression | ✓ (YOLOv8) | — | — | — |
| LLM Investigator Agent | ✓ (14 tools) | — | — | — |
| Jira auto-linking (semantic) | ✓ | Plugin only | Plugin only | — |
| Matrix builds | ✓ | ✓ | Plugin | — |
| Tests-per-second (TPS) parallel sharding | ✓ | — | — | — |
| LDAP / SSO / RBAC | ✓ | Org level only | ✓ | — |
| Air-gapped / offline AI | ✓ (LLamaSharp) | — | — | — |
| vCenter / ESXi VM control | ✓ | — | Plugin | — |
| Free self-hosted tier | ✓ unlimited | ✓ (limited) | ✓ | — |
/ get started
Everything ships in a single Docker Compose file. Five minutes to deploy.
No cloud account. No credit card. Self-host Testhide on any Linux server and connect your first agent in minutes.