/ features

Not just CI/CD. AI that diagnoses your failures.

Every other CI tool tells you a test failed. Testhide tells you why — and opens the Jira ticket automatically.

8 AI/ML models
246+ REST API endpoints
5 platforms supported
50+ build & pipeline features

/ ai intelligence

8 trained models, zero cloud required.

All AI runs on your infrastructure. No data leaves your server. Every model was purpose-built for test failure analysis — not a general-purpose LLM wrapper.

DistilBERT
🎯
Root-Cause Classifier

Classifies every test failure as ENV_FAILURE, PRODUCT_BUG, or TEST_ISSUE with a confidence score. Stop debating what broke — know instantly.

NLP · Real-time
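A minimal sketch of how a label plus confidence score can be derived from a classifier's raw output. The three class names come from the description above; the `classify` helper, the example scores, and the use of a plain softmax are illustrative assumptions, not Testhide's actual inference code.

```python
import math

# Class names as listed in the feature description.
LABELS = ["ENV_FAILURE", "PRODUCT_BUG", "TEST_ISSUE"]

def classify(logits):
    """Turn raw per-class scores into (label, confidence) with a softmax."""
    peak = max(logits)                      # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    probs = [e / total for e in exps]
    best = max(range(len(probs)), key=probs.__getitem__)
    return LABELS[best], probs[best]

# Hypothetical scores for one failed test.
label, confidence = classify([4.1, 0.3, 0.2])
```

The confidence is simply the probability of the winning class, so a low value is a signal to send the failure to a human instead of trusting the label.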
HistGradient
🎲
Flakiness Predictor

Gradient-boosted model with 41 engineered features scores the probability a test is flaky. Quarantine bad tests automatically. Stop rerunning false alarms.

ML · 41 features
FAISS + MLP
🔍
Failure Retriever

Contrastive MLP embeds failures, FAISS indexes them. When something breaks, instantly surface the 5 most similar past failures and their resolutions.

Vector search · Semantic
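The retrieval step can be pictured as a nearest-neighbour search over failure embeddings. The sketch below is a brute-force stand-in using cosine similarity in plain Python; it does not use FAISS, and the `top_k_similar` helper and the tiny two-dimensional vectors are hypothetical.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def top_k_similar(query, history, k=5):
    """history is a list of (failure_id, embedding); returns the k best (id, score) pairs."""
    scored = [(failure_id, cosine(query, emb)) for failure_id, emb in history]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]

# Hypothetical past failures with toy embeddings.
history = [("a", [1.0, 0.0]), ("b", [0.0, 1.0]), ("c", [1.0, 1.0])]
matches = top_k_similar([1.0, 0.0], history, k=2)
```

A vector index such as FAISS serves the same purpose but avoids scanning every stored embedding on each query.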
Autoencoder + IF
🚨
Out-of-Distribution Detector

Autoencoder + IsolationForest detects when a failure pattern is genuinely novel — not a known recurring issue. Escalate the right things.

Anomaly detection
YOLOv8 + SSIM
🖼
Visual Diff Analyzer

YOLOv8 detects UI elements, SSIM measures structural similarity. Catch pixel regressions, layout shifts, and missing components automatically across every UI test.

Computer vision · Screenshot diff
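For readers unfamiliar with SSIM, here is a simplified sketch of the score itself. It computes one global SSIM value over two grayscale images given as flat pixel lists; real implementations usually average the score over local windows, and nothing here reflects how Testhide combines it with YOLOv8. The `global_ssim` name and the sample pixels are assumptions.

```python
def global_ssim(img_a, img_b, dynamic_range=255.0):
    """Single-window SSIM over two equal-size grayscale images (flat pixel lists)."""
    n = len(img_a)
    if n == 0 or n != len(img_b):
        raise ValueError("images must be non-empty and the same size")
    mu_a = sum(img_a) / n
    mu_b = sum(img_b) / n
    var_a = sum((p - mu_a) ** 2 for p in img_a) / n
    var_b = sum((q - mu_b) ** 2 for q in img_b) / n
    cov = sum((p - mu_a) * (q - mu_b) for p, q in zip(img_a, img_b)) / n
    c1 = (0.01 * dynamic_range) ** 2        # stabilising constants from the SSIM definition
    c2 = (0.03 * dynamic_range) ** 2
    return ((2 * mu_a * mu_b + c1) * (2 * cov + c2)) / (
        (mu_a ** 2 + mu_b ** 2 + c1) * (var_a + var_b + c2)
    )

baseline = [10, 200, 10, 200]
same_score = global_ssim(baseline, list(baseline))
shifted_score = global_ssim(baseline, [200, 10, 200, 10])
```

Identical screenshots score 1; the further the structure drifts, the lower the value.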
Drain3
📋
Log Signature Miner

Drain3 log parsing extracts structured templates from unstructured log streams. Groups similar error patterns across thousands of build logs into actionable signatures.

Log analysis · Pattern mining
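The idea of a log template can be illustrated with a much simpler technique than Drain3: mask the variable tokens with a regular expression and count what remains. This is a deliberate simplification (Drain3 builds a parse tree instead of using a single regex), and the `to_template` and `mine_signatures` helpers and the sample log lines are hypothetical.

```python
import re
from collections import Counter

# Numbers, dotted numbers (IPs, versions) and hex literals are treated as variables.
_VARIABLE = re.compile(r"\b(?:0x[0-9a-fA-F]+|\d+(?:\.\d+)*)\b")

def to_template(line):
    """Replace variable-looking tokens with a <*> placeholder."""
    return _VARIABLE.sub("<*>", line)

def mine_signatures(lines):
    """Count how many log lines collapse into each template."""
    return Counter(to_template(line) for line in lines)

logs = [
    "Timeout after 30 s connecting to 10.0.0.5",
    "Timeout after 45 s connecting to 10.0.0.9",
    "Disk full on node 7",
]
signatures = mine_signatures(logs)
```

Two different timeout lines collapse into one signature, which is what makes thousands of build logs reducible to a short list of recurring errors.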
MiniLM + FAISS
🐛
Bug Linker

MiniLM embeds failure messages and searches Jira using embedding similarity. Links test failures to existing tickets automatically — no manual triage required.

Semantic search · Jira
Clustering
📈
Emerging Issues Detector

Continuously clusters failure signatures over time. Surfaces new patterns before they become P1 incidents — catch the trend when it's 3 failures, not 300.

Proactive · Trend detection

/ investigator agent

An LLM that debugs your CI failures for you.

The Investigator is a ReAct-loop agent with 14 specialized tools. Give it a failed build and it autonomously queries logs, diffs, test history, and Jira — then writes a root-cause summary.

  • 14 investigation tools: log fetch, diff compare, similar-failures search, Jira lookup
  • Pluggable LLM backend: cloud (Gemini, GPT-4) or local (Phi-3.5 via LLamaSharp GGUF)
  • Smart Context Builder prunes context to fit the token budget, so traces are not truncated
  • Edge AI mode: runs entirely on-device, fully air-gapped
  • Code-Impact Analyzer links failures to the exact commit that caused them
Investigator Agent · build #4821 · ReAct
fetch_logs(build=4821) → 3 942 lines
search_similar_failures(top_k=5) → 3 matches
get_jira_tickets(query="DB connection") → TH-2847
code_impact_analysis(commit=a3f92c1) → 2 files
generate_summary
Root cause
DB connection pool exhausted after db_utils.py change in commit a3f92c1. Matches TH-2847 (ENV_FAILURE, 97% confidence). Fix: increase pool size or revert connection timeout patch.
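The trace above follows the general ReAct pattern: decide on an action, run a tool, record the observation, repeat. A minimal sketch of that loop is below. The tool names `fetch_logs` and `get_jira_tickets` are taken from the trace; the tool bodies are stubs, and `scripted_policy` is a hard-coded stand-in for the LLM, so none of this is the Investigator's real implementation.

```python
def run_investigation(policy, tools, build_id, max_steps=8):
    """Ask the policy for the next action, run the tool, and record the observation."""
    transcript = []
    for _ in range(max_steps):
        action, args = policy(build_id, transcript)
        if action == "finish":
            return args["summary"], transcript
        observation = tools[action](**args)
        transcript.append((action, args, observation))
    return "step budget exhausted", transcript

# Hypothetical stub tools; a real agent would call the build server and Jira.
tools = {
    "fetch_logs": lambda build: f"build {build}: connection pool exhausted",
    "get_jira_tickets": lambda query: ["TH-2847"],
}

def scripted_policy(build_id, transcript):
    """Stand-in for the LLM: picks the next step from how far the transcript has got."""
    if not transcript:
        return "fetch_logs", {"build": build_id}
    if len(transcript) == 1:
        return "get_jira_tickets", {"query": "DB connection"}
    ticket = transcript[-1][2][0]
    return "finish", {"summary": f"Build {build_id}: matches {ticket}"}

summary, transcript = run_investigation(scripted_policy, tools, 4821)
```

The `max_steps` cap is the important design choice: an agent that can call tools freely needs a hard budget so a confused model cannot loop forever.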

/ build platform

A real CI/CD engine. Not a GitHub Actions wrapper.

Testhide is a self-hosted build platform with its own scheduler, agent pool, and pipeline runtime. You own the infrastructure — and the queue.

📋
8 step types

Script, Docker, dotnet test, pytest, unittest, Jest, JUnit, and custom — all natively parsed. Define every step in YAML or the visual job editor.

🔁
Matrix builds

Fan out the same job across Python versions, OS targets, or LLM models. Results aggregated into a single matrix timeline heatmap.
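Fanning out a matrix amounts to taking the Cartesian product of the axes. A small sketch, assuming the axes are given as a dictionary of lists; the `expand_matrix` helper and the axis names are illustrative and not Testhide's job format.

```python
from itertools import product

def expand_matrix(axes):
    """Return one dict per matrix cell, covering every combination of axis values."""
    names = list(axes)
    return [dict(zip(names, combo)) for combo in product(*(axes[n] for n in names))]

# Hypothetical axes: two Python versions across two OS targets.
cells = expand_matrix({"python": ["3.11", "3.12"], "os": ["linux", "windows"]})
```

Each resulting dictionary would correspond to one cell on the matrix heatmap.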

TPS parallel sharding

Tests-per-second aware sharding splits your suite across N agents so that shards finish in roughly the same wall-clock time. No more one slow shard blocking everything.
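One common way to balance shards by time is the greedy "longest first" heuristic: sort tests by expected duration and always hand the next test to the least-loaded shard. The sketch below assumes per-test durations in seconds are already known; the source does not say which algorithm Testhide uses, so `shard_by_duration` is only an illustration of the idea.

```python
import heapq

def shard_by_duration(durations, shard_count):
    """durations maps test name to seconds; returns [(total_seconds, [tests])] per shard."""
    # Heap entries are (load, shard_index, tests); the unique index breaks ties.
    shards = [(0.0, index, []) for index in range(shard_count)]
    heapq.heapify(shards)
    for name, seconds in sorted(durations.items(), key=lambda kv: kv[1], reverse=True):
        load, index, tests = heapq.heappop(shards)   # least-loaded shard so far
        tests.append(name)
        heapq.heappush(shards, (load + seconds, index, tests))
    return [(load, tests) for load, _, tests in sorted(shards, key=lambda s: s[1])]

# Hypothetical timings from a previous run.
plan = shard_by_duration({"a": 8, "b": 7, "c": 6, "d": 5, "e": 4}, shard_count=2)
```

With these timings the two shards end up at 17 and 13 seconds, compared with 21 and 9 if the tests were simply split in listed order.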

🕐
Cron scheduling

Schedule builds with full cron syntax. Nightly regression suites, weekly full-stack runs, pre-release sweeps — all managed in the platform.

🏷
Label & pool dispatch

Tag agents with labels (gpu, windows, staging). Route jobs to the right agent pool. "Fastest available node" dispatch keeps queues drained.

📡
Live log streaming

Server-sent events (SSE) push ANSI-rendered log lines to the browser in real time. Full-text search and collapsible step groups included.

📦
Build artifacts

Publish and download artifacts per build. Attach JUnit XML, coverage reports, screenshots, or binaries — retained according to your retention policy.

📥
Jenkins import

Import existing Jenkinsfile / Groovy pipelines. Automated conversion maps stages and steps to Testhide's job format — no manual rewrite.

🔮
ETA prediction

ML-backed build duration prediction shown while the build is still running. Know when it finishes before it does.

🔄
Disconnect recovery

Agent disconnects mid-build? The scheduler automatically reassigns the job to another available node without losing progress already streamed.

🖥
vCenter / ESXi control

Spin up VM agents on VMware vCenter or ESXi. Tear them down after the build. Full elastic compute — no idle VMs.

🔐
Secrets management

Encrypted secrets vault per project. Inject into any step as environment variables. Never exposed in logs or build output.

/ test intelligence

Smart test execution. Not just a test runner.

Parse, analyze, and optimize your test suite — across frameworks, automatically.

🔬
Multi-framework discovery

Automatic test discovery for pytest, unittest, JUnit, Jest, and Go. Results from each of these frameworks are parsed without extra configuration.

🚫
Flaky test quarantine

The flakiness predictor auto-quarantines high-score tests. They still run but their failure doesn't block the build. Re-graduate once stable.
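The quarantine rule can be sketched in a few lines: every test still runs, but failures of quarantined tests are left out of the build verdict. The 0.8 threshold, the `build_verdict` helper, and the test names are assumptions for illustration only.

```python
def build_verdict(results, flakiness, threshold=0.8):
    """results maps test to pass/fail; flakiness maps test to a score between 0 and 1."""
    quarantined = {test for test, score in flakiness.items() if score >= threshold}
    blocking = sorted(
        test for test, passed in results.items()
        if not passed and test not in quarantined
    )
    verdict = "failed" if blocking else "passed"
    return verdict, blocking, sorted(quarantined)

# Hypothetical run: two failures, one of them from a test scored as very flaky.
scores = {"test_login": 0.93, "test_payment": 0.10}
red = build_verdict({"test_login": False, "test_payment": False, "test_ok": True}, scores)
green = build_verdict({"test_login": False, "test_payment": True, "test_ok": True}, scores)
```

Re-graduating a test is then just a matter of its score dropping back below the threshold.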

🎯
Smart test selection

Code-impact analysis maps which tests are affected by a commit. Run only the relevant subset for up to 10× faster feedback on small PRs.
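In its simplest form, test selection is a lookup from changed files to the tests known to exercise them. The sketch assumes such a mapping already exists (for example from coverage data); the `select_tests` helper, file names, and test names are hypothetical.

```python
def select_tests(changed_files, coverage_map):
    """Return the sorted set of tests that touch any of the changed files."""
    selected = set()
    for path in changed_files:
        selected.update(coverage_map.get(path, ()))
    return sorted(selected)

# Hypothetical mapping from source files to the tests that cover them.
coverage_map = {
    "db_utils.py": ["test_pool", "test_timeout"],
    "ui.py": ["test_render"],
}
subset = select_tests(["db_utils.py"], coverage_map)
```

A production system would also need a fallback for files missing from the map, such as running the full suite, since an unmapped change here selects nothing.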

🔁
Intelligent flaky retry

Retry only tests predicted to be flaky — not the whole suite. Configurable retry budget per job, per test class, or globally.

🏗
Build pre-flight checks

Build guard runs pre-checks before starting: repo access, secret availability, agent capacity, disk space. Fail fast with a clear message.

📊
Test reports & history

Paginated test report list with per-test failure history, trend charts, and drill-down into individual assertions — going back as far as your data allows.

AI triage badges

Every test result is annotated with its root-cause class, flakiness score, OOD flag, and similar-failure count — visible directly in the test list, no click required.

Matrix timeline heatmap

See every matrix cell (env × config) on a single heatmap. Instantly spot which combination is failing without opening individual builds.

Dataset warehouse

All failure embeddings, log signatures, and predictions stored as Parquet. Query your own build history for custom analytics — no vendor lock-in.

YOLO fine-tuning

Bring your own UI component screenshots. Fine-tune the YOLOv8 visual diff model on your design system so it catches your specific regressions.

/ build agents

Lightweight agents. Every platform.

A single self-updating binary that runs as a background service or interactive app — on Windows, Linux, macOS, or Docker.

🪟
Windows
x64
🐧
Linux
x64 · deb · rpm
🍎
macOS
x64 · arm64
🐳
Docker
Compose · K8s
Cloud VMs
ESXi · vCenter
🔄
Automatic self-update

The launcher binary wraps the client. When a new version ships to the CDN, the launcher downloads, verifies, swaps, and restarts — with automatic rollback on failure.
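The verify, swap, and rollback sequence might look roughly like the sketch below. It assumes the update arrives with a SHA-256 digest and that a `health_check` callback stands in for restarting the client; the file names and the `apply_update` helper are illustrative, not the launcher's real logic.

```python
import hashlib
import os
import shutil
import tempfile

def apply_update(current_path, new_bytes, expected_sha256, health_check):
    """Verify the payload, swap it in, and roll back if the health check fails."""
    if hashlib.sha256(new_bytes).hexdigest() != expected_sha256:
        return "rejected"
    backup = current_path + ".bak"
    staged = current_path + ".new"
    shutil.copy2(current_path, backup)       # keep the old binary for rollback
    with open(staged, "wb") as handle:
        handle.write(new_bytes)
    os.replace(staged, current_path)         # swap the staged file into place
    if health_check(current_path):
        os.remove(backup)
        return "updated"
    os.replace(backup, current_path)         # restore the previous version
    return "rolled_back"

with tempfile.TemporaryDirectory() as workdir:
    binary = os.path.join(workdir, "agent.bin")
    with open(binary, "wb") as handle:
        handle.write(b"v1")
    good = apply_update(binary, b"v2", hashlib.sha256(b"v2").hexdigest(), lambda p: True)
    bad = apply_update(binary, b"v3", hashlib.sha256(b"v3").hexdigest(), lambda p: False)
    tampered = apply_update(binary, b"v4", "0" * 64, lambda p: True)
    with open(binary, "rb") as handle:
        final_bytes = handle.read()
```

Writing to a staged file first and swapping with `os.replace` means the agent never sees a half-written binary.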

Service & app modes

Run as a daemon (Windows Service / systemd) for CI servers, or in interactive app mode for developer workstations. Switch at runtime — no reinstall.

📡
Resilient WebSocket

Jittered exponential backoff with configurable max retry. Agents survive network hiccups and server restarts without losing their build session.
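Jittered exponential backoff can be sketched as follows. This uses the "full jitter" variant, where each delay is drawn uniformly between zero and an exponentially growing ceiling; the base, cap, and retry count are placeholder values, and the source does not state which variant the agent uses.

```python
import random

def backoff_delays(max_retries, base=1.0, cap=60.0, rng=random):
    """Yield one delay per retry, uniform in [0, min(cap, base * 2**attempt)]."""
    for attempt in range(max_retries):
        ceiling = min(cap, base * (2 ** attempt))
        yield rng.uniform(0.0, ceiling)

# Seeded generator so the example is repeatable; a real agent would not seed.
delays = list(backoff_delays(8, rng=random.Random(0)))
```

The randomness matters after a server restart: without it, every agent would reconnect at the same instants and hit the server in synchronized waves.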

🧠
On-device AI inference

LLamaSharp GGUF runtime runs local LLM inference on the agent machine. Air-gapped environments get full Investigator Agent capability without internet access.

🔒
Session isolation

On Windows, agents run in Session 0 (non-interactive service context) for full isolation from interactive desktop sessions, which is the appropriate setup for production CI.

🖱
Remote control

Remote mouse, keyboard, and screenshot capabilities built into the agent — useful for UI test orchestration without a separate VNC or RDP session.

🔎
Auto tool detection

Agent scans the host for pytest, Node, .NET, Java, Go, Docker — and makes them available to build steps automatically. No PATH fiddling in YAML.

🧹
Artifact cleanup

Configurable retention policy per agent. Old build artifacts pruned automatically so agents don't fill their disks between builds.

/ security & enterprise

Self-hosted. Compliant. Yours to control.

Every byte of your test data — logs, traces, failure embeddings — stays on your infrastructure. SOC 2 aligned, GDPR ready, LDAP/SSO supported.

🔐
Authentication
  • JWT session tokens (configurable TTL)
  • Personal Access Tokens with 15 permission scopes
  • LDAP / Active Directory integration
  • SSO (SAML 2.0 compatible)
  • IP allowlist and rate limiting per endpoint
  • Automatic IP ban on repeated auth failures
🛡
Authorization
  • RBAC with fine-grained permission matrix
  • Project-scoped roles: Viewer, Tester, Developer, Admin
  • PAT scopes: build:read, build:write, agent:register, report:read…
  • Resource-level ACLs on jobs, nodes, and secrets
  • Audit log for all permission-sensitive actions
🏢
Infrastructure
  • Docker sidecar architecture — each service isolated
  • Nuitka-compiled protected server image (no source exposure)
  • Multi-replica horizontal scaling with Redis pub/sub
  • SOC 2 aligned data handling practices
  • GDPR compliant — no external analytics or telemetry
  • Air-gap compatible — runs fully offline with local AI models

/ integrations

Plugs into your entire toolchain.

🐛
Jira

Semantic bug linking — failures matched to existing issues via MiniLM embeddings. Auto-open tickets for novel failures. Bi-directional status sync.

💬
Microsoft Teams

Build status cards with failure summaries and root-cause labels pushed to any Teams channel. Smart deduplication — one notification per unique failure cluster.

Slack

Rich Slack blocks with build status, test counts, AI triage labels, and a direct link to the Investigator summary. Configurable per-project channels.

🔗
Webhooks

Custom HTTP webhooks on build start, completion, and failure. Full build payload in JSON — integrate with any tool that accepts webhooks.

📡
REST API v3

246+ endpoints covering builds, jobs, agents, test results, AI models, users, and reporting. WebSocket (56 message types) + SSE for real-time streaming.

🖥
Jenkins

Import existing Jenkinsfiles. Automated Groovy→YAML conversion so your existing pipeline knowledge isn't wasted.

🤖
LLM Providers

Gemini, GPT-4, and local Phi-3.5 (via LLamaSharp) for the Investigator Agent. Switch providers in config — no code changes.

📊
JUnit / xUnit Output

Standard JUnit XML output for every test run. Compatible with Jenkins, GitHub Actions summaries, Allure, ReportPortal, and any CI dashboard.

Full programmatic control via REST API v3

Every action available in the UI is also available via API. Trigger builds, query test history, manage agents, retrieve AI predictions, and pull reports — all from scripts or your own tooling.

246+ endpoints · WebSocket · SSE streaming · OpenAPI schema
# Trigger a build via API
curl -X POST https://your-server/api/v3/builds/ \
  -H "Authorization: Bearer <PAT>" \
  -H "Content-Type: application/json" \
  -d '{"job_id": 42, "branch": "main"}'

# Stream live logs via SSE (-N turns off curl's output buffering so lines appear as they arrive)
curl -N https://your-server/api/v3/builds/4821/logs/stream/ \
  -H "Authorization: Bearer <PAT>"
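The same build trigger can be issued from a script. The sketch below prepares the request with Python's standard library, using the URL, headers, and JSON body from the curl example above; the `build_trigger_request` helper is hypothetical, and the response format is not shown because the source does not describe it.

```python
import json
import urllib.request

def build_trigger_request(base_url, pat, job_id, branch):
    """Prepare (but do not send) the POST request that triggers a build."""
    body = json.dumps({"job_id": job_id, "branch": branch}).encode("utf-8")
    return urllib.request.Request(
        url=f"{base_url.rstrip('/')}/api/v3/builds/",
        data=body,
        method="POST",
        headers={
            "Authorization": f"Bearer {pat}",
            "Content-Type": "application/json",
        },
    )

request = build_trigger_request("https://your-server", "<PAT>", 42, "main")
# Passing `request` to urllib.request.urlopen would send it to the server.
```

Keeping request construction separate from sending makes the call easy to unit test without a running server.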

/ comparison

How Testhide stacks up against alternatives.

Capability Testhide GitHub Actions Jenkins Braintrust / Langfuse
Self-hosted, data stays on-prem Partial
AI root-cause classification ✓ (DistilBERT)
Flakiness prediction & quarantine ✓ (41 features)
Visual diff / screenshot regression ✓ (YOLOv8)
LLM Investigator Agent ✓ (14 tools)
Jira auto-linking (semantic) Plugin only Plugin only
Matrix builds Plugin
TPS parallel sharding
LDAP / SSO / RBAC Org level only
Air-gapped / offline AI ✓ (LLamaSharp)
vCenter / ESXi VM control Plugin
Free self-hosted tier ✓ unlimited ✓ (limited)

/ get started

Everything ships in one Docker Compose file. Up and running in five minutes.

No cloud account. No credit card. Self-host Testhide on any Linux server and connect your first agent in minutes.