The agent cost problem

Most agents use AI where software is enough.

Modern agents are powerful, but their default loop is expensive: reason, call a tool, reason again, retry, summarize, validate, and repeat. Many of those steps can be compiled into deterministic work that is cheaper, faster, and easier to audit.
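
As one illustration of a step that software can handle, consider pulling the failed tests out of a test log instead of asking a model to read it. The sketch below assumes a pytest-style short summary, where each failure is a line beginning with `FAILED`; the function name `parse_failures` and the sample log are hypothetical and are not part of Leangetic.

```python
import re

# Matches pytest short-summary lines such as:
#   FAILED tests/test_api.py::test_login - AssertionError: expected 200
FAILED_LINE = re.compile(r"^FAILED (?P<test>\S+)(?: - (?P<reason>.*))?$")


def parse_failures(log_text: str) -> list[dict]:
    """Return one record per failed test found in a pytest-style log."""
    failures = []
    for line in log_text.splitlines():
        match = FAILED_LINE.match(line.strip())
        if match:
            failures.append({
                "test": match.group("test"),
                "reason": match.group("reason") or "",
            })
    return failures


# Hypothetical log excerpt used only for illustration.
SAMPLE_LOG = """\
collected 3 items
FAILED tests/test_api.py::test_login - AssertionError: expected 200
FAILED tests/test_api.py::test_logout
1 passed, 2 failed in 0.41s
"""

if __name__ == "__main__":
    for failure in parse_failures(SAMPLE_LOG):
        print(failure["test"], "|", failure["reason"])
```

A parser like this costs no tokens, returns the same answer on every run, and can be unit-tested, which is what makes the step auditable.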

Runaway inference cost

Agents resend context, tool schemas, logs, and repeated summaries across many model calls. The cost grows with every loop.
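
To see why the cost grows with every loop, consider an agent that resends its whole history on each call. The sketch below is a back-of-the-envelope model only: the token counts and the price per million input tokens are made-up assumptions, not measurements.

```python
def cumulative_input_tokens(base_tokens: int, tokens_added_per_step: int, steps: int) -> int:
    """Total input tokens billed when each call resends all prior context."""
    total = 0
    context = base_tokens
    for _ in range(steps):
        total += context                  # the whole context is sent again
        context += tokens_added_per_step  # tool output and reasoning get appended
    return total


def cost_usd(tokens: int, usd_per_million_tokens: float) -> float:
    """Convert a token count into dollars at a flat per-million price."""
    return tokens / 1_000_000 * usd_per_million_tokens


if __name__ == "__main__":
    # Hypothetical numbers: 4k tokens of prompt and schemas, 2k tokens added per loop,
    # and an assumed price of $3 per million input tokens.
    for steps in (1, 5, 10, 20):
        tokens = cumulative_input_tokens(4_000, 2_000, steps)
        print(steps, tokens, round(cost_usd(tokens, 3.0), 4))
```

Under these assumptions, going from 10 to 20 loops doubles the number of calls but raises billed input tokens from 130,000 to 460,000, roughly 3.5 times, because every extra loop pays again for everything that came before it.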

Unfocused execution

Broad tool access and loose prompts make agents wander through files, APIs, retries, and irrelevant context.

Weak production control

Without deterministic gates, side-effect policies, and replayable traces, teams cannot safely scale agentic workflows.

The compiler loop

Observe the agent. Compile the waste out.

The system works like profile-guided optimization for software. First it scans the original agent locally, redacts sensitive material, and creates a diagnostic bundle. The cloud compiler then builds a process graph, identifies waste, emits a hybrid pipeline, and sends it back for local verification against the baseline.

Live trace graph

Original agent starts: prompt, tools, task input. $0.018

File search loop: 7 model calls, repeated context. $0.142

Test command: shell span, stderr artifact. $0.000

Log reasoning: 31k tokens sent for failure parsing. $0.211

Compiled validation gate: schema, tests, policy checks. $0.004

Run total: $0.375
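
The trace above can be ranked by cost share to show where the money goes. The per-step figures in this sketch are copied from the trace; the function name `cost_shares` is a hypothetical helper, not a Leangetic API.

```python
# Per-step costs copied from the trace above (USD).
TRACE = [
    ("Original agent starts", 0.018),
    ("File search loop", 0.142),
    ("Test command", 0.000),
    ("Log reasoning", 0.211),
    ("Compiled validation gate", 0.004),
]


def cost_shares(trace):
    """Return (step, cost, share_of_total) tuples sorted by cost, highest first."""
    total = sum(cost for _, cost in trace)
    ranked = sorted(trace, key=lambda item: item[1], reverse=True)
    return [(step, cost, cost / total) for step, cost in ranked]


if __name__ == "__main__":
    for step, cost, share in cost_shares(TRACE):
        print(f"{step:26s} ${cost:.3f} {share:6.1%}")
```

Log reasoning and the file search loop together account for $0.353 of the $0.375 run total, about 94%, so those two steps are the obvious first candidates for deterministic replacement.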

Listen

leangetic start installs a zero-touch listener with no code changes. Your agent runs exactly as before while the engine records each model step locally. In SHADOW mode it stores hashes only and alters nothing.
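
What a hashes-only record might look like is sketched below. The record layout and the names `fingerprint` and `record_step` are assumptions made for illustration; they are not Leangetic's actual trace format.

```python
import hashlib
import json
import time


def fingerprint(payload: dict) -> str:
    """Stable SHA-256 of a JSON-serializable payload (keys sorted first)."""
    canonical = json.dumps(payload, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def record_step(step_name: str, request: dict, response: dict, latency_s: float) -> dict:
    """Build a trace record that keeps hashes and sizes, never the raw text."""
    return {
        "step": step_name,
        "request_sha256": fingerprint(request),
        "response_sha256": fingerprint(response),
        "request_bytes": len(json.dumps(request)),
        "latency_s": latency_s,
        "recorded_at": time.time(),
    }
```

Because the keys are sorted before hashing, two requests with the same content produce the same fingerprint, which is enough to spot repeated calls without storing any prompt text.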

Profile (free)

leangetic profile shows, for free, exactly where your agent spends time and money: cost and latency per step, which calls repeat, and which steps fail or retry over and over. You see the waste before you spend a credit.
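
A per-step summary of this kind can be sketched as a simple aggregation over recorded steps. The record fields used here (`step`, `request_sha256`, `cost_usd`, `latency_s`, `ok`) are hypothetical, and the real profile output may be organized differently.

```python
from collections import Counter, defaultdict


def profile(records):
    """Summarize calls, cost, latency, failures, and repeated calls per step."""
    summary = defaultdict(lambda: {
        "calls": 0, "cost_usd": 0.0, "latency_s": 0.0,
        "failures": 0, "repeated_calls": 0,
    })
    seen = Counter()
    for rec in records:
        row = summary[rec["step"]]
        row["calls"] += 1
        row["cost_usd"] += rec["cost_usd"]
        row["latency_s"] += rec["latency_s"]
        row["failures"] += 0 if rec["ok"] else 1
        seen[(rec["step"], rec["request_sha256"])] += 1
    # Every occurrence of a request after the first counts as a repeat.
    for (step, _), count in seen.items():
        summary[step]["repeated_calls"] += count - 1
    return dict(summary)
```

Sorting the resulting rows by `cost_usd` or `repeated_calls` gives a quick ranking of which steps are worth optimizing first.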

Optimize on our servers

leangetic optimize sends a redacted bundle of evidence to Leangetic (you approve exactly what leaves your machine). Our servers do the heavy lifting: map each step, synthesize deterministic code for the safe ones, tune the rest (caching, compaction, routing), and assemble a new hybrid agent. This step uses credits.

Judge on your traffic

leangetic judge runs new-vs-old on your real calls. A code step is kept only after it reproduces the model's output with zero mismatches; an LLM judge approves the hybrid only when it's cheaper with equal-or-better quality and zero regressions.
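
The zero-mismatch rule for code steps can be sketched as a replay over recorded samples. This is a minimal illustration that assumes exact string equality is the comparison; the source does not specify how outputs are compared, and `reproduces_model` is a hypothetical name.

```python
from typing import Callable, Iterable, Tuple


def reproduces_model(
    code_step: Callable[[str], str],
    recorded: Iterable[Tuple[str, str]],
) -> bool:
    """True only if code_step matches the recorded model output on every sample."""
    checked = 0
    for step_input, model_output in recorded:
        try:
            if code_step(step_input) != model_output:
                return False  # a single mismatch disqualifies the step
        except Exception:
            return False      # a crash counts as a mismatch
        checked += 1
    return checked > 0        # no evidence means no replacement


if __name__ == "__main__":
    # Hypothetical step: the model was only normalizing a status word.
    normalize = lambda text: text.strip().lower()
    print(reproduces_model(normalize, [(" PASS ", "pass"), ("FAIL", "fail")]))
```

Returning `False` when there are no samples is a deliberate choice in this sketch: a step with no recorded evidence stays with the model.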

Switch over - or roll back

leangetic promote switches to the proven hybrid in one command (your AI can do it for you). Every served step still falls back to the model on any doubt, and leangetic rollback reverts instantly. Your original is never modified.
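
The fall-back-on-doubt behavior can be pictured as a thin wrapper around each served step. In this sketch, "doubt" is assumed to mean that the code step raised an error, returned nothing, or failed a validity check; the function and parameter names are hypothetical.

```python
from typing import Callable, Optional


def serve_step(
    step_input: str,
    code_step: Callable[[str], Optional[str]],
    model_step: Callable[[str], str],
    is_valid: Callable[[str], bool],
) -> tuple[str, str]:
    """Return (output, source), where source is 'code' or 'model'."""
    try:
        candidate = code_step(step_input)
    except Exception:
        candidate = None  # any error counts as doubt
    if candidate is not None and is_valid(candidate):
        return candidate, "code"
    # Doubt of any kind: hand the input to the original model step.
    return model_step(step_input), "model"
```

Recording the returned `source` for every call is also what would let a fallback rate be monitored over time.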

$0: cost of a model step we replace with proven-equivalent code

≥ original: quality required on your own tasks before any switch-over (LLM-judged)

0 regressions allowed: a step that ever diverges stays AI

1 command: to switch over, and to roll back instantly

Production-candidate artifact (verify before promotion)

A deployable runtime, not just advice.

The paid output is a versioned hybrid workflow with code nodes, retrieval nodes, validation nodes, focused LLM calls, human gates, and fallback. It is designed to run inside the customer's workspace first, then graduate to private cloud or managed deployment when trust is earned.

pipeline.yaml (validated candidate)

pipeline:
  id: coding_agent_compiled
  source_agent: original_agent_001

policies:
  max_cost_usd: 0.50
  external_mutations: approval_required
  fallback: original_agent

nodes:
  - id: collect_context
    type: code
    implementation: repo_context_selector

  - id: classify_task
    type: llm
    model_policy: cheap
    tools: []

  - id: parse_test_log
    type: code
    implementation: failure_block_parser

  - id: generate_patch
    type: llm
    model_policy: strong
    tools: [read_file, write_patch]

  - id: validate
    type: validation
    validators: [schema, tests, policy]

fallback:
  trigger:
    - validation_failed_after_repair
    - confidence_below_threshold
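
How a runtime might act on the `policies` and `fallback` sections above is sketched here. The policy values and trigger names are copied from the YAML; the decision function, its return values, and the choice to stop rather than fall back when the cost cap is exceeded are assumptions made for illustration.

```python
# Values copied from the pipeline.yaml example above.
POLICIES = {
    "max_cost_usd": 0.50,
    "external_mutations": "approval_required",
    "fallback": "original_agent",
}
FALLBACK_TRIGGERS = {"validation_failed_after_repair", "confidence_below_threshold"}


def next_action(spent_usd, events, wants_external_mutation=False, approved=False):
    """Return 'continue', 'await_approval', 'fallback', or 'stop_over_budget'."""
    # Assumption: exceeding the cost cap stops the run instead of invoking
    # the (more expensive) original agent.
    if spent_usd > POLICIES["max_cost_usd"]:
        return "stop_over_budget"
    if FALLBACK_TRIGGERS & set(events):
        return "fallback"
    needs_approval = POLICIES["external_mutations"] == "approval_required"
    if wants_external_mutation and needs_approval and not approved:
        return "await_approval"
    return "continue"
```

Keeping the policy check in one small function makes the gate easy to test and to diff alongside the pipeline spec.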

Credit-priced compile

Free local scan, explicit estimate, reserved credits, and charged compile.

Versioned pipeline IR

Diffable workflow spec with explicit inputs, outputs, and policies.

Runtime package

Pipeline, runtime config, apply plan, verification plan, and rollback guidance.

Validation gates

Promotion rules that block unsafe or lower-quality candidates.

Runtime monitoring

Cost drift, quality drift, fallback rate, and cache hit rate.

Enterprise posture

Built for agents that touch real systems.

The platform assumes customer data, secrets, internal APIs, and side effects are sensitive from day one. The default alpha flow keeps scans, credentials, real data, verification, promotion, and rollback on the customer side while the proprietary compiler runs in our cloud.

Transparent local collector

Inspect every field before upload; the local tool collects evidence, not the recipe.

Secrets isolation

Keys, databases, CRM access, and raw private data stay out of the cloud compiler.

Credits and limits

Every paid compile has an estimate, max-credit guard, reservation, and ledger entry.
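
The estimate, guard, reservation, and ledger sequence can be sketched as a small in-memory ledger. The class and method names are hypothetical, and the rule that a charge never exceeds the reserved amount is an assumption, not documented Leangetic behavior.

```python
class InsufficientCredits(Exception):
    """Raised when the unreserved balance cannot cover an estimate."""


class CreditLedger:
    def __init__(self, balance: int):
        self.balance = balance
        self.reserved = 0
        self.entries = []  # append-only log of (kind, job_id, credits)

    @property
    def available(self) -> int:
        return self.balance - self.reserved

    def reserve(self, job_id: str, estimate: int, max_credits: int) -> None:
        """Hold credits for a compile, refusing estimates above the guard."""
        if estimate > max_credits:
            raise ValueError("estimate exceeds max-credit guard")
        if estimate > self.available:
            raise InsufficientCredits(job_id)
        self.reserved += estimate
        self.entries.append(("reserve", job_id, estimate))

    def charge(self, job_id: str, reserved: int, actual: int) -> int:
        """Release the hold and charge at most the reserved amount."""
        charged = min(actual, reserved)
        self.reserved -= reserved
        self.balance -= charged
        self.entries.append(("charge", job_id, charged))
        return charged
```

For example, reserving 30 credits against a balance of 100 leaves 70 available; if the compile ends up using 24, the hold is released and the balance becomes 76.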

Audit trail

Track account sessions, bundle fingerprints, compile jobs, artifacts, and credit spend.

Framework-neutral

Works below your framework, not beside it.

No SDK to adopt. Leangetic works at two layers under whatever you already run: the CLI layer (any coding agent calls it) and the model-call layer (an SDK patch, a reverse proxy, or a CLI shim). Any framework on OpenAI, Anthropic, Gemini, or an OpenAI-compatible endpoint is measured and optimized with no integration code.
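
The general idea behind an SDK patch can be sketched as wrapping a client method at runtime so that calls are observed without touching the agent's code. `FakeClient` below is a stand-in, not a real provider SDK, and `patch_method` is a hypothetical helper; how Leangetic patches real SDKs is not described in this page.

```python
import functools
import time


class FakeClient:
    """Stand-in for a provider SDK client; not a real library."""

    def complete(self, prompt: str) -> str:
        return prompt.upper()


def patch_method(cls, method_name, on_call):
    """Wrap cls.method_name so each call is reported to on_call; return an undo function."""
    original = getattr(cls, method_name)

    @functools.wraps(original)
    def wrapper(self, *args, **kwargs):
        started = time.perf_counter()
        result = original(self, *args, **kwargs)  # behavior is unchanged
        on_call(method_name, args, kwargs, result, time.perf_counter() - started)
        return result

    setattr(cls, method_name, wrapper)
    return lambda: setattr(cls, method_name, original)


if __name__ == "__main__":
    calls = []
    undo = patch_method(FakeClient, "complete", lambda *info: calls.append(info))
    print(FakeClient().complete("hello"), len(calls))
    undo()  # restores the original method
```

The wrapper returns the original result untouched, which mirrors the listen-only behavior: the agent sees the same output while the call is recorded on the side.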

Providers we intercept - SDK patch (Python) or the local proxy (any language)

OpenAI

Anthropic

Gemini

OpenAI-compatible

Ollama

Groq / Together / vLLM

Frameworks supported through the model-call layer, with no adapter

OpenAI Agents

LangGraph

CrewAI

AutoGen

DSPy

Custom Agents

Telemetry, orchestration and protocols

OpenTelemetry

Langfuse

Phoenix

Temporal

MCP

Get started

Install once. Your AI can drive the rest.

Install the CLI (or point your own AI assistant at the MCP server), start listening on one agent, then let our servers optimize it, judge it on your traffic, and switch over - one command each, with instant rollback.

Install (pick one)

curl

mkdir -p ~/.local/bin  # make sure the target directory exists
curl -fsSL https://leangetic.com/downloads/leangetic.py \
  -o ~/.local/bin/leangetic && chmod +x ~/.local/bin/leangetic

npm

npx @leangetic-ai/cli --help
# or: npm install -g @leangetic-ai/cli

Then, on one agent

the flow

leangetic start ./your-agent  # listen (nothing changes)
# run your agent as usual, then:
leangetic profile   # free: where you waste time/money + what fails
leangetic optimize ./your-agent  # we map + build (credits)
leangetic judge    # prove it on your traffic
leangetic promote  # switch over (rollback to revert)

Curated alpha - bring one expensive agent.

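Listening, profiling, judging, and switch-over run locally, and your code never leaves your machine; only the redacted evidence bundle you approve is sent to our servers for optimization. The hybrid is switched on only after the judge proves it's cheaper with equal-or-better quality on your own tasks.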

Apply for the alpha