Blog

Engineering, releases, and practical patterns for production AI agents.

Latest

Why We Built JamJet

Everyone can build an AI agent demo in an afternoon. Getting that same agent to run reliably in production — handling failures, maintaining state across restarts, collaborating with other agents, and producing results you can actually trust — that is a fundamentally different problem. JamJet exists because we got tired of pretending these were the same problem.

This post explains the gap we kept hitting, why the existing tools did not close it, and what we decided to build instead.


The demo-to-production gap

Here is a pattern we have seen dozens of times, and lived through ourselves:

Day 1. You wire up a few LLM calls, chain them together, add a tool or two. The demo works. Everyone is excited. Ship it.

Week 2. The API times out at step 4 of 7. The entire run is lost. You add retry logic. The retry causes a duplicate action because the first call actually succeeded — it just timed out on the response. Now you need idempotency. You write more glue code.

Month 2. You are running agents that take 10-15 minutes. They touch external systems. They make decisions. And when something goes wrong, you have no visibility into what happened, no way to replay the execution, and no way to resume from where it failed. You are debugging by reading logs and guessing.

Month 4. You realize you have spent more time building reliability infrastructure around your agent than building the agent itself. The orchestration framework you picked in week one is now the bottleneck. It does not have durability. It does not have structured state. It was built for prototyping, and you are trying to run it in production.

This is the demo-to-production gap. It is not a skill problem. It is an infrastructure problem.


Why existing tools fall short

We tried everything before building JamJet. Every option had real strengths and a hard ceiling.

Prototyping frameworks

LangChain, CrewAI, LangGraph — these tools are excellent for getting started. They give you abstractions for chains, agents, tools, and multi-agent coordination. You can build a working prototype in a few hours.

But they are Python-native runtimes with no durable execution. If your process crashes, your state is gone. If your agent run takes 20 minutes and fails at minute 18, you start over. There is no event sourcing, no checkpointing, no crash recovery. The concurrency model is Python’s concurrency model, which means GIL contention and limited parallelism.

These are prototyping tools being asked to do production work. That is not a criticism — it is a category observation. They solve the “build a demo” problem well. They do not solve the “run it reliably” problem at all.

Durable workflow engines

Temporal and Durable Functions solve the reliability side. They have event sourcing, replay, crash recovery, distributed scheduling. These are proven systems used in production at serious scale.

But they are not agent-native. They do not understand LLM calls, tool invocations, model routing, or evaluation loops. You end up wrapping every model call in an activity, manually serializing prompts and responses, building your own retry-with-feedback logic, and bolting on observability. You get durability, but you build everything agent-specific from scratch.

It works. But it is like using a general-purpose database as a message queue — technically possible, architecturally awkward, and you spend your time fighting the abstraction instead of building your product.

Vendor SDKs

The OpenAI Agents SDK, Google ADK, and similar tools are well-designed for their respective ecosystems. If you are building exclusively on one provider’s models, they offer tight integration and fast iteration.

The problem is lock-in. Your workflow definition, tool integration layer, and execution model are all coupled to a single vendor. Switching models means rewriting your orchestration. Multi-model workflows — using Claude for reasoning and Gemini for code generation in the same pipeline — become integration projects. And you are dependent on the vendor’s roadmap for features like durability, evaluation, and multi-agent coordination.

We wanted model-agnostic orchestration that treats models as interchangeable resources, not as the foundation of the platform.


What JamJet does differently

JamJet is a durable, graph-based workflow runtime built specifically for AI agents. The core is Rust. The authoring surface is Python (and Java, with Go coming). The key design decisions:

Durability is not optional

Every workflow execution in JamJet is event-sourced. Before each node runs, we write a checkpoint. If the runtime crashes, the machine reboots, or you deploy a new version mid-execution — it resumes from exactly where it stopped.

This is not a feature you enable. It is the execution model. Every state transition, every tool call result, every model response is persisted as an event. You get full auditability and deterministic replay for free.

If the process crashes after step 3 completes but before step 4 starts, the runtime replays the first three results from the event log and continues. No re-execution. No lost work.
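A toy illustration of the replay mechanism (not JamJet's actual internals): completed steps are read back from the event log, so a restart re-executes nothing that already succeeded.

```python
event_log = []   # persisted (step, result) events; survives the "crash"
executions = []  # which step bodies actually ran

def run_step(name, fn):
    for step, result in event_log:
        if step == name:              # already in the log: replay, don't re-run
            return result
    result = fn()
    executions.append(name)
    event_log.append((name, result))  # checkpoint the result before continuing
    return result

def pipeline(crash_after=None):
    a = run_step("step1", lambda: 1)
    b = run_step("step2", lambda: a + 1)
    if crash_after == "step2":
        raise RuntimeError("process crashed")
    return run_step("step3", lambda: b + 1)

try:
    pipeline(crash_after="step2")  # crash mid-run; steps 1-2 are checkpointed
except RuntimeError:
    pass
result = pipeline()                # restart: steps 1-2 replay, only step3 runs
```

Each step body executed exactly once across both runs.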

Native protocol support

JamJet has first-class support for MCP (Model Context Protocol) and A2A (Agent-to-Agent Protocol). These are not plugins or community adapters — they are built into the runtime.

MCP gives you a standard interface to hundreds of tool servers. Instead of writing custom integrations for every API, you connect to MCP servers that expose tools through a common protocol. One integration pattern, any tool.

A2A gives your agents a standard way to collaborate with agents in other frameworks, other organizations, other runtimes. Your JamJet agent can delegate work to a LangGraph agent, receive tasks from a Google ADK agent, or coordinate with agents you did not build and do not control.

The runtime handles connection management, authentication, retries, and response normalization. You write the workflow logic.

Rust core, polyglot authoring

The runtime — scheduler, state machine, event store, protocol handlers — is Rust + Tokio. No garbage collector, no GIL, real parallelism. The Python SDK compiles your workflow definitions into a Rust IR graph that the scheduler executes natively.

This means you write Python but your workflows run at Rust throughput. State serialization, node dispatch, concurrent fan-out/fan-in, checkpoint writes — all happen in the Rust runtime.

Progressive API

JamJet meets you where you are:

  • @task — a single durable function. Simplest entry point. Your existing Python function, but with checkpointing and crash recovery.
  • Agent — a stateful entity with a model, tools, and an identity. Publishes an Agent Card for discovery via A2A.
  • Workflow — a full DAG of nodes with typed state, conditional routing, fan-out, eval loops, and protocol integrations.

You start simple and add structure as your use case demands it. No framework tax on day one.
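To make the @task idea concrete, here is a toy decorator showing the shape of the behavior (hypothetical, not the JamJet SDK): results are checkpointed per call, so invoking the function again after a crash replays the stored result instead of re-executing the body.

```python
import functools
import json
import os
import tempfile

CHECKPOINT_DIR = tempfile.mkdtemp()  # stand-in for the runtime's event store

def task(fn):
    """Toy durable-task decorator: checkpoint the result keyed by args."""
    @functools.wraps(fn)
    def wrapper(*args):
        key = f"{fn.__name__}-{json.dumps(args)}"
        path = os.path.join(CHECKPOINT_DIR, key + ".json")
        if os.path.exists(path):          # crash recovery: replay the checkpoint
            with open(path) as f:
                return json.load(f)
        result = fn(*args)
        with open(path, "w") as f:        # checkpoint before returning
            json.dump(result, f)
        return result
    return wrapper

runs = []

@task
def summarize(doc_id):
    runs.append(doc_id)                   # tracks actual executions
    return {"doc": doc_id, "summary": "..."}

first = summarize("d1")
second = summarize("d1")  # simulated restart: replayed, body does not re-run
```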

Eval as a workflow node

Most evaluation tools are external to the execution pipeline. You run your agent, export the results, run a separate eval script, read the scores, and manually decide what to do.

In JamJet, evaluation is a node type. You put it directly in your workflow graph. It runs during execution, not after. And it can route: retry with feedback if the score is low, branch to a different strategy, or halt execution if quality gates are not met.

This turns evaluation from a post-hoc measurement into a runtime control mechanism. Your agents do not just produce output — they verify it before it leaves the system.
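A minimal sketch of that control loop, with stub generator and scorer standing in for model calls (the function names are illustrative, not JamJet's API): the eval step scores the output and routes, retrying with feedback until a quality gate passes or the run halts.

```python
def eval_node(output: str, threshold: float) -> tuple[float, str]:
    """Stub evaluator; a real system would use an LLM judge or a rubric."""
    score = 1.0 if "cited" in output else 0.4
    feedback = "" if score >= threshold else "add citations"
    return score, feedback

def generate(prompt: str, feedback: str = "") -> str:
    """Stub generator standing in for a model call."""
    return "answer, cited" if feedback else "answer"

def run_with_quality_gate(prompt, threshold=0.8, max_retries=2):
    feedback = ""
    for attempt in range(max_retries + 1):
        output = generate(prompt, feedback)
        score, feedback = eval_node(output, threshold)
        if score >= threshold:
            return output, attempt           # gate passed: output may leave
    raise RuntimeError("quality gate not met")  # halt route

output, attempts = run_with_quality_gate("summarize the report")
```

The first draft fails the gate; the retry carries the evaluator's feedback and passes.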


The research angle

Something we did not expect: researchers started using JamJet before production teams did.

The pattern made sense once we saw it. Researchers running multi-agent experiments need exactly the same things production systems need — reproducibility, structured state, strategy comparison, and auditability — but for different reasons. A production team wants crash recovery. A researcher wants deterministic replay so they can compare reasoning strategies across hundreds of runs with controlled variables.

That realization shaped our roadmap significantly. We built ExperimentGrid, which lets you define parameter sweeps across models, strategies, and seeds, then run them as durable workflows with full event traces. Every cell in the grid is a durable workflow execution. If the experiment crashes at run 147 of 270, it resumes from 147. Results include full event traces, so you can inspect exactly what happened in any individual run.
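The resume-mid-grid behavior can be sketched in a few lines (a toy model, not the ExperimentGrid implementation): each cell's result is checkpointed, so a re-run skips completed cells and continues from the crash point.

```python
import itertools

models = ["m1", "m2", "m3"]
strategies = ["react", "critic", "plan"]
seeds = range(3)
grid = list(itertools.product(models, strategies, seeds))  # 27 cells

completed: dict[int, float] = {}   # stand-in for durable per-cell results
crash_at = {"index": 5}            # simulate a crash partway through

def run_cell(model, strategy, seed):
    return hash((model, strategy, seed)) % 100 / 100  # fake metric

def run_grid():
    """Each cell checkpoints; re-running skips already-completed cells."""
    for i, cell in enumerate(grid):
        if i in completed:
            continue                          # resume: skip finished cells
        if i == crash_at["index"]:
            crash_at["index"] = None
            raise RuntimeError("experiment crashed")
        completed[i] = run_cell(*cell)

try:
    run_grid()
except RuntimeError:
    pass
assert len(completed) == 5  # cells 0-4 survived the crash
run_grid()                  # resumes at cell 5, not cell 0
```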

We also added publication export — LaTeX tables, statistical significance tests (Welch’s t-test, Wilcoxon, Mann-Whitney), formatted comparisons — because we realized that researchers were spending hours formatting results that the runtime already had in structured form.
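For reference, Welch's t-statistic is straightforward to compute from the per-run metrics the runtime already stores. A self-contained version (sample data is made up):

```python
import math

def welch_t(a, b):
    """Welch's t-statistic and degrees of freedom for two independent samples."""
    na, nb = len(a), len(b)
    ma, mb = sum(a) / na, sum(b) / nb
    va = sum((x - ma) ** 2 for x in a) / (na - 1)  # unbiased sample variances
    vb = sum((x - mb) ** 2 for x in b) / (nb - 1)
    se2a, se2b = va / na, vb / nb
    t = (ma - mb) / math.sqrt(se2a + se2b)
    # Welch-Satterthwaite approximation for the degrees of freedom
    df = (se2a + se2b) ** 2 / (se2a ** 2 / (na - 1) + se2b ** 2 / (nb - 1))
    return t, df

# e.g. eval scores for two reasoning strategies across four seeds
t, df = welch_t([0.82, 0.85, 0.80, 0.88], [0.74, 0.71, 0.78, 0.69])
```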

This is not a separate product. It is the same runtime, the same durability model, the same eval nodes. Research and production are not different problems. They are different contexts for the same infrastructure.


What is next

JamJet is open source under Apache 2.0. The Rust runtime, Python SDK, Java SDK, MCP + A2A support, eval harness, and policy engine are all available today.

Here is what we are building next:

  • Community and ecosystem. More example workflows, more integrations, better documentation. We want JamJet to be the place where people share working agent patterns, not just abstractions.
  • Go SDK. Same runtime, same IR, different authoring language. Go’s concurrency model and deployment story make it a natural fit for infrastructure-heavy teams.
  • More reasoning strategies. ReAct, plan-and-execute, critic, reflection, consensus, and debate are built in. We are adding more based on what users are actually building.
  • Enterprise features driven by real demand. Multi-tenant isolation, PII redaction, OAuth delegation, typed failure taxonomy — but only as users need them. We are not building enterprise features speculatively.

The goal is straightforward: close the demo-to-production gap for AI agents. Make durability, evaluation, and protocol support default infrastructure instead of things you build yourself.

If that resonates, try it: pip install jamjet. The quickstart gets you to a running agent in under 10 minutes. And if you have feedback or want to tell us we are wrong about something — open a GitHub issue. We are building in public.



Building a multi-agent wealth advisor with JamJet


A wealth management recommendation touches risk modeling, market data, tax law, portfolio construction, and compliance — each requiring different expertise, different tools, and different reasoning. A single LLM prompt cannot do this well.

We built a multi-agent system that mirrors how a real wealth management team operates. Four specialist agents collaborate through a JamJet workflow, each with its own persona, tools, and reasoning strategy. The result is a compliance-checked investment recommendation with a human approval gate before delivery.


The agents

| Agent | Role | Strategy | Tools |
| --- | --- | --- | --- |
| Risk Profiler | Certified Financial Planner | plan-and-execute | get_client_profile, assess_risk_score |
| Market Analyst | CFA charterholder | react | get_market_data |
| Tax Strategist | Enrolled Agent (EA) | plan-and-execute | get_client_profile, analyze_tax_implications |
| Portfolio Architect | Senior PM, 20yr exp | critic | build_portfolio_allocation, check_compliance |

Each agent gets the reasoning strategy that fits its cognitive task. This is not cosmetic — it changes how the agent thinks.

Wealth management multi-agent architecture — 4 specialist agents orchestrated through a durable workflow with typed state and human approval


Why each strategy matters

Three reasoning strategies — plan-and-execute, react, and critic — each matched to an agent role

Risk Profiler: plan-and-execute

Risk assessment is sequential: retrieve profile, compute concentration risk, calculate risk score, synthesize. Each step depends on the previous one. Plan-and-execute generates a plan upfront and executes it methodically.

risk_profiler = Agent(
    name="risk_profiler",
    model="claude-sonnet-4-6",
    tools=[get_client_profile, assess_risk_score],
    instructions="You are a CFP specializing in risk assessment...",
    strategy="plan-and-execute",
    max_iterations=5,
)

Market Analyst: react

Market analysis is exploratory. The analyst fetches broad indices, notices a trend, drills into specific sectors, then synthesizes. The tight observe-reason-act loop of ReAct is purpose-built for this kind of iterative data exploration.

market_analyst = Agent(
    name="market_analyst",
    model="claude-sonnet-4-6",
    tools=[get_market_data],
    instructions="You are a CFA charterholder...",
    strategy="react",
    max_iterations=5,
)

Tax Strategist: plan-and-execute

Tax rules are systematic. The strategist must evaluate every applicable strategy: tax-loss harvesting, Roth conversions, municipal bonds, asset location, 529 plans. Missing one is worse than exploring creatively.

Portfolio Architect: critic

The final recommendation is the highest-stakes deliverable. The critic strategy drafts an initial allocation, evaluates it against all prior analysis, then refines. This draft-evaluate-revise loop catches gaps that a single pass would miss.

portfolio_architect = Agent(
    name="portfolio_architect",
    model="claude-sonnet-4-6",
    tools=[build_portfolio_allocation, check_compliance],
    instructions="You are a senior portfolio manager...",
    strategy="critic",
    max_iterations=5,
)

Orchestration with typed state

The agents do not call each other. A JamJet Workflow orchestrates them, passing a typed Pydantic model between steps:

workflow = Workflow("wealth_management_advisory", version="0.1.0")

@workflow.state
class AdvisoryState(BaseModel):
    client_id: str
    risk_assessment: str | None = None
    market_analysis: str | None = None
    tax_strategy: str | None = None
    final_recommendation: str | None = None

@workflow.step
async def assess_risk(state: AdvisoryState) -> AdvisoryState:
    result = await risk_profiler.run(
        f"Assess risk for client {state.client_id}"
    )
    return state.model_copy(update={"risk_assessment": result.output})

@workflow.step
async def analyze_markets(state: AdvisoryState) -> AdvisoryState:
    result = await market_analyst.run(
        f"Analyze markets for this risk profile:\n{state.risk_assessment}"
    )
    return state.model_copy(update={"market_analysis": result.output})

@workflow.step
async def plan_tax_strategy(state: AdvisoryState) -> AdvisoryState:
    result = await tax_strategist.run(
        f"Tax strategy for client {state.client_id}..."
    )
    return state.model_copy(update={"tax_strategy": result.output})

@workflow.step(human_approval=True)
async def build_recommendation(state: AdvisoryState) -> AdvisoryState:
    brief = (
        f"RISK: {state.risk_assessment}\n"
        f"MARKET: {state.market_analysis}\n"
        f"TAX: {state.tax_strategy}"
    )
    result = await portfolio_architect.run(
        f"Build portfolio recommendation:\n{brief}"
    )
    return state.model_copy(
        update={"final_recommendation": result.output}
    )

Three design choices worth noting:

Typed state. AdvisoryState is a Pydantic model — IDE autocomplete, compile-time validation, automatic JSON Schema. Not session.state['risk_assessment'] with string keys.

Immutable updates. Each step returns a new state via model_copy(update={...}). The workflow engine records every transition. If the process crashes after the market analysis step, it resumes from the last committed state.

Human approval. human_approval=True on the final step is a first-class workflow primitive. The workflow pauses and waits for an explicit sign-off. In financial services, a licensed advisor must review before anything reaches the client.


Running it

Local, in-process:

python jamjet_impl.py C-1001

Agents run in sequence, tools execute locally, output streams in real time. No runtime server needed.

On the JamJet runtime (durable):

jamjet dev                         # start the Rust runtime
python jamjet_impl.py --runtime    # submit to runtime

The workflow is now durable. Crash after the tax step? The runtime resumes from there on restart. The approval gate pauses execution until:

jamjet approve exec_<id> --decision approved

Comparison with Google ADK

We implemented the same scenario with Google ADK. Here is the side-by-side.

What ADK does well

  • Simpler tool definition — plain Python functions, no decorator needed
  • Built-in parallel execution — ParallelAgent runs sub-agents concurrently
  • Vertex AI integration — managed hosting if you are on GCP with Gemini

Where JamJet pulls ahead

JamJet vs Google ADK — 8-dimension feature comparison

| Capability | JamJet | Google ADK |
| --- | --- | --- |
| Reasoning strategies | 3 built-in per agent | Model decides |
| State model | Typed Pydantic, immutable | Mutable dict |
| Human approval | human_approval=True | Build from scratch |
| Durability | Event-sourced, crash-safe | In-memory only |
| Audit trail | Full event log | Not available |
| Protocols | MCP + A2A + ANP | MCP client only |
| LLM support | Any OpenAI-compatible | Gemini-first |
| Cost controls | max_cost_usd, max_iterations | Not built-in |

The strategy gap

With JamJet, each agent gets a reasoning strategy that matches its task. The risk profiler plans then executes. The market analyst observes then reasons. The portfolio architect drafts, critiques, and refines.

With ADK, you have one lever: the system prompt. The model decides how to reason. You cannot say “use ReAct for this agent” — that concept does not exist in ADK.

The durability gap

JamJet event-sources every state transition. If the process crashes after step 2:

  • JamJet — resumes from last committed state. Steps 1-2 are preserved; execution continues at step 3.
  • ADK (OSS) — everything is lost. Start over.

In a workflow costing $0.50+ per run, that adds up.

The compliance gap

Wealth management is heavily regulated — FINRA suitability, Reg BI, KYC/AML. JamJet provides:

  • Immutable audit trail of every agent decision, tool call, and state transition
  • Human approval gate ensuring a licensed advisor signs off before client delivery
  • Compliance tool checking suitability, concentration limits, and regulatory requirements

ADK has no built-in audit trail and no approval primitive. You build both from scratch.


When to choose what

JamJet — durable execution, audit trails, human approval gates, multi-framework interop via MCP/A2A. Financial services, healthcare, legal — anywhere compliance is non-negotiable.

Google ADK — fast prototyping with Gemini, Vertex AI managed hosting, no durability or compliance requirements.


Try it

git clone https://github.com/jamjet-labs/jamjet
cd examples/wealth-management-agents

# JamJet
python jamjet_impl.py

# Google ADK (requires google-adk)
python google_adk_impl.py

The comparison.md in that directory has a full 8-dimension analysis. The tools.py contains all simulated data sources — swap in Bloomberg/Plaid/tax APIs for production.


Phase 4: Enterprise security for production agents


The gap between a demo agent and an enterprise agent is not intelligence. It is trust.

You build a research agent that synthesizes reports. It works. Then you try to deploy it at a company with multiple customers, compliance requirements, and a security team. Now you need answers to questions the framework never considered: which customer’s data is this agent touching? Is PII leaking into the audit log? Can this agent access more than the user who triggered it? Can agents from another organization call ours?

Most frameworks punt on these questions. We decided to answer them in the runtime.


What shipped

Phase 4 adds seven enterprise capabilities to the JamJet runtime, all enforced at the Rust layer — not by convention, not by middleware you might forget to add:

  • Multi-tenant state partitioning — row-level isolation by tenant ID across all storage
  • PII redaction engine — regex-based detection with mask, hash, and remove modes
  • Data retention policies — automatic expiry and purge of audit entries
  • Pluggable secret backends — Vault, AWS Secrets Manager, file-based, with priority chaining
  • A2A federation auth — capability-scoped Bearer tokens with agent allowlists
  • mTLS configuration — mutual TLS for cross-organization agent federation
  • OAuth 2.0 delegation — RFC 8693 token exchange with scope narrowing and per-step scoping

Two companion posts go deeper: data governance and PII redaction covers the data story, and OAuth delegation and federation auth covers the security story.

Phase 4 enterprise architecture


Multi-tenant isolation

If you run a SaaS platform, your customers share the same agent definitions but their data must never mix. JamJet now partitions all storage by tenant ID — workflow definitions, execution state, event logs, audit trails, and snapshots.

# Same workflow, different tenants — fully isolated
jamjet run workflow.yaml \
  --input '{"invoice_id": "INV-001", "amount": 2500}' \
  --tenant acme

jamjet run workflow.yaml \
  --input '{"invoice_id": "INV-042", "amount": 75000}' \
  --tenant globex

The runtime’s TenantScopedSqliteBackend wraps every storage query with WHERE tenant_id = ?. The workflow definition table uses a composite primary key (tenant_id, workflow_id, version). Acme cannot see Globex’s data. Not through a query, not through an API call, not through a bug.
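The mechanism is easy to illustrate with a toy SQLite wrapper (this is not the actual TenantScopedSqliteBackend, just the shape of the idea): every read and write goes through an object that injects the tenant filter, so cross-tenant access is structurally impossible.

```python
import sqlite3

db = sqlite3.connect(":memory:")
# Composite primary key scopes workflow identity to a tenant
db.execute("""CREATE TABLE workflows (
    tenant_id TEXT, workflow_id TEXT, version TEXT, body TEXT,
    PRIMARY KEY (tenant_id, workflow_id, version))""")

class TenantScope:
    """Every query is forced through the tenant_id filter."""
    def __init__(self, conn, tenant_id):
        self.conn, self.tenant_id = conn, tenant_id

    def put(self, workflow_id, version, body):
        self.conn.execute(
            "INSERT INTO workflows VALUES (?, ?, ?, ?)",
            (self.tenant_id, workflow_id, version, body))

    def get(self, workflow_id):
        row = self.conn.execute(
            "SELECT body FROM workflows WHERE tenant_id = ? AND workflow_id = ?",
            (self.tenant_id, workflow_id)).fetchone()
        return row[0] if row else None

acme = TenantScope(db, "acme")
globex = TenantScope(db, "globex")
acme.put("invoices", "1.0", "acme body")
leaked = globex.get("invoices")  # same workflow_id, different tenant: nothing
```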

Tenant isolation architecture


PII redaction

Your agent processes a customer onboarding form. The form has an email, a Social Security number, a phone number, and a credit card. The agent needs the real data for KYC verification. But the audit log — the thing compliance reviews — should never contain it.

JamJet’s DataPolicyIr lets you declare PII handling at the workflow level:

data_policy:
  pii_detectors: [email, ssn, phone, credit_card]
  pii_fields: ["$.email", "$.ssn", "$.credit_card"]
  redaction_mode: mask        # or: hash, remove
  retain_prompts: false       # strip prompts from audit log
  retain_outputs: false       # strip model outputs from audit log
  retention_days: 90          # auto-purge after 90 days

The runtime’s PiiRedactor compiles regex patterns once at startup and applies them before any state reaches the audit log. Three modes: mask (partial reveal: ***-**-6789), hash (SHA-256 for pseudonymized analytics), and remove (field deletion). Read the full deep dive on data governance.
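A compressed sketch of the three modes for a single detector (the pattern is simplified; JamJet's detectors are more thorough than one regex):

```python
import hashlib
import re

# Simplified SSN detector; compiled once, applied before audit-log writes
SSN = re.compile(r"\b(\d{3})-(\d{2})-(\d{4})\b")

def redact(text: str, mode: str) -> str:
    if mode == "mask":    # partial reveal: keep only the last four digits
        return SSN.sub(lambda m: f"***-**-{m.group(3)}", text)
    if mode == "hash":    # SHA-256 pseudonym for analytics joins
        return SSN.sub(
            lambda m: hashlib.sha256(m.group(0).encode()).hexdigest()[:12], text)
    if mode == "remove":  # delete the value entirely
        return SSN.sub("", text)
    raise ValueError(f"unknown mode: {mode}")

entry = "client SSN 123-45-6789 verified"
masked = redact(entry, "mask")  # the raw SSN never reaches the audit log
```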


OAuth 2.0 delegation

When an agent acts on behalf of a user, it should never hold the user’s full credentials. JamJet implements RFC 8693 token exchange: the user’s token goes in, a narrowly-scoped agent token comes out.

oauth:
  token_endpoint: "${JAMJET_OAUTH_TOKEN_ENDPOINT}"
  grant_type: "urn:ietf:params:oauth:grant-type:token-exchange"
  client_id: "${JAMJET_OAUTH_CLIENT_ID}"
  client_secret: "${JAMJET_OAUTH_CLIENT_SECRET}"
  requested_scopes: ["expenses:read", "expenses:write"]

The runtime enforces scope narrowing: the agent’s requested scopes must be a subset of the user’s scopes. If the agent requests admin:all but the user only has expenses:read, the agent gets expenses:read — or nothing, if there is no intersection.
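The narrowing rule itself is a set intersection. A minimal sketch of the exchange logic (illustrative, not JamJet's implementation):

```python
class ScopeError(Exception):
    pass

def exchange(user_scopes, requested_scopes):
    """RFC 8693-style narrowing: grant only scopes the user already holds."""
    granted = set(user_scopes) & set(requested_scopes)
    if not granted:
        raise ScopeError("no overlapping scopes; token exchange refused")
    return {"access_token": "agent-tok", "scope": sorted(granted)}

# Agent asks for admin:all, user only has expenses:read -> admin:all dropped
tok = exchange(["expenses:read"], ["admin:all", "expenses:read"])

# No intersection at all -> the exchange is refused outright
try:
    exchange(["expenses:read"], ["admin:all"])
    refused = ""
except ScopeError as e:
    refused = str(e)
```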

Different workflow steps can declare different scope requirements:

nodes:
  authenticate:
    oauth_scopes:
      required_scopes: ["expenses:read"]

  submit-expense:
    oauth_scopes:
      required_scopes: ["expenses:read", "expenses:write"]

If a token is revoked or expires mid-workflow, the runtime returns a clean OAuthError and escalates to a human. No silent failures. Every token exchange and API call is logged in the OAuthAuditEntry. Read the full deep dive on OAuth and federation.


mTLS and A2A federation

When agents from different organizations need to communicate, transport security and access control both matter. JamJet now supports mutual TLS for A2A federation — both sides present certificates, both sides verify.

On top of mTLS, the FederationPolicy adds capability-scoped Bearer tokens. Each token has a name, an agent ID, and a set of scopes. The federation_auth_layer middleware validates tokens, checks method-level scope requirements, and enforces agent allowlists.

federation:
  require_auth: true
  tokens:
    - token: "tok-alpha"
      name: "Research Agent"
      agent_id: "agent-alpha"
      scopes: ["read", "write"]
  allowed_agents: ["agent-alpha"]
  method_scopes:
    "tasks/send": ["write"]
    "tasks/get": ["read"]

Try it

All of these features are live in the JamJet runtime. The examples repository has runnable examples for each:

git clone https://github.com/jamjet-labs/examples
cd examples/multi-tenant       # tenant isolation
cd examples/data-governance    # PII redaction + retention
cd examples/oauth-delegation   # OAuth 2.0 + scope narrowing

Each example includes both YAML and Python versions, plus Java SDK equivalents (java-multi-tenant, java-data-governance, java-oauth-agent).

Install or upgrade:

pip install --upgrade jamjet

Questions or feedback — open a GitHub Discussion. We are building in public and these features came directly from conversations with teams trying to put agents into production.


The JamJet team