Product
The Blossom platform

Furiwake, Eval, and Agents.

Three systems that work together to put research-grade AI inside operationally complex businesses. Built on a single substrate, deployed as you need them, and informed every quarter by what the lab is finding.

Routing · Evaluation · Action
Routing, evaluation, and action: one platform.
— Product principle
Routing

Furiwake

Routing · GA
Median added latency: 28 ms (p50)
Cost-optimal selection: 78% on production traffic
Routing dimensions: Cost · latency · quality · privacy

Furiwake is the routing-and-evaluation primitive for agent labor. It classifies every agent task — coding, research, customer-facing work, operations, analysis — and dispatches it to the model that gets the job right at the lowest cost. Routing decisions account for per-tenant policy, cost caps, latency budgets, and live signals from your evaluation set.

Our distinguishing claim isn't speed — it's that the policy stays calibrated as your traffic moves. Drift detection catches silent quality regressions before users do, and the fallback ladder is something your security team can actually audit.

Request access →

Per-tenant policy

Each of your customers routes against its own quality and cost surface, not yours.

Drift-aware reroute

Live detection moves traffic before regressions reach users.

Auditable decisions

Every routing decision is logged with the policy version that produced it.

Cost ceilings

Hard caps per tenant, per workload, per day. Predictable spend.
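
To make the ideas above concrete, here is a minimal sketch of how a per-tenant policy with a quality floor, a latency budget, and a hard daily cap might pick the cheapest acceptable model and record the policy version alongside the decision. Every name here (`ModelStats`, `TenantPolicy`, `choose_model`) is hypothetical and chosen for illustration; this is not Furiwake's API, and a real router would weigh more signals than these three.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class ModelStats:
    """Hypothetical per-model estimates for one task class."""
    name: str
    cost_usd: float     # estimated cost of a single task
    latency_ms: float   # expected latency for a single task
    quality: float      # score on the tenant's own evaluation set, 0..1


@dataclass(frozen=True)
class TenantPolicy:
    """Hypothetical per-tenant routing policy."""
    version: str
    min_quality: float
    max_latency_ms: float
    daily_cap_usd: float


def choose_model(
    policy: TenantPolicy,
    models: Sequence[ModelStats],
    spent_today_usd: float,
) -> Optional[dict]:
    """Return the cheapest model that satisfies the policy, or None."""
    remaining = policy.daily_cap_usd - spent_today_usd
    eligible = [
        m for m in models
        if m.quality >= policy.min_quality        # quality floor
        and m.latency_ms <= policy.max_latency_ms  # latency budget
        and m.cost_usd <= remaining                # hard cost cap
    ]
    if not eligible:
        return None  # caller decides what the fallback is
    best = min(eligible, key=lambda m: m.cost_usd)
    # Log the policy version with the decision so it can be audited later.
    return {"model": best.name, "policy_version": policy.version}
```

Returning `None` rather than silently choosing an over-budget model keeps the cap hard in this sketch; what happens next (queueing, a cheaper fallback, a refusal) is left to the caller.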

Evaluation

Blossom Eval

Evaluation · GA
Eval traffic: Real production replay
Attribution surface: Prompt · model · routing change
Refresh cadence: Continuous
Regressions caught (Q1): 0 silent · 12 flagged pre-release

Evaluation built around the principle that synthetic benchmarks lie about what you actually ship. Eval runs on real production traffic distributions, attributes regressions to the change that caused them, and gives you a regression budget instead of a green/red light.

Built for teams who want to release weekly without becoming superstitious about which prompt changes are safe.

Request access →

Live traffic replay

Evals run against representative slices of your real workload.

Attribution-first

When something regresses, you learn which change owns it.

Regression budgets

Ship if you stay inside the budget. Block if you don't.

Calibrated refusal

Refusal and deferral graded as outcomes, not failures.
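
As an illustration of the regression-budget idea, the sketch below compares baseline and candidate scores per traffic slice, weights each drop by that slice's share of traffic, and allows the release only if the total stays inside the budget. The function names, the slice names, and the choice to count only drops (so a gain on one slice cannot offset a loss on another) are assumptions made for this example, not a description of how Blossom Eval computes its budgets.

```python
from typing import Dict


def weighted_regression(
    baseline: Dict[str, float],
    candidate: Dict[str, float],
    traffic_share: Dict[str, float],
) -> float:
    """Sum of per-slice score drops, weighted by each slice's traffic share."""
    total = 0.0
    for slice_name, share in traffic_share.items():
        drop = baseline[slice_name] - candidate[slice_name]
        # Only regressions spend budget; improvements are ignored here.
        total += share * max(drop, 0.0)
    return total


def release_decision(
    baseline: Dict[str, float],
    candidate: Dict[str, float],
    traffic_share: Dict[str, float],
    budget: float,
) -> dict:
    """Ship if the weighted regression stays inside the budget, else block."""
    spent = weighted_regression(baseline, candidate, traffic_share)
    return {"ship": spent <= budget, "budget_spent": spent}
```

The result carries the amount of budget spent as well as the decision, which is the practical difference from a green/red light: a release that ships at 90% of budget reads differently from one that ships at 5%.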

Action

Blossom Agents

Long-horizon work · Beta
Approval policies: Per-step · per-tool · per-cost
Tool sandboxing: Container-isolated; no shared state
Replayable traces: Every decision, every input
Domains: Logistics · finance ops · clinical ops

Long-horizon agents for operationally complex work. Human approvals where they matter — high cost, high risk, irreversible — and autonomy everywhere they don't. Every decision leaves a trace you can replay, audit, and roll back.

Designed for the work where the model running unattended is exactly what you want, and also exactly what you fear. We make the second part tractable.

Request access →

Approval policies

Configurable thresholds for when humans must sign off.

Tool sandboxing

Each agent action runs in an isolated, observable environment.

Replayable traces

Investigate any decision after the fact, with full fidelity.

Cost-aware planning

Agents budget the work before doing it.
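
One way to picture approval thresholds and cost-aware planning together is the sketch below: each planned step is checked against a per-tool gate, a per-step cost threshold, and an irreversibility rule, and the whole plan is costed before anything runs. The types and functions (`Step`, `ApprovalPolicy`, `approval_reasons`, `plan_fits_budget`) are hypothetical and exist only for this example; they are not the Blossom Agents API.

```python
from dataclasses import dataclass
from typing import FrozenSet, List, Sequence


@dataclass(frozen=True)
class Step:
    """Hypothetical planned agent action."""
    tool: str
    est_cost_usd: float
    reversible: bool


@dataclass(frozen=True)
class ApprovalPolicy:
    """Hypothetical thresholds for when a human must sign off."""
    max_auto_cost_usd: float
    gated_tools: FrozenSet[str] = frozenset()
    gate_irreversible: bool = True


def approval_reasons(step: Step, policy: ApprovalPolicy) -> List[str]:
    """Return why a step needs human approval; empty means it may run unattended."""
    reasons = []
    if step.tool in policy.gated_tools:
        reasons.append("tool")
    if step.est_cost_usd > policy.max_auto_cost_usd:
        reasons.append("cost")
    if policy.gate_irreversible and not step.reversible:
        reasons.append("irreversible")
    return reasons


def plan_fits_budget(steps: Sequence[Step], budget_usd: float) -> bool:
    """Cost the whole plan up front, before any step executes."""
    return sum(s.est_cost_usd for s in steps) <= budget_usd
```

Returning the list of reasons, rather than a bare yes or no, means the trace can record why a human was asked to sign off.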

Integration
One platform

Independent products, shared substrate.

You can adopt Furiwake, Eval, or Agents on its own — many of our customers do. But each system is built on the same evaluation, observability, and policy substrate. Adopt the second one and the first becomes more useful; adopt the third and you have a single audit surface across every model decision your business makes.

The platform is research-informed but not research-locked. We ship improvements that have cleared an internal evaluation bar — and we publish the studies so you can check our work.

See the research behind it →