A research lab that ships

The space between
thinking and acting.

Blossom AI is a research lab and a product company. We study how models should reason, route, and collaborate with experts — and we put that work into systems that operate reliably inside operationally complex businesses.

The space between thinking and acting
Lab: Blossom Labs. Research on routing, evals, agents, RL.
Product: Blossom Product. Routing · Eval · Agents.
Headquarters: Tokyo · San Francisco. Founded in 2025.
How we build products

Four angles. One platform.

01
Routing as a surface

Every request goes to the right model — with per-tenant policy and drift-aware reroute.

02
Calibrated under load

We grade refusal, deferral, and uncertainty as first-class outcomes — not failures swept under the rug.

03
Governed evaluation

Curated datasets and LLM-as-a-Judge under governance you can audit.

04
Auditable end-to-end

Every decision is policy-versioned, replayable, and inspectable by the team that has to defend it.

Two layers, one discipline

A lab and a product.
Each makes the other honest.

Research without deployment becomes performance art. Product without research becomes a wrapper. We hold the two together — what we learn in the lab ships into our products; what we see in production sets the lab’s next question.

Labs · Research

How should AI systems learn, reason, and collaborate with human experts?

Blossom Labs studies the open questions underneath modern AI deployment — scalable knowledge discovery, calibrated reasoning, and reinforcement learning in simulated operations. We publish what we find.

  • Reasoning under distribution shift: How models stay calibrated when the world moves.
  • Human–expert collaboration: Where to ask, where to defer, where to act.
  • RL in simulated operations: Practice in environments before production.
Product · Deployments

Three systems that put research-grade AI into operationally complex businesses.

Logistics, finance, manufacturing, healthcare ops — domains where a small share of bad decisions costs real money or worse. Operators get routing, evals, and agents that hold up under load.

  • Routing: Each request to the right model by cost, latency, and policy.
  • Evaluation: Production evals on real traffic; regression attribution.
  • Agents: Long-horizon work with human approvals where they matter.
The product

Three systems that work together.

All product docs
Routing

Send every request to the model that gets it right at the lowest cost — with per-tenant policy and drift detection.

  • 14 models, one policy
  • Per-tenant cost caps
  • Drift-aware reroute
Evaluation

Production evals on real traffic distributions. Catch regressions before they reach users; attribute them to the change that caused them.

  • Live traffic replay
  • Attribution-first
  • Regression budgets
Agents

Long-horizon agents for operationally complex work. Human approvals where they matter, autonomy where they don’t, with auditable traces.

  • Approval policies
  • Tool sandboxing
  • Replayable traces
How it flows

From research question
to live operations.

We don’t separate “ideation” from “delivery.” Every loop closes back into the lab.

01

Question

A research question rooted in something we saw in real production.

02

Experiment

Simulations, evaluations, ablations. We publish what we find.

03

Crossing

Findings become operating principles — codified, peer-reviewed, versioned.

04

Implementation

Principles ship into Routing, Eval, or Agents — gated by an internal eval bar.

05

Operations

Production telemetry feeds back; the lab’s next question begins.

From the lab

The questions worth answering only show up under load — and the products worth shipping demand the rigor of a lab.

From the lab — Apr 2026
Lab status

What the lab is working on now.

  • Routing policies under multi-tenant drift
  • RL agents in simulated logistics environments
  • Expert-in-the-loop training for Operations
  • Calibrated refusal for long-horizon work
Visit the lab