Introduction: A research lab that ships

The space between
thinking and acting.

Blossom AI is a research lab and a product company. We study how models should reason, route, and collaborate with experts — and we put that work into systems that operate reliably inside operationally complex businesses.

Lab: Blossom Labs. Research on routing, evals, agents, and RL.

Product: Blossom Product. Routing · Eval · Agents

Headquarters: Tokyo · San Francisco. Founded in 2025.

Product angles: How we build products

Four angles. One platform.

Route: Routing as a surface

Every request goes to the right model, with per-tenant policy and drift-aware rerouting.

Calibration: Calibrated under load

We grade refusal, deferral, and uncertainty as first-class outcomes — not failures swept under the rug.

Evaluation: Governed evaluation

Curated datasets and LLM-as-a-Judge under governance you can audit.

Audit: Auditable end-to-end

Every decision is policy-versioned, replayable, and inspectable by the team that has to defend it.

Two layers, one discipline

A lab and a product.
Each makes the other honest.

Research without deployment becomes performance art. Product without research becomes a wrapper. We hold the two together — what we learn in the lab ships into our products; what we see in production sets the lab’s next question.

Lab: Labs · Research

How should AI systems learn, reason, and collaborate with human experts?

Blossom Labs studies the open questions underneath modern AI deployment — scalable knowledge discovery, calibrated reasoning, and reinforcement learning in simulated operations. We publish what we find.

Reasoning: Reasoning under distribution shift. How models stay calibrated when the world moves.
Collaboration: Human–expert collaboration. Where to ask, where to defer, where to act.
Simulation: RL in simulated operations. Practice in environments before production.

Product: Product · Deployments

Three systems that put research-grade AI into operationally complex businesses.

Logistics, finance, manufacturing, healthcare ops — domains where a small share of bad decisions costs real money or worse. Operators get routing, evals, and agents that hold up under load.