The article includes visualizations of how parameters from the AIView, AIHost, and per-provider LLM plugins combine at runtime. Where the exact ordering lives in the MedusaLLM / LLMRouter package (not in this repo) it is marked inferred; the gate inequalities and trigger formulas are taken verbatim from the AIHost appsettings annotations.

Token-budget Pipeline (limits across stages)

How one turn’s tokens flow through admission → sizing → compression → routing → generation, and which knob acts at each stage.


  • MaxTokens in the gate and the output cap is the same per-request output budget (Aife:Llm:MaxOutputTokens, or the router’s 4096 default when 0).
  • Compression runs before routing, so it lowers EstimatedInputTokens and can make a smaller-window model eligible.

Model selection / routing (Auto vs Manual)

How a model is chosen and how the ModelCapabilities fields filter candidates.


The filter criteria (window, complexity, modality, use-case) come from the documented ModelCapabilities fields and the Llm:Routing gate; the precise rank ordering (cost vs Priority) is owned by the LLMRouter package and is marked inferred. Manual still passes through generation; whether a manually pinned model bypasses the window gate is router-defined (not asserted here).


Credential & Access-Gate Flow 

(<LLMApiKeys> ↔︎ AcceptedCredentialSources)


MCP tool availability (compounding gates)

“Will an MCP-discovered tool actually reach the model?”


Circuit-breaker States (Llm:CircuitBreaker)

Process-local breaker keyed by (provider, model). When Enabled=false the breaker is bypassed entirely.