I have a homelab service that classifies incoming text using an LLM. I wanted different models in different environments: Apple Intelligence on my Mac (free, private), a quantised DeepSeek-R1 on Ollama on my home server, and AWS Bedrock on my cloud VPS. Each has a different SDK, a different async interface, and different configuration.
I did not want to write `if backend == "anthropic"` branches inside application code. So I extracted the problem into a small package: llm-relay.
The package is small by design. It is not a framework. It does not manage conversation history, streaming, or embeddings. It just routes a prompt to a model and returns a string — and that constraint is what keeps it useful across multiple unrelated services.
## What it does
llm-relay exposes two functions:
```python
from llm_relay import ask, ask_json

text = await ask("summarise this", system="You are a summariser")
data = await ask_json("classify: buy milk")
# → {"category": "shopping", "title": "Buy milk", ...}
```
The backend is controlled entirely by environment variables:
```shell
LLM_BACKEND=ollama   # or anthropic, bedrock, apple
LLM_MODEL=deepseek-r1:1.5b
```
Application code never changes. Only the deployment environment does.
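As a minimal sketch of what that env-driven selection can look like: the `LLMConfig.from_env` call appears in the snippet below, but the field names and defaults here are my assumptions, not the package's actual implementation.

```python
import os
from dataclasses import dataclass


@dataclass
class LLMConfig:
    """Hypothetical sketch of an env-driven config; fields are assumptions."""
    backend: str
    model: str

    @classmethod
    def from_env(cls) -> "LLMConfig":
        # Read the two variables that select the backend and model.
        return cls(
            backend=os.environ.get("LLM_BACKEND", "ollama"),
            model=os.environ.get("LLM_MODEL", "deepseek-r1:1.5b"),
        )
```

With a shape like this, a deployment only sets `LLM_BACKEND` and `LLM_MODEL`; nothing in application code names a specific provider.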
## How it works
It is mostly LiteLLM with Apple Intelligence support bolted on. For Ollama, Anthropic, and Bedrock it delegates to LiteLLM, which handles the protocol differences between providers. For Apple Intelligence it calls apple_fm_sdk directly, since the on-device Foundation Models framework has its own async interface that LiteLLM does not cover (see Apple Intelligence on the command line).
```python
async def ask(prompt, *, system="", config=None):
    cfg = config or LLMConfig.from_env()
    if cfg.backend == "apple":
        return await _ask_apple(prompt, system=system, cfg=cfg)
    return await _ask_litellm(prompt, system=system, cfg=cfg)
```
`ask_json` is a thin wrapper that also strips markdown fences, since models habitually wrap JSON in ` ```json ` blocks even when asked not to.
## How it is installed
A plain Python package published to GitHub, installed as a git dependency:
```toml
# pyproject.toml
dependencies = [
    "llm-relay @ git+https://github.com/srih4ri/[email protected]",
]
```