BYOK AI Orchestration: Why Using Your Own API Keys Matters

2026-03-25 • 8 min read • orchex team

The Hidden Cost of AI Tooling

Most AI development tools that call LLM APIs on your behalf do not charge you only for their service. They also mark up the underlying API costs. Some are transparent about it. Many are not.

Here is how it typically works: you pay the tool $20/month for access. The tool calls the OpenAI API on your behalf. OpenAI charges the tool $0.03 per 1K tokens. The tool charges you $0.05 per 1K tokens. That 67% markup is where the margin lives.

For light usage, the markup is negligible. For heavy usage -- like running parallel AI agents across a codebase -- it compounds fast. A 10-stream orchestration that costs $0.80 in raw API calls might cost $1.50 through a proxied service. Run that 20 times a day and you are paying an extra $14/day for the privilege of someone else holding your API key.
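A quick back-of-the-envelope check of those numbers, using the figures from the paragraph above:

```python
# Markup math from the example above: 10-stream orchestration,
# $0.80 raw API cost vs $1.50 through a proxied service, 20 runs/day.
raw_cost = 0.80
proxied_cost = 1.50
runs_per_day = 20

extra_per_run = proxied_cost - raw_cost
extra_per_day = extra_per_run * runs_per_day

print(f"markup per run: ${extra_per_run:.2f}")  # $0.70
print(f"extra per day:  ${extra_per_day:.2f}")  # $14.00
```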

What BYOK Means

BYOK stands for Bring Your Own Keys. It is a simple idea: the orchestration tool runs on your machine, calls LLM APIs directly using your API keys, and never touches your tokens or your data.

With BYOK, your costs are exactly what the LLM provider charges. No markup. No middleman. No data leaving your machine to pass through a third-party proxy.

orchex is built entirely on this model. It runs as a local process via MCP (Model Context Protocol), reads your API keys from environment variables, and makes direct calls to provider APIs. The orchestrator itself is free and open-source.

Your MCP configuration looks like this:

{
  "mcpServers": {
    "orchex": {
      "command": "npx",
      "args": ["-y", "@wundam/orchex@latest"]
    }
  }
}

API keys are set as environment variables in your shell profile:

export ANTHROPIC_API_KEY="sk-ant-..."
export OPENAI_API_KEY="sk-..."
export GOOGLE_API_KEY="AI..."

orchex inherits these from your shell environment. No configuration file stores your keys. No network request sends them to orchex servers.
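A minimal sketch of how a local process picks up keys this way. The variable name matches the exports above; the `require_key` helper is illustrative, not part of orchex:

```python
import os

def require_key(name: str) -> str:
    """Read an API key from the environment; fail fast if it is missing."""
    value = os.environ.get(name)
    if not value:
        raise RuntimeError(f"{name} is not set; export it in your shell profile")
    return value

# Simulate a shell that exported the key before launching the process.
# In real use, the value comes from your ~/.zshrc or ~/.bashrc export --
# no config file on disk ever holds it.
os.environ.setdefault("ANTHROPIC_API_KEY", "sk-ant-example")
print(require_key("ANTHROPIC_API_KEY"))
```

Failing fast on a missing key is the design choice that makes the "no stored keys" model workable: the process either inherits the key from your shell or refuses to run.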

The Cost Math

Let us compare the economics for a realistic workload: a development team running 50 multi-agent orchestrations per day, averaging 8 streams per orchestration.

Proxied Service Model

Typical markup: 40-80% on API costs.

Component                         Cost
Raw API cost per orchestration    $0.60
Markup (60%)                      $0.36
Platform fee                      $49/mo
Daily cost (50 runs)              $48 + $1.63/day platform
Monthly cost                      $1,440 + $49 = $1,489

BYOK Model

No markup. You pay the LLM provider directly.

Component                         Cost
Raw API cost per orchestration    $0.60
Markup                            $0.00
orchex (local, free tier)         $0.00
Daily cost (50 runs)              $30
Monthly cost                      $900

That is a $589/month difference for a single developer. For a team of five, the savings approach $3,000/month.
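The two tables above reduce to a few lines of arithmetic:

```python
# Reproduce the two cost tables above for one developer.
runs_per_day, days_per_month = 50, 30
raw_per_run = 0.60

# Proxied service: 60% markup on API cost plus a $49/mo platform fee.
proxied_per_run = raw_per_run * 1.60
proxied_monthly = proxied_per_run * runs_per_day * days_per_month + 49

# BYOK: raw API cost only; the local orchestrator is free.
byok_monthly = raw_per_run * runs_per_day * days_per_month

print(f"proxied: ${proxied_monthly:,.0f}/mo")                  # $1,489/mo
print(f"byok:    ${byok_monthly:,.0f}/mo")                     # $900/mo
print(f"saved:   ${proxied_monthly - byok_monthly:,.0f}/mo")   # $589/mo
```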

The gap widens with heavier usage. BYOK costs scale linearly with API consumption. Proxied costs scale at the markup multiple on top of that, plus a fixed platform fee.

Multi-Provider Flexibility

BYOK is not just about cost. It is about flexibility.

When you own your API keys, you choose which provider handles each task. orchex supports six providers: Claude, OpenAI, Gemini, DeepSeek, Ollama, and AWS Bedrock. Different streams in the same orchestration can use different models.

Why would you want this? Because models have different strengths:

  • Claude excels at understanding large codebases and following complex instructions
  • GPT-4o is fast and strong at structured output generation
  • Gemini handles very long inputs well thanks to its large context window
  • DeepSeek offers competitive quality at significantly lower cost
  • Ollama runs locally for zero-cost iteration on smaller tasks
  • AWS Bedrock provides enterprise compliance and VPC-level isolation

In a single orchestration, you might use Claude for the architecture stream (complex reasoning), DeepSeek for the boilerplate streams (cost-effective), and Ollama for the test generation stream (free, runs locally).
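That mix can be written down as a simple stream-to-provider map. The structure below is an illustrative sketch, not orchex's actual configuration schema:

```python
# Hypothetical stream-to-provider assignment for one orchestration.
# (Illustrative only -- not orchex's real configuration format.)
streams = {
    "architecture": {"provider": "claude",   "reason": "complex reasoning"},
    "boilerplate":  {"provider": "deepseek", "reason": "cost-effective"},
    "tests":        {"provider": "ollama",   "reason": "free, runs locally"},
}

for name, cfg in streams.items():
    print(f"{name:>12} -> {cfg['provider']:<8} ({cfg['reason']})")
```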

This kind of provider mixing is impossible with proxied services that lock you into a single provider.

Data Privacy

BYOK has a privacy dimension that is easy to overlook.

When a proxied service calls an LLM on your behalf, your code passes through their servers. Even if they promise not to log it, your source code leaves your network and enters theirs. For many teams, especially those working on proprietary software, this is a non-starter.

With BYOK orchestration, your code goes directly from your machine to the LLM provider. You have a direct relationship with that provider and their data handling policies. No intermediary stores, logs, or processes your code.

For teams with strict compliance requirements, this distinction matters. Your security team can audit the data flow: local machine to LLM API, nothing in between.

For even more control, you can use Ollama as your provider. With Ollama, nothing leaves your machine at all. The model runs locally, the orchestrator runs locally, and your code never touches the network. This is the maximum privacy configuration, useful for sensitive codebases where even sending code to OpenAI or Anthropic is unacceptable.

The Provider Lock-In Problem

There is a subtler issue with proxied services: provider lock-in.

If your tool only supports OpenAI, you are stuck with OpenAI pricing, OpenAI rate limits, and OpenAI outages. When OpenAI has a bad day (and every provider has bad days), your development workflow stops.

BYOK with multi-provider support gives you redundancy. If your primary provider is rate-limited or down, you switch to another. Your API keys for multiple providers sit in your environment variables, ready to use.

orchex makes provider selection a per-stream decision. If Anthropic's API is slow today, you set your streams to use OpenAI. Tomorrow, you switch back. No migration, no configuration changes beyond an environment variable.
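One way to sketch that redundancy: pick the first provider whose key is present in the environment, in a preference order you control. The helper and ordering below are illustrative, not part of orchex:

```python
import os

# Preference order: try Anthropic first, then fall back to OpenAI, then Google.
PROVIDER_KEYS = [
    ("claude", "ANTHROPIC_API_KEY"),
    ("openai", "OPENAI_API_KEY"),
    ("gemini", "GOOGLE_API_KEY"),
]

def pick_provider(env=None):
    """Return the first provider whose API key is set in the environment."""
    env = os.environ if env is None else env
    for provider, key_var in PROVIDER_KEYS:
        if env.get(key_var):
            return provider
    raise RuntimeError("no provider API key found in the environment")

# Example: only an OpenAI key is exported today.
print(pick_provider({"OPENAI_API_KEY": "sk-..."}))  # openai
```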

When BYOK Is Not the Right Choice

BYOK is not universally better. There are cases where a managed service makes sense:

Small teams without DevOps -- Managing API keys, monitoring usage across providers, and handling rate limits requires some operational overhead. A managed service handles this for you.

Consolidated billing -- Some teams prefer one predictable bill rather than separate usage-based charges from multiple LLM providers. A single invoice is a real convenience for budgeting.

Enterprise procurement -- Large organizations sometimes prefer vendor relationships with a single AI tool provider rather than managing direct API agreements with multiple LLM companies.

For individual developers and small-to-medium teams that are comfortable managing their own API keys, BYOK is almost always the better economic choice. The savings compound over time, and the flexibility to switch providers keeps you from being locked into any single vendor's pricing or capabilities.

Getting Started with BYOK

Set up your API keys as environment variables:

# Add to your ~/.zshrc or ~/.bashrc
export ANTHROPIC_API_KEY="your-key-here"
export OPENAI_API_KEY="your-key-here"

Install orchex in your IDE via MCP configuration, and start orchestrating. Every API call goes directly to the provider. Every dollar you spend goes to the model that generated your code, not to a middleman.

Check the orchex provider documentation for setup guides for all six supported providers, including Ollama for fully local execution.