API Reference
Complete guide to integrating the Decisor API into your application
Quick Start
Decisor provides a unified API for accessing multiple AI providers. Use the same OpenAI-compatible format with intelligent routing, caching, and optimization.
Base URL: https://decisor.io/v1

Authentication
All API requests require authentication using a Bearer token in the Authorization header.
```
Authorization: Bearer sk_live_xxx
```

Get your API key from the API Keys dashboard.
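For example, a minimal authenticated call with fetch (the key below is a placeholder; a missing or invalid key returns 401 invalid_api_key, see Error Handling):

```javascript
// Minimal sketch: the API key travels as a Bearer token on every request.
const res = await fetch('https://decisor.io/v1/chat/completions', {
  method: 'POST',
  headers: {
    'Content-Type': 'application/json',
    'Authorization': 'Bearer sk_live_xxx' // placeholder key
  },
  body: JSON.stringify({
    model: 'auto',
    messages: [{ role: 'user', content: 'ping' }]
  })
});
```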
Chat Completions
POST /v1/chat/completions

Request Body

```json
{
"model": "auto",
"messages": [
{
"role": "user",
"content": "Hello, how are you?"
}
],
"temperature": 0.7,
"max_tokens": 1000,
"routing": {
"prefer": "speed"
}
}
```

- `model` (required): Recommended: use `"auto"` for intelligent routing; we automatically select the best model and provider based on your request. Advanced (Starter+ only): specify a model explicitly (e.g., `"gpt-4"`, `"claude-3-opus"`). Free plan users must use `"auto"`.
- `messages` (required): Array of message objects, each with a `role` (`user`, `assistant`, or `system`) and `content`.
- `temperature` (optional): Sampling temperature between 0 and 2. Default: 1.
- `max_tokens` (optional): Maximum number of tokens to generate. Default: varies by model.
- `routing` (optional): Object to control routing behavior (see the example below).
  - `prefer` (optional): Routing preference: `"speed"`, `"quality"`, or `"cost"`. Default: cost-optimized routing.
  - `fallback` (optional): Enable automatic fallback to alternative providers if the primary provider fails. Default: `true`.
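For instance, a request that prefers quality but disables fallback (a sketch using only the fields documented above; the prompt text is illustrative):

```json
{
  "model": "auto",
  "messages": [
    {"role": "user", "content": "Summarize this contract clause."}
  ],
  "temperature": 0.2,
  "max_tokens": 500,
  "routing": {
    "prefer": "quality",
    "fallback": false
  }
}
```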
Response

```json
{
"id": "chatcmpl-xxx",
"object": "chat.completion",
"created": 1234567890,
"model": "llama-3.1-8b-instant",
"choices": [{
"index": 0,
"message": {
"role": "assistant",
"content": "Hello! I'm doing well, thank you..."
},
"finish_reason": "stop"
}],
"usage": {
"prompt_tokens": 10,
"completion_tokens": 20,
"total_tokens": 30
},
"_optimization": {
"cached": false,
"original_tokens": 15,
"optimized_tokens": 10,
"tokens_saved": 5
},
"_routing": {
"provider": "groq",
"original_model": "auto",
"mapped_model": "llama-3.1-8b-instant",
"reason": "Best speed for this request"
}
}
```
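The `_optimization` and `_routing` fields are Decisor-specific metadata; a minimal sketch of inspecting them, using only the field names shown above:

```javascript
// Sketch: log Decisor metadata from a parsed response body
// (`data` is an object shaped like the JSON above).
function logDecisorMetadata(data) {
  if (data._optimization) {
    const { cached, tokens_saved } = data._optimization;
    console.log(cached
      ? 'Served from cache (identical earlier request)'
      : `Prompt optimization saved ${tokens_saved} tokens`);
  }
  if (data._routing) {
    const { provider, mapped_model, reason } = data._routing;
    console.log(`Routed to ${provider} / ${mapped_model}: ${reason}`);
  }
}
```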
Routing Preferences

Control how requests are routed by specifying your preference for speed, quality, or cost.
By default, the system optimizes for cost. You can override this behavior by setting the routing preference.
Speed Priority
Prioritize fastest response times. Routes to Groq (fastest provider) when speed is preferred.
"prefer": "speed"Quality Priority
Prioritize highest quality responses. Routes to providers with better models for complex tasks.
"prefer": "quality"Cost Priority
Prioritize lowest cost. Routes to Together AI (cheapest provider) when cost is preferred. This is the default behavior.
"prefer": "cost"Example Request:
{
"model": "auto",
"messages": [
{"role": "user", "content": "Explain quantum computing"}
],
"routing": {
"prefer": "quality"
}
}
```

Code Examples

```javascript
const response = await fetch('https://decisor.io/v1/chat/completions', {
method: 'POST',
headers: {
'Content-Type': 'application/json',
'Authorization': 'Bearer sk_live_xxx'
},
body: JSON.stringify({
model: 'auto',
messages: [
{ role: 'user', content: 'Hello, how are you?' }
],
routing: {
prefer: 'speed' // 'speed', 'quality', or 'cost'
}
})
});
const data = await response.json();
console.log(data.choices[0].message.content);
```
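Because the API is OpenAI-compatible (see Features below), the official OpenAI Node SDK should also work by overriding the base URL. A sketch, assuming the `openai` npm package is installed:

```javascript
import OpenAI from 'openai';

// Point the official OpenAI SDK at Decisor instead of api.openai.com.
const client = new OpenAI({
  baseURL: 'https://decisor.io/v1',
  apiKey: 'sk_live_xxx' // placeholder; use your Decisor key
});

const completion = await client.chat.completions.create({
  model: 'auto',
  messages: [{ role: 'user', content: 'Hello, how are you?' }]
});
console.log(completion.choices[0].message.content);
```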
Available Models

Free Plan: Only "auto" is available. Intelligent routing automatically selects the best model for you.
Starter+ Plans: You can use "auto" or specify any model below for explicit control.
- `auto`: Intelligent routing - automatically selects the best provider and model based on your request. Recommended for all users.
- `llama-3.3-70b-versatile`: Groq Llama 3.3 70B - fast inference with high quality (12 cents/1M tokens)
- `llama-3.1-8b-instant`: Groq Llama 3.1 8B - fastest inference (12 cents/1M tokens)
- `meta-llama/Llama-3.2-3B-Instruct-Turbo`: Together AI Llama 3.2 3B - most cost-effective (5.5 cents/1M tokens)
- `meta-llama/Llama-3.1-8B-Instruct`: Together AI Llama 3.1 8B - good balance (10 cents/1M tokens)
- `accounts/fireworks/models/llama-v3p1-8b-instruct`: Fireworks AI Llama 3.1 8B - fast and reliable (9 cents/1M tokens)
- `accounts/fireworks/models/llama-v3p3-70b-instruct`: Fireworks AI Llama 3.3 70B - high quality for complex tasks (9 cents/1M tokens)
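On Starter+ plans, explicit selection just replaces the `"auto"` string with one of the IDs above, e.g. (prompt text illustrative):

```json
{
  "model": "llama-3.3-70b-versatile",
  "messages": [
    {"role": "user", "content": "Draft a release announcement."}
  ]
}
```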
Rate Limits
Rate limits are applied per API key to ensure fair usage and system stability.
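Exceeding a limit returns HTTP 429 with the `rate_limit_exceeded` code (see Error Handling below). A minimal retry-with-backoff sketch; the delays are arbitrary and no rate-limit response headers are assumed:

```javascript
// Sketch: retry on 429 with exponential backoff (1s, 2s, 4s, ...).
// Note: token_limit_exceeded also returns 429 but signals a monthly cap,
// so retrying won't help there; upgrade the plan instead.
async function fetchWithRetry(url, options, maxRetries = 3) {
  let res;
  for (let attempt = 0; attempt <= maxRetries; attempt++) {
    res = await fetch(url, options);
    if (res.status !== 429 || attempt === maxRetries) break;
    await new Promise((resolve) => setTimeout(resolve, 1000 * 2 ** attempt));
  }
  return res;
}
```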
Error Handling
| Code | Status | Description |
|---|---|---|
| `invalid_api_key` | 401 | The API key provided is invalid or missing |
| `missing_model` | 400 | The model parameter is required |
| `invalid_messages` | 400 | Messages must be a non-empty array with valid role and content |
| `rate_limit_exceeded` | 429 | Too many requests. Check your rate limit settings |
| `token_limit_exceeded` | 429 | Monthly token limit exceeded. Upgrade your plan |
| `routing_error` | 400 | Unable to route request to a provider |
| `provider_error` | 500 | Error communicating with the AI provider |
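A sketch of acting on these codes in JavaScript. The error body shape (`{ "error": { "code", "message" } }`) is an assumption mirroring the OpenAI-compatible format; only the codes and statuses in the table are documented:

```javascript
// Sketch: map documented statuses/codes to actions after a fetch call.
async function throwOnError(res) {
  if (res.ok) return res;
  const body = await res.json(); // assumed shape: { error: { code, message } }
  const code = body?.error?.code;
  if (code === 'invalid_api_key') throw new Error('Check your API key (401)');
  if (code === 'token_limit_exceeded') throw new Error('Monthly token limit reached; upgrade your plan (429)');
  if (code === 'rate_limit_exceeded') throw new Error('Rate limited; retry with backoff (429)');
  throw new Error(body?.error?.message ?? `Request failed with status ${res.status}`);
}
```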
Features
Intelligent Routing
Automatically routes requests to the best provider based on your preferences (speed, quality, cost).
Smart Caching
Automatic response caching reduces costs by serving cached responses for identical requests.
Prompt Optimization
Automatically optimizes your prompts to reduce token usage while maintaining quality.
OpenAI Compatible
Drop-in replacement for the OpenAI API. No code changes needed; just update the base URL.
Ready to get started?
Create your free account and get 200,000 tokens to start building with Decisor API.