MCP REST API
The Model Context Protocol (MCP) REST API provides a standard HTTP interface for interacting with AI models deployed on the Citrate network. It is designed as a drop-in replacement for the AI APIs you already use: it is compatible with both the OpenAI and Anthropic API formats, making it straightforward to integrate existing AI applications with on-chain models.
Base URL
http://localhost:8547
The MCP REST API runs on port 8547 by default (configurable via --mcp-port). For production deployments, use your gateway's public URL with TLS enabled.
Authentication
All requests require a Bearer token in the Authorization header:
Authorization: Bearer ctr_sk_live_abc123...
API keys are generated through the Citrate dashboard or via the citrate keys create CLI command. Each key is either scoped to specific models or granted global access.
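As a minimal sketch, the header can be built once and reused for every request. The helper name auth_headers and the CITRATE_API_KEY environment variable are illustrative assumptions, not part of the API.

```python
import os


def auth_headers(api_key: str) -> dict:
    """Return the HTTP headers for an authenticated JSON request."""
    if not api_key:
        raise ValueError("an API key is required")
    return {
        "Authorization": f"Bearer {api_key}",
        "Content-Type": "application/json",
    }


# CITRATE_API_KEY is a hypothetical variable name; the fallback is a placeholder.
headers = auth_headers(os.environ.get("CITRATE_API_KEY", "ctr_sk_live_placeholder"))
```

Reading the key from the environment keeps it out of source control.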
Rate Limits
| Scope | Limit |
|---|---|
| Global | 1,000 req/min |
| Per model | 100 req/min |
| Per key | 500 req/min |
Rate limit headers are included in every response:
X-RateLimit-Limit: 1000
X-RateLimit-Remaining: 994
X-RateLimit-Reset: 1709145600
When a rate limit is exceeded, the API returns HTTP 429 with a Retry-After header.
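A client can use these headers to decide how long to pause before retrying. The sketch below assumes that Retry-After carries a number of seconds (HTTP also permits a date) and that X-RateLimit-Reset is a Unix timestamp, which the example value suggests but the text does not confirm. The function name and the one-second fallback are illustrative choices.

```python
import time
from typing import Optional


def seconds_until_retry(status: int, headers: dict, now: Optional[float] = None) -> float:
    """Return how long to wait before retrying, or 0.0 if no wait is needed."""
    if status != 429:
        return 0.0
    retry_after = headers.get("Retry-After")
    if retry_after is not None:
        try:
            return max(0.0, float(retry_after))
        except ValueError:
            pass  # Not a number of seconds (possibly an HTTP date); fall through.
    reset = headers.get("X-RateLimit-Reset")
    if reset is not None:
        # Assumption: the reset value is a Unix timestamp in seconds.
        current = time.time() if now is None else now
        return max(0.0, float(reset) - current)
    return 1.0  # Arbitrary conservative default when no hint is present.
```

Header lookups here are case-sensitive, so pass a dictionary whose keys match the spelling shown above, or normalize the keys first.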
Model Discovery
List Models
GET /v1/models
Returns a paginated list of all models available on the network.
Query Parameters:
| Parameter | Type | Default | Description |
|---|---|---|---|
| limit | int | 20 | Number of models to return |
| offset | int | 0 | Pagination offset |
| format | string | ... | Filter by model format (onnx, gguf, safetensors) |
| owner | string | ... | Filter by owner address |
Response:
{
  "object": "list",
  "data": [
    {
      "id": "model_0xabc123",
      "object": "model",
      "name": "sentiment-v1",
      "format": "onnx",
      "owner": "0xOwnerAddress",
      "created": 1709145600,
      "inference_count": 4521,
      "permissions": ["inference", "benchmark"]
    }
  ],
  "has_more": true,
  "total": 142
}
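The limit and offset parameters, together with has_more, are enough to walk the full list. The sketch below builds the query string and follows has_more until it is false. The names models_url and iter_models are illustrative, and fetch_page is a caller-supplied function (an assumption of this sketch) that performs the authenticated GET and returns the parsed JSON.

```python
from urllib.parse import urlencode

BASE_URL = "http://localhost:8547"


def models_url(limit=20, offset=0, fmt=None, owner=None):
    """Build the GET /v1/models URL with the documented query parameters."""
    params = {"limit": limit, "offset": offset}
    if fmt is not None:
        params["format"] = fmt
    if owner is not None:
        params["owner"] = owner
    return f"{BASE_URL}/v1/models?{urlencode(params)}"


def iter_models(fetch_page, limit=20):
    """Yield every model by advancing offset while has_more is true."""
    offset = 0
    while True:
        page = fetch_page(models_url(limit=limit, offset=offset))
        yield from page["data"]
        # Stop on the last page, or on an empty page to avoid looping forever.
        if not page.get("has_more") or not page["data"]:
            break
        offset += len(page["data"])
```

Injecting fetch_page keeps the pagination logic independent of the HTTP client in use.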
Get Model
GET /v1/models/:model_id
Returns detailed metadata for a single model.
Response:
{
  "id": "model_0xabc123",
  "object": "model",
  "name": "sentiment-v1",
  "format": "onnx",
  "owner": "0xOwnerAddress",
  "created": 1709145600,
  "inference_count": 4521,
  "model_hash": "0xdef456...",
  "storage_uri": "ipfs://Qm...",
  "input_schema": { "type": "string", "maxLength": 512 },
  "output_schema": { "type": "object" },
  "benchmark": {
    "latency_ms": 45,
    "throughput": 22,
    "accuracy_bps": 9420
  }
}
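The metadata can be used to check an input locally before spending a request on it. The sketch below rests on two assumptions that the text above does not confirm: that input_schema follows JSON Schema semantics for the type and maxLength keywords, and that accuracy_bps is expressed in basis points (hundredths of a percent). Both function names are illustrative.

```python
def fits_input_schema(value, schema: dict) -> bool:
    """Check a value against the type and maxLength keywords only."""
    if schema.get("type") == "string" and not isinstance(value, str):
        return False
    max_length = schema.get("maxLength")
    if max_length is not None and isinstance(value, str) and len(value) > max_length:
        return False
    return True


def accuracy_percent(benchmark: dict) -> float:
    """Convert accuracy_bps to a percentage, assuming bps means basis points."""
    return benchmark["accuracy_bps"] / 100
```

For full validation, a dedicated JSON Schema validator would be a better fit than this two-keyword check.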
Inference
Chat Completions (OpenAI-compatible)
POST /v1/chat/completions
Runs inference using the OpenAI chat completions format. This endpoint is compatible with the OpenAI SDK and any tooling that targets the OpenAI API.
Request Body:
{
  "model": "model_0xabc123",
  "messages": [
    { "role": "system", "content": "You are a helpful assistant." },
    { "role": "user", "content": "What is GhostDAG consensus?" }
  ],
  "temperature": 0.7,
  "max_tokens": 256,
  "stream": false,
  "proof": true
}
Response:
{
  "id": "inf_xyz789",
  "object": "chat.completion",
  "created": 1709145700,
  "model": "model_0xabc123",
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "content": "GhostDAG is a consensus protocol that generalizes Nakamoto consensus..."
      },
      "finish_reason": "stop"
    }
  ],
  "usage": {
    "prompt_tokens": 24,
    "completion_tokens": 128,
    "total_tokens": 152,
    "gas_used": "0x7a120"
  },
  "proof": "0xzkproof..."
}
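The request and response shapes above can be exercised with the Python standard library alone. This is a sketch rather than an official client: build_chat_request and first_reply are illustrative names, and the base URL is the local default given earlier.

```python
import json
import urllib.request

BASE_URL = "http://localhost:8547"


def build_chat_request(api_key, model, messages, **options):
    """Prepare (but do not send) a POST /v1/chat/completions request."""
    body = {"model": model, "messages": messages, **options}
    return urllib.request.Request(
        f"{BASE_URL}/v1/chat/completions",
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def first_reply(response: dict) -> str:
    """Return the assistant text from the first choice."""
    return response["choices"][0]["message"]["content"]


# Sending the request requires a running node:
# with urllib.request.urlopen(build_chat_request(key, model, msgs, proof=True)) as resp:
#     print(first_reply(json.load(resp)))
```

Optional fields such as temperature, max_tokens, and proof are passed as keyword arguments and merged into the request body unchanged.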
When stream is set to true, the response is delivered as Server-Sent Events (SSE): each event carries one JSON chunk on a line prefixed with data:.
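A streaming client reads the response line by line and decodes each data: payload. The sketch below makes two assumptions: that each event fits on a single data: line, and that the stream may end with a [DONE] sentinel as the OpenAI API does. Neither is stated above, and the shape of the chunks is left opaque for the same reason.

```python
import json


def iter_sse_json(lines):
    """Yield parsed JSON payloads from an iterable of SSE text lines."""
    for raw in lines:
        line = raw.strip()
        if not line.startswith("data:"):
            continue  # Skip blank separators, comments, and other SSE fields.
        payload = line[len("data:"):].strip()
        if payload == "[DONE]":  # Assumed OpenAI-style end-of-stream marker.
            break
        yield json.loads(payload)
```

With urllib, the lines argument can be the decoded lines of the open response object. Any iterable of strings works, which makes the parser easy to test offline.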
Messages (Anthropic-compatible)
POST /v1/messages
Runs inference using the Anthropic messages format. Compatible with the Anthropic SDK.
Request Body:
{
  "model": "model_0xabc123",
  "max_tokens": 256,
  "messages": [
    { "role": "user", "content": "Explain the Medusa Paradigm." }
  ]
}
Response:
{
  "id": "msg_abc123",
  "type": "message",
  "role": "assistant",
  "content": [
    { "type": "text", "text": "The Medusa Paradigm describes a decentralized coordination model..." }
  ],
  "model": "model_0xabc123",
  "stop_reason": "end_turn",
  "usage": {
    "input_tokens": 12,
    "output_tokens": 98
  }
}
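Unlike the chat completions format, this response returns content as a list of typed blocks. A small helper (the name message_text is illustrative) can join the text blocks into one string. The sketch skips any block whose type is not text, on the assumption that other block types may appear.

```python
def message_text(response: dict) -> str:
    """Concatenate the text of every text block in a messages response."""
    return "".join(
        block["text"]
        for block in response.get("content", [])
        if block.get("type") == "text"
    )
```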
Embeddings
POST /v1/embeddings
Generates vector embeddings for the given input text.
Request Body:
{
  "model": "model_0xembed01",
  "input": "The Citrate blockchain uses GhostDAG consensus.",
  "encoding_format": "float"
}
Response:
{
  "object": "list",
  "data": [
    {
      "object": "embedding",
      "index": 0,
      "embedding": [0.0023, -0.0091, 0.0152, "..."]
    }
  ],
  "model": "model_0xembed01",
  "usage": {
    "prompt_tokens": 9,
    "total_tokens": 9
  }
}
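A common use of the returned vectors is to compare two texts by cosine similarity. The sketch below extracts the vectors in index order and compares a pair. Both function names are illustrative, and cosine similarity is shown as one typical choice of metric, not one the API prescribes.

```python
import math


def embedding_vectors(response: dict) -> list:
    """Return the embedding vectors from a response, ordered by index."""
    items = sorted(response["data"], key=lambda item: item["index"])
    return [item["embedding"] for item in items]


def cosine_similarity(a, b) -> float:
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0
```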
Async Jobs
For long-running inference tasks (batch processing, large model inference, benchmarking), use the async job API.
Create Job
POST /v1/jobs
Request Body:
{
  "model": "model_0xabc123",
  "type": "batch_inference",
  "inputs": [
    { "content": "Input text 1" },
    { "content": "Input text 2" },
    { "content": "Input text 3" }
  ],
  "webhook_url": "https://your-app.com/webhooks/citrate"
}
Response:
{
  "id": "job_def456",
  "object": "job",
  "status": "queued",
  "created": 1709145800,
  "model": "model_0xabc123",
  "type": "batch_inference",
  "input_count": 3
}
Get Job Status
GET /v1/jobs/:id
Response:
{
  "id": "job_def456",
  "object": "job",
  "status": "completed",
  "created": 1709145800,
  "completed_at": 1709145830,
  "model": "model_0xabc123",
  "type": "batch_inference",
  "input_count": 3,
  "results": [
    { "index": 0, "output": "...", "gas_used": "0x5208" },
    { "index": 1, "output": "...", "gas_used": "0x5208" },
    { "index": 2, "output": "...", "gas_used": "0x5208" }
  ]
}
Job statuses: queued, processing, completed, failed.
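When no webhook is configured, a client can poll the status endpoint until the job reaches completed or failed. The sketch below treats those two statuses as terminal, which the list above implies but does not state outright. The function name, the two-second interval, and the five-minute timeout are illustrative, and get_job is a caller-supplied function that performs GET /v1/jobs/:id and returns the parsed JSON.

```python
import time

TERMINAL_STATUSES = {"completed", "failed"}


def wait_for_job(get_job, job_id, interval=2.0, timeout=300.0,
                 sleep=time.sleep, clock=time.monotonic):
    """Poll get_job(job_id) until the job finishes or the timeout elapses."""
    deadline = clock() + timeout
    while True:
        job = get_job(job_id)
        if job["status"] in TERMINAL_STATUSES:
            return job
        if clock() >= deadline:
            raise TimeoutError(f"job {job_id} still {job['status']} after {timeout}s")
        sleep(interval)
```

The sleep and clock parameters exist so the loop can be tested without real delays. The caller still needs to inspect the returned status, since a failed job is returned rather than raised.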
Error Responses
All errors follow a consistent format:
{
  "error": {
    "type": "invalid_request_error",
    "message": "Model model_0xinvalid not found.",
    "code": "model_not_found"
  }
}
| HTTP Status | Error Type | Description |
|---|---|---|
| 400 | invalid_request_error | Malformed or missing parameters |
| 401 | authentication_error | Invalid or missing API key |
| 403 | permission_error | API key lacks required permissions |
| 404 | not_found_error | Resource does not exist |
| 429 | rate_limit_error | Rate limit exceeded |
| 500 | internal_error | Server-side error |
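Because every error shares one envelope, a client can convert any non-2xx response into a single exception type. The sketch below is one possible approach: CitrateAPIError and parse_error are hypothetical names, and the unknown_error fallback for bodies that do not match the documented shape is an assumption of this example, not an API value.

```python
import json


class CitrateAPIError(Exception):
    """Hypothetical exception carrying the fields of an error response."""

    def __init__(self, status, error_type, code, message):
        super().__init__(f"HTTP {status} {error_type}: {message}")
        self.status = status
        self.error_type = error_type
        self.code = code
        self.message = message


def parse_error(status: int, body: str) -> CitrateAPIError:
    """Turn an error response body into a CitrateAPIError."""
    try:
        parsed = json.loads(body)
    except ValueError:
        parsed = None
    err = parsed.get("error") if isinstance(parsed, dict) else None
    if not isinstance(err, dict):
        # The body did not match the documented shape, such as a proxy error page.
        return CitrateAPIError(status, "unknown_error", None, body)
    return CitrateAPIError(
        status,
        err.get("type", "unknown_error"),
        err.get("code"),
        err.get("message", ""),
    )
```

Callers can then branch on status or error_type, for example pausing and retrying on rate_limit_error while surfacing authentication_error immediately.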