MCP REST API

The Model Context Protocol (MCP) REST API is a standard HTTP interface for interacting with AI models deployed on the Citrate network. It is designed as a drop-in replacement for the AI APIs you already use: it is compatible with both the OpenAI and Anthropic API formats, which makes it straightforward to integrate existing AI applications with on-chain models.

Base URL

http://localhost:8547

The MCP REST API runs on port 8547 by default (configurable via --mcp-port). For production deployments, use your gateway's public URL with TLS enabled.

Authentication

All requests require a Bearer token in the Authorization header:

Authorization: Bearer ctr_sk_live_abc123...

API keys are generated through the Citrate dashboard or with the citrate keys create CLI command. Each key is either scoped to specific models or granted global access.
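
As a rough sketch of an authenticated call using only Python's standard library, the helper below attaches the Bearer header and, when a body is given, encodes it as JSON. The names `build_request` and `send`, and the `Content-Type: application/json` header, are assumptions made for illustration; this page does not specify them.

```python
import json
import urllib.request

BASE_URL = "http://localhost:8547"  # default local endpoint from "Base URL"


def build_request(path, api_key, body=None):
    """Build (but do not send) an authenticated request.

    A JSON body makes it a POST; without one it is a GET.
    """
    headers = {"Authorization": f"Bearer {api_key}"}
    data = None
    if body is not None:
        data = json.dumps(body).encode("utf-8")
        # Assumption: the server expects JSON request bodies.
        headers["Content-Type"] = "application/json"
    method = "GET" if body is None else "POST"
    return urllib.request.Request(
        BASE_URL + path, data=data, headers=headers, method=method
    )


def send(request):
    """Send the request and decode the JSON response (needs a running node)."""
    with urllib.request.urlopen(request) as response:
        return json.load(response)
```

With a node running locally, `send(build_request("/v1/models", api_key))` would fetch the model list, where `api_key` holds a key created as described above.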

Rate Limits

Scope	Limit
Global	1,000 req/min
Per model	100 req/min
Per key	500 req/min

Rate limit headers are included in every response:

X-RateLimit-Limit: 1000
X-RateLimit-Remaining: 994
X-RateLimit-Reset: 1709145600

When a rate limit is exceeded, the API returns HTTP 429 with a Retry-After header.
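
One possible way for a client to react to a 429 is sketched below. It assumes that `Retry-After` carries a number of seconds and that `X-RateLimit-Reset` is a Unix timestamp in seconds (the sample value above looks like one), neither of which this page states explicitly. The function name `retry_delay` and the one-second fallback are illustrative choices.

```python
import time


def retry_delay(status, headers, now=None):
    """Return how many seconds to wait before retrying, or None if not rate limited.

    `headers` is a plain dict, so header names are matched case-sensitively here.
    """
    if status != 429:
        return None
    retry_after = headers.get("Retry-After")
    if retry_after is not None:
        try:
            # Assumption: Retry-After is a number of seconds.
            return max(0.0, float(retry_after))
        except ValueError:
            pass  # HTTP also allows a date here; fall back to the reset header
    reset = headers.get("X-RateLimit-Reset")
    if reset is not None:
        # Assumption: the reset value is a Unix timestamp in seconds.
        current = time.time() if now is None else now
        return max(0.0, float(reset) - current)
    return 1.0  # arbitrary fallback when neither header is usable
```

A caller would sleep for the returned delay and then resend the same request, ideally with an upper bound on the number of attempts.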

Model Discovery

List Models

GET /v1/models

Returns a paginated list of all models available on the network.

Query Parameters:

Parameter	Type	Default	Description
`limit`	int	20	Number of models to return
`offset`	int	0	Pagination offset
`format`	string	...	Filter by model format (onnx, gguf, safetensors)
`owner`	string	...	Filter by owner address

Response:

{
  "object": "list",
  "data": [
    {
      "id": "model_0xabc123",
      "object": "model",
      "name": "sentiment-v1",
      "format": "onnx",
      "owner": "0xOwnerAddress",
      "created": 1709145600,
      "inference_count": 4521,
      "permissions": ["inference", "benchmark"]
    }
  ],
  "has_more": true,
  "total": 142
}
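
The `limit`, `offset`, and `has_more` fields above suggest a simple pagination loop, sketched here. `fetch_page` is a hypothetical callable that performs the GET for a given path and returns the parsed JSON; `models_path` and `iter_models` are likewise illustrative names rather than part of any Citrate SDK.

```python
from urllib.parse import urlencode


def models_path(limit=20, offset=0, **filters):
    """Build the /v1/models path with query parameters, skipping unset filters."""
    params = {"limit": limit, "offset": offset}
    params.update({k: v for k, v in filters.items() if v is not None})
    return "/v1/models?" + urlencode(params)


def iter_models(fetch_page, limit=20, **filters):
    """Yield every model, requesting pages until has_more is false."""
    offset = 0
    while True:
        page = fetch_page(models_path(limit=limit, offset=offset, **filters))
        models = page.get("data", [])
        yield from models
        # Stop on the last page, or on an empty page to avoid looping forever.
        if not page.get("has_more") or not models:
            break
        offset += len(models)
```

Because `iter_models` is a generator, a caller can stop early (for example after finding a model by name) without fetching the remaining pages.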

Get Model

GET /v1/models/:model_id

Returns detailed metadata for a single model.

Response:

{
  "id": "model_0xabc123",
  "object": "model",
  "name": "sentiment-v1",
  "format": "onnx",
  "owner": "0xOwnerAddress",
  "created": 1709145600,
  "inference_count": 4521,
  "model_hash": "0xdef456...",
  "storage_uri": "ipfs://Qm...",
  "input_schema": { "type": "string", "maxLength": 512 },
  "output_schema": { "type": "object" },
  "benchmark": {
    "latency_ms": 45,
    "throughput": 22,
    "accuracy_bps": 9420
  }
}
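
The metadata can be used to check an input before paying for inference. The sketch below assumes that `input_schema` follows JSON Schema keywords (`type`, `maxLength`) and that `accuracy_bps` is expressed in basis points (1 bps = 0.01%); both are inferences from the field names and sample values, not statements from this page. The function names are hypothetical.

```python
def input_problem(model, text):
    """Return a description of why `text` violates input_schema, or None if it fits."""
    schema = model.get("input_schema") or {}
    if schema.get("type") == "string":
        if not isinstance(text, str):
            return "input must be a string"
        max_length = schema.get("maxLength")
        if max_length is not None and len(text) > max_length:
            return f"input is {len(text)} characters; limit is {max_length}"
    return None


def accuracy_percent(model):
    """Convert benchmark.accuracy_bps to a percentage, assuming basis points."""
    bps = (model.get("benchmark") or {}).get("accuracy_bps")
    return None if bps is None else bps / 100
```

Only the string case is handled here; a full JSON Schema validator would be needed for richer schemas.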

Inference

Chat Completions (OpenAI-compatible)

POST /v1/chat/completions

Runs inference using the OpenAI chat completions format. This endpoint is compatible with the OpenAI SDK and any tooling that targets the OpenAI API.

Request Body:

{
  "model": "model_0xabc123",
  "messages": [
    { "role": "system", "content": "You are a helpful assistant." },
    { "role": "user", "content": "What is GhostDAG consensus?" }
  ],
  "temperature": 0.7,
  "max_tokens": 256,
  "stream": false,
  "proof": true
}

Response:

{
  "id": "inf_xyz789",
  "object": "chat.completion",
  "created": 1709145700,
  "model": "model_0xabc123",
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "content": "GhostDAG is a consensus protocol that generalizes Nakamoto consensus..."
      },
      "finish_reason": "stop"
    }
  ],
  "usage": {
    "prompt_tokens": 24,
    "completion_tokens": 128,
    "total_tokens": 152,
    "gas_used": "0x7a120"
  },
  "proof": "0xzkproof..."
}
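
For clients that do not use the OpenAI SDK, the request and response shapes above can be handled with plain dictionaries, as in this sketch. The helper names are hypothetical, and decoding `gas_used` as a hexadecimal integer is an assumption based on the `0x` prefix in the sample.

```python
def chat_request(model, user_text, system=None, proof=False, **options):
    """Build a chat completions body; extra options (temperature, ...) pass through."""
    messages = []
    if system:
        messages.append({"role": "system", "content": system})
    messages.append({"role": "user", "content": user_text})
    return {"model": model, "messages": messages, "proof": proof, **options}


def summarize_completion(response):
    """Pull the commonly needed fields out of a non-streaming response."""
    choice = response["choices"][0]
    usage = response.get("usage", {})
    gas_hex = usage.get("gas_used")
    return {
        "text": choice["message"]["content"],
        "finish_reason": choice.get("finish_reason"),
        "total_tokens": usage.get("total_tokens"),
        # Assumption: gas_used is a 0x-prefixed hexadecimal integer.
        "gas_used": int(gas_hex, 16) if gas_hex else None,
        "proof": response.get("proof"),
    }
```

Applied to the sample response above, `summarize_completion` would report `gas_used` as 500000, the decimal value of `0x7a120`.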

When stream is set to true, the response is delivered as Server-Sent Events (SSE), with each JSON chunk sent on a line prefixed by data:.
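
A streamed response can be consumed line by line, as in the sketch below. This page only states that chunks arrive as `data:` lines containing JSON; the `[DONE]` sentinel and the `choices[].delta.content` chunk shape are assumptions borrowed from the OpenAI streaming convention and may not match what the server actually sends.

```python
import json


def iter_sse_json(lines):
    """Yield the decoded JSON payload of each `data:` line in an SSE stream."""
    for raw in lines:
        line = raw.decode("utf-8") if isinstance(raw, bytes) else raw
        line = line.strip()
        if not line.startswith("data:"):
            continue  # skip blank separators, comments, and other SSE fields
        payload = line[len("data:"):].strip()
        if payload == "[DONE]":  # assumed OpenAI-style end-of-stream marker
            break
        yield json.loads(payload)


def collect_text(chunks):
    """Concatenate the text deltas from OpenAI-style chunks (assumed shape)."""
    parts = []
    for chunk in chunks:
        for choice in chunk.get("choices", []):
            parts.append(choice.get("delta", {}).get("content") or "")
    return "".join(parts)
```

`iter_sse_json` accepts any iterable of lines, so it can be fed an open HTTP response object directly or a list of recorded lines in a test.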

Messages (Anthropic-compatible)

POST /v1/messages

Runs inference using the Anthropic messages format. This endpoint is compatible with the Anthropic SDK.

Request Body:

{
  "model": "model_0xabc123",
  "max_tokens": 256,
  "messages": [
    { "role": "user", "content": "Explain the Medusa Paradigm." }
  ]
}

Response:

{
  "id": "msg_abc123",
  "type": "message",
  "role": "assistant",
  "content": [
    { "type": "text", "text": "The Medusa Paradigm describes a decentralized coordination model..." }
  ],
  "model": "model_0xabc123",
  "stop_reason": "end_turn",
  "usage": {
    "input_tokens": 12,
    "output_tokens": 98
  }
}
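
Because the response carries its text in a list of content blocks rather than a single string, a small helper is convenient. The sketch below works on the shapes shown above; the function names are hypothetical, and only `text` blocks are handled since no other block type appears on this page.

```python
def messages_request(model, user_text, max_tokens=256):
    """Build a single-turn request body in the messages format shown above."""
    return {
        "model": model,
        "max_tokens": max_tokens,
        "messages": [{"role": "user", "content": user_text}],
    }


def message_text(response):
    """Join the text of every `text` content block in a response."""
    blocks = response.get("content", [])
    return "".join(b.get("text", "") for b in blocks if b.get("type") == "text")


def total_tokens(response):
    """Sum input and output tokens, since this format reports no total."""
    usage = response.get("usage", {})
    return usage.get("input_tokens", 0) + usage.get("output_tokens", 0)
```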

Embeddings

POST /v1/embeddings

Generates vector embeddings for the given input text.

Request Body:

{
  "model": "model_0xembed01",
  "input": "The Citrate blockchain uses GhostDAG consensus.",
  "encoding_format": "float"
}

Response:

{
  "object": "list",
  "data": [
    {
      "object": "embedding",
      "index": 0,
      "embedding": [0.0023, -0.0091, 0.0152, "..."]
    }
  ],
  "model": "model_0xembed01",
  "usage": {
    "prompt_tokens": 9,
    "total_tokens": 9
  }
}
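
A common use of embeddings is comparing two texts by the cosine similarity of their vectors. The sketch below is a generic illustration of that idea, not something this API provides: `embedding_vectors` reads the `data` list in the response shape above, and `cosine_similarity` is plain Python with no external dependencies.

```python
import math


def embedding_vectors(response):
    """Return the embedding vectors from a response, ordered by their index."""
    items = sorted(response["data"], key=lambda item: item["index"])
    return [item["embedding"] for item in items]


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors (0.0 if either is zero)."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0
```

Vectors are only comparable when they come from the same embedding model, so both inputs should be embedded with the same `model` value.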

Async Jobs

For long-running inference tasks (batch processing, large model inference, benchmarking), use the async job API.

Create Job

POST /v1/jobs

Request Body:

{
  "model": "model_0xabc123",
  "type": "batch_inference",
  "inputs": [
    { "content": "Input text 1" },
    { "content": "Input text 2" },
    { "content": "Input text 3" }
  ],
  "webhook_url": "https://your-app.com/webhooks/citrate"
}

Response:

{
  "id": "job_def456",
  "object": "job",
  "status": "queued",
  "created": 1709145800,
  "model": "model_0xabc123",
  "type": "batch_inference",
  "input_count": 3
}

Get Job Status

GET /v1/jobs/:id

Response:

{
  "id": "job_def456",
  "object": "job",
  "status": "completed",
  "created": 1709145800,
  "completed_at": 1709145830,
  "model": "model_0xabc123",
  "type": "batch_inference",
  "input_count": 3,
  "results": [
    { "index": 0, "output": "...", "gas_used": "0x5208" },
    { "index": 1, "output": "...", "gas_used": "0x5208" },
    { "index": 2, "output": "...", "gas_used": "0x5208" }
  ]
}

Job statuses: queued, processing, completed, failed.
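
When a webhook is not practical, a client can poll the status endpoint until the job reaches `completed` or `failed`. The loop below is a minimal sketch of that approach: `get_job` is a hypothetical callable that performs `GET /v1/jobs/:id` and returns the parsed JSON, and the interval and timeout defaults are arbitrary.

```python
import time

TERMINAL_STATUSES = {"completed", "failed"}


def wait_for_job(get_job, job_id, interval=2.0, timeout=300.0,
                 sleep=time.sleep, clock=time.monotonic):
    """Poll a job until it is completed or failed, or raise TimeoutError.

    `sleep` and `clock` are injectable so the loop can be tested without waiting.
    """
    deadline = clock() + timeout
    while True:
        job = get_job(job_id)
        if job["status"] in TERMINAL_STATUSES:
            return job
        if clock() >= deadline:
            raise TimeoutError(
                f"job {job_id} still {job['status']} after {timeout} seconds"
            )
        sleep(interval)
```

The caller still needs to inspect the returned status, since a `failed` job is returned rather than raised.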

Error Responses

All errors follow a consistent format:

{
  "error": {
    "type": "invalid_request_error",
    "message": "Model model_0xinvalid not found.",
    "code": "model_not_found"
  }
}

HTTP Status	Error Type	Description
400	invalid_request_error	Malformed or missing parameters
401	authentication_error	Invalid or missing API key
403	permission_error	API key lacks required permissions
404	not_found_error	Resource does not exist
429	rate_limit_error	Rate limit exceeded
500	internal_error	Server-side error
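
Since every error shares the same envelope, a client can map failures onto a single exception type. The sketch below is one way to do that; the `APIError` class and `parse_error` function are hypothetical, and treating 429 and 500 as retryable is a design assumption rather than guidance from this page.

```python
import json


class APIError(Exception):
    """Error raised for a failed request (hypothetical helper class)."""

    def __init__(self, status, error_type, code, message):
        super().__init__(f"HTTP {status} {error_type}: {message}")
        self.status = status
        self.error_type = error_type
        self.code = code
        self.message = message
        # Assumption: only rate limits and server-side errors are worth retrying.
        self.retryable = status in (429, 500)


def parse_error(status, body_text):
    """Turn an error response body into an APIError, tolerating non-JSON bodies."""
    try:
        payload = json.loads(body_text)
    except ValueError:
        payload = None
    error = payload.get("error") if isinstance(payload, dict) else None
    if not isinstance(error, dict):
        error = {}
    return APIError(
        status,
        error.get("type", "unknown_error"),
        error.get("code"),
        error.get("message", body_text),
    )
```

Falling back to `unknown_error` keeps the client working when a proxy or gateway in front of the API returns a body that is not in this format.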
