Core concepts

Model Context Protocol (MCP)

The MCP layer handles model discovery, registration, and serving. Models are stored as GGUF, ONNX, or MLX files pinned to IPFS. An on-chain registry contract maps model IDs to IPFS CIDs and serving addresses. Any node can register a model, and consumers request inference through the registry. The serving node receives its inference fee automatically through the smart contract.

I built the MCP marketplace to be permissionless: you do not need approval to list a model. You do, however, need a staked validator address to participate in the mentor-mentee protocol.

Permissionless Model Registry

Any participant can register an AI model on Citrate. There are no gatekeepers, no approval processes, and no centralized model marketplaces. The on-chain ModelRegistry (at precompile address 0x0100) stores model metadata, while MCP handles the off-chain routing and API translation.

Registration requires:

  1. A content-addressed model hash (IPFS CID or equivalent)
  2. An inference endpoint that conforms to the MCP API specification
  3. A CITE token stake bond (minimum 1,000 CITE)
  4. A declared capability set (text generation, embeddings, image generation, etc.)

Once registered, the model is immediately discoverable by any application on the network.
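The four registration requirements above can be sketched as a payload builder. The field names and the validation helper here are illustrative assumptions, not the actual ModelRegistry ABI; only the precompile address (0x0100) and the 1,000 CITE minimum stake come from this document.

```python
# Sketch of assembling a ModelRegistry registration payload.
# Field names are hypothetical; consult the real ModelRegistry ABI.

MODEL_REGISTRY_ADDRESS = "0x0100"  # precompile address from the docs
MIN_STAKE_CITE = 1_000             # minimum stake bond from the docs

def build_registration(model_cid: str, endpoint: str,
                       stake_cite: int, capabilities: list[str]) -> dict:
    """Validate the four registration requirements and return a payload."""
    if stake_cite < MIN_STAKE_CITE:
        raise ValueError(f"stake must be at least {MIN_STAKE_CITE} CITE")
    if not capabilities:
        raise ValueError("at least one capability must be declared")
    return {
        "to": MODEL_REGISTRY_ADDRESS,
        "model_cid": model_cid,     # content-addressed model hash (IPFS CID)
        "endpoint": endpoint,       # MCP-conformant inference endpoint
        "stake": stake_cite,
        "capabilities": capabilities,
    }

payload = build_registration(
    "bafybeigdyrzt5sfp7udm7hu76uh7y26nf3efuylqabf3oclgtqy55fbzdi",
    "https://infer.example.com/v1",
    1_000,
    ["text-generation"],
)
```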

API-Compatible Endpoints

MCP exposes three core endpoints that mirror industry-standard AI APIs. Any application built against the OpenAI or Anthropic SDKs can point at a Citrate MCP gateway with minimal changes.

Model Discovery

GET /v1/models

Returns all registered models with their capabilities, pricing, and reputation scores. Supports filtering by capability, minimum reputation, and price range.

{
  "data": [
    {
      "id": "citrate-llama-70b",
      "object": "model",
      "owned_by": "0xabc...def",
      "capabilities": ["text-generation", "function-calling"],
      "price_per_token": "0.00001",
      "reputation_score": 0.97
    }
  ]
}
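The same filters the endpoint supports (capability, minimum reputation, price range) can also be applied client-side to a `/v1/models` response. This is a minimal sketch over the response shape shown above; the filter function itself is illustrative, not part of the MCP specification.

```python
# Client-side filtering of a GET /v1/models response by capability,
# minimum reputation, and maximum price per token.

def filter_models(models, capability=None, min_reputation=0.0, max_price=None):
    out = []
    for m in models:
        if capability and capability not in m["capabilities"]:
            continue
        if m["reputation_score"] < min_reputation:
            continue
        if max_price is not None and float(m["price_per_token"]) > max_price:
            continue
        out.append(m)
    return out

# Sample record mirroring the response body shown above.
models = [{
    "id": "citrate-llama-70b",
    "object": "model",
    "owned_by": "0xabc...def",
    "capabilities": ["text-generation", "function-calling"],
    "price_per_token": "0.00001",
    "reputation_score": 0.97,
}]

hits = filter_models(models, capability="text-generation", min_reputation=0.9)
```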

Chat Completions

POST /v1/chat/completions

Routes chat completion requests to the best available model based on the caller's preferences (cost, latency, accuracy). Supports streaming responses.

{
  "model": "citrate-llama-70b",
  "messages": [
    {"role": "user", "content": "Explain GhostDAG consensus."}
  ],
  "stream": true
}
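A caller builds the request body above and, when streaming, consumes server-sent-event lines. This sketch assumes the gateway streams in the standard `data: {json}` framing used by OpenAI-compatible chat APIs; the helper names are illustrative.

```python
# Build a streaming chat request and parse one streamed SSE line.
import json

def build_chat_request(model: str, user_content: str, stream: bool = True) -> dict:
    return {
        "model": model,
        "messages": [{"role": "user", "content": user_content}],
        "stream": stream,
    }

def parse_sse_line(line: str):
    """Return the delta text from one 'data:' line, or None for [DONE]/keepalives."""
    if not line.startswith("data:"):
        return None
    body = line[len("data:"):].strip()
    if body == "[DONE]":
        return None
    chunk = json.loads(body)
    return chunk["choices"][0]["delta"].get("content")

req = build_chat_request("citrate-llama-70b", "Explain GhostDAG consensus.")
```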

Embeddings

POST /v1/embeddings

Generates vector embeddings using registered embedding models. Compatible with applications that use OpenAI's embedding API.

{
  "model": "citrate-embed-v1",
  "input": "Paraconsistent consensus treats contradictions as learning signals."
}

Inference Routing

When a request arrives at an MCP gateway, the routing engine selects the optimal provider using a multi-factor scoring algorithm:

  1. Model match -- Does the provider serve the requested model (or a compatible variant)?
  2. Reputation -- What is the provider's historical accuracy and uptime, derived from blue score and attestation history?
  3. Latency -- What is the estimated round-trip time based on geographic proximity?
  4. Price -- Does the provider's fee fit within the caller's budget?
  5. Load -- Is the provider currently at capacity?

If the caller specifies a model ID, routing targets that specific model. If the caller specifies only a capability (e.g., "text-generation"), MCP selects the best provider automatically.
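The five factors above can be sketched as a scoring function. The weights, normalizations, and the treatment of model match and budget as hard filters are illustrative assumptions, not the protocol's actual routing parameters.

```python
# Sketch of multi-factor provider scoring. Model match (1) and price (4)
# act as hard filters; reputation (2), latency (3), and load (5) are
# combined with assumed weights into a single score.

def score_provider(p, max_latency_ms=2000.0, budget_per_token=None):
    if not p["serves_model"]:          # 1. model match: hard requirement
        return None
    if budget_per_token is not None and p["price_per_token"] > budget_per_token:
        return None                    # 4. price: over budget is excluded
    reputation = p["reputation"]       # 2. in [0, 1], from attestation history
    latency = max(0.0, 1.0 - p["latency_ms"] / max_latency_ms)  # 3. lower is better
    headroom = 1.0 - p["load"]         # 5. fraction of capacity still free
    return 0.5 * reputation + 0.3 * latency + 0.2 * headroom

providers = [
    {"serves_model": True, "reputation": 0.97, "latency_ms": 200,
     "load": 0.4, "price_per_token": 0.00001},
    {"serves_model": True, "reputation": 0.80, "latency_ms": 1500,
     "load": 0.9, "price_per_token": 0.00002},
]
best = max(providers, key=score_provider)
```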

Fee Distribution

Inference fees flow through a transparent on-chain settlement:

  Recipient            Share   Purpose
  Model provider       70%     Compensation for compute
  Network validators   20%     Block production incentive
  Protocol treasury    10%     Ongoing development funding

Fees are denominated in CITE and settled at each finality checkpoint. Providers can set their own per-token pricing, and the market determines which models attract usage. Our vision for the MCP marketplace is that it should be as open and competitive as possible -- anyone can serve, anyone can consume.
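The 70/20/10 split above can be sketched in integer arithmetic over the smallest CITE unit. The rule that rounding dust goes to the treasury is an assumption for illustration, not a documented protocol rule.

```python
# Sketch of settling one inference fee per the 70/20/10 split.

SPLIT_PERCENT = {"provider": 70, "validators": 20, "treasury": 10}

def settle_fee(fee_units: int) -> dict:
    """Split a fee (in smallest CITE units) among the three recipients."""
    provider = fee_units * SPLIT_PERCENT["provider"] // 100
    validators = fee_units * SPLIT_PERCENT["validators"] // 100
    treasury = fee_units - provider - validators  # absorbs rounding dust (assumed)
    return {"provider": provider, "validators": validators, "treasury": treasury}

shares = settle_fee(1_000_003)
```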

Attestation and Quality

Every inference response includes a cryptographic attestation that can be verified on-chain via the AttestationVerifier precompile. This creates an auditable trail of model performance and enables:

  • Reputation tracking -- Nodes that consistently produce accurate results build higher reputation scores.
  • Dispute resolution -- If a response is challenged, the attestation proves what model and input produced it.
  • Quality guarantees -- Applications can require minimum reputation thresholds for their inference requests.
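One way to picture the attestation trail is a digest binding model ID, input, and output, so a challenged response can be re-derived and checked. The exact fields and hashing scheme used by the AttestationVerifier precompile are not specified here; this is only an illustrative construction.

```python
# Illustrative attestation digest: SHA-256 over length-prefixed fields.
import hashlib

def attest(model_id: str, input_text: str, output_text: str) -> str:
    h = hashlib.sha256()
    for part in (model_id, input_text, output_text):
        data = part.encode()
        # Length-prefix each field so concatenation is unambiguous.
        h.update(len(data).to_bytes(4, "big"))
        h.update(data)
    return h.hexdigest()

a1 = attest("citrate-llama-70b", "hello", "world")
a2 = attest("citrate-llama-70b", "hello", "world")
a3 = attest("citrate-llama-70b", "hello", "WORLD")
```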

Further Reading