Core concepts

Model Context Protocol (MCP)

The MCP layer handles model discovery, registration, and serving. Models are stored as GGUF, ONNX, or MLX files pinned to IPFS. An on-chain registry contract maps model IDs to IPFS CIDs and serving addresses. Any node can register a model, and consumers request inference through the registry. The serving node receives its inference fee automatically through the smart contract.

I built the MCP marketplace to be permissionless: you do not need approval to list a model. You do, however, need a staked validator address to participate in the mentor-mentee protocol.

Permissionless Model Registry

Any participant can register an AI model on Citrate. There are no gatekeepers, no approval processes, and no centralized model marketplaces. The on-chain ModelRegistry (at precompile address 0x0100) stores model metadata, while MCP handles the off-chain routing and API translation.

Registration requires:

  1. A content-addressed model hash (IPFS CID or equivalent)
  2. An inference endpoint that conforms to the MCP API specification
  3. A CITE token stake bond (minimum 1,000 CITE)
  4. A declared capability set (text generation, embeddings, image generation, etc.)

Once registered, the model is immediately discoverable by any application on the network.
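The four registration requirements above can be sketched as a payload builder. The field names and the validation helper here are illustrative assumptions, not the actual ModelRegistry ABI; only the precompile address (0x0100) and the 1,000 CITE minimum stake come from this document.

```python
# Sketch of assembling a ModelRegistry registration payload.
# Field names are hypothetical; consult the real ModelRegistry ABI.

MODEL_REGISTRY_ADDRESS = "0x0100"  # precompile address from the docs
MIN_STAKE_CITE = 1_000             # minimum stake bond from the docs

def build_registration(model_cid: str, endpoint: str,
                       stake_cite: int, capabilities: list[str]) -> dict:
    """Validate the four registration requirements and return a payload."""
    if stake_cite < MIN_STAKE_CITE:
        raise ValueError(f"stake must be at least {MIN_STAKE_CITE} CITE")
    if not capabilities:
        raise ValueError("at least one capability must be declared")
    return {
        "to": MODEL_REGISTRY_ADDRESS,
        "model_cid": model_cid,     # content-addressed model hash (IPFS CID)
        "endpoint": endpoint,       # MCP-conformant inference endpoint
        "stake": stake_cite,
        "capabilities": capabilities,
    }

payload = build_registration(
    "bafybeigdyrzt5sfp7udm7hu76uh7y26nf3efuylqabf3oclgtqy55fbzdi",
    "https://infer.example.com/v1",
    1_000,
    ["text-generation"],
)
```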

API-Compatible Endpoints

MCP exposes three core endpoints that mirror industry-standard AI APIs. Any application built against the OpenAI or Anthropic SDKs can point at a Citrate MCP gateway with minimal changes.

Model Discovery

GET /v1/models

Returns all registered models with their capabilities, pricing, and reputation scores. Supports filtering by capability, minimum reputation, and price range.

{
  "data": [
    {
      "id": "citrate-llama-70b",
      "object": "model",
      "owned_by": "0xabc...def",
      "capabilities": ["text-generation", "function-calling"],
      "price_per_token": "0.00001",
      "reputation_score": 0.97
    }
  ]
}
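The same filters the endpoint supports (capability, minimum reputation, price range) can also be applied client-side to a `/v1/models` response. This is a minimal sketch over the response shape shown above; the filter function itself is illustrative, not part of the MCP specification.

```python
# Client-side filtering of a GET /v1/models response by capability,
# minimum reputation, and maximum price per token.

def filter_models(models, capability=None, min_reputation=0.0, max_price=None):
    out = []
    for m in models:
        if capability and capability not in m["capabilities"]:
            continue
        if m["reputation_score"] < min_reputation:
            continue
        if max_price is not None and float(m["price_per_token"]) > max_price:
            continue
        out.append(m)
    return out

# Sample record mirroring the response body shown above.
models = [{
    "id": "citrate-llama-70b",
    "object": "model",
    "owned_by": "0xabc...def",
    "capabilities": ["text-generation", "function-calling"],
    "price_per_token": "0.00001",
    "reputation_score": 0.97,
}]

hits = filter_models(models, capability="text-generation", min_reputation=0.9)
```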

Chat Completions

POST /v1/chat/completions

Routes chat completion requests to the best available model based on the caller's preferences (cost, latency, accuracy). Supports streaming responses.

{
  "model": "citrate-llama-70b",
  "messages": [
    {"role": "user", "content": "Explain GhostDAG consensus."}
  ],
  "stream": true
}
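A caller builds the request body above and, when streaming, consumes server-sent-event lines. This sketch assumes the gateway streams in the standard `data: {json}` framing used by OpenAI-compatible chat APIs; the helper names are illustrative.

```python
# Build a streaming chat request and parse one streamed SSE line.
import json

def build_chat_request(model: str, user_content: str, stream: bool = True) -> dict:
    return {
        "model": model,
        "messages": [{"role": "user", "content": user_content}],
        "stream": stream,
    }

def parse_sse_line(line: str):
    """Return the delta text from one 'data:' line, or None for [DONE]/keepalives."""
    if not line.startswith("data:"):
        return None
    body = line[len("data:"):].strip()
    if body == "[DONE]":
        return None
    chunk = json.loads(body)
    return chunk["choices"][0]["delta"].get("content")

req = build_chat_request("citrate-llama-70b", "Explain GhostDAG consensus.")
```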

Embeddings

POST /v1/embeddings

Generates vector embeddings using registered embedding models. Compatible with applications that use OpenAI's embedding API.

{
  "model": "citrate-embed-v1",
  "input": "Paraconsistent consensus treats contradictions as learning signals."
}

Inference Routing

When a request arrives at an MCP gateway, the routing engine selects the optimal provider using a multi-factor scoring algorithm:

  1. Model match -- Does the provider serve the requested model (or a compatible variant)?
  2. Reputation -- What is the provider's historical accuracy and uptime, derived from blue score and attestation history?
  3. Latency -- What is the estimated round-trip time based on geographic proximity?
  4. Price -- Does the provider's fee fit within the caller's budget?
  5. Load -- Is the provider currently at capacity?

If the caller specifies a model ID, routing targets that specific model. If the caller specifies only a capability (e.g., "text-generation"), MCP selects the best provider automatically.
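The five factors above can be sketched as a scoring function. The weights, normalizations, and the treatment of model match and budget as hard filters are illustrative assumptions, not the protocol's actual routing parameters.

```python
# Sketch of multi-factor provider scoring. Model match (1) and price (4)
# act as hard filters; reputation (2), latency (3), and load (5) are
# combined with assumed weights into a single score.

def score_provider(p, max_latency_ms=2000.0, budget_per_token=None):
    if not p["serves_model"]:          # 1. model match: hard requirement
        return None
    if budget_per_token is not None and p["price_per_token"] > budget_per_token:
        return None                    # 4. price: over budget is excluded
    reputation = p["reputation"]       # 2. in [0, 1], from attestation history
    latency = max(0.0, 1.0 - p["latency_ms"] / max_latency_ms)  # 3. lower is better
    headroom = 1.0 - p["load"]         # 5. fraction of capacity still free
    return 0.5 * reputation + 0.3 * latency + 0.2 * headroom

providers = [
    {"serves_model": True, "reputation": 0.97, "latency_ms": 200,
     "load": 0.4, "price_per_token": 0.00001},
    {"serves_model": True, "reputation": 0.80, "latency_ms": 1500,
     "load": 0.9, "price_per_token": 0.00002},
]
best = max(providers, key=score_provider)
```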

Fee Distribution

Inference fees flow through a transparent on-chain settlement:

  Recipient            Share   Purpose
  Model provider       70%     Compensation for compute
  Network validators   20%     Block production incentive
  Protocol treasury    10%     Ongoing development funding

Fees are denominated in CITE and settled at each finality checkpoint. Providers can set their own per-token pricing, and the market determines which models attract usage. Our vision for the MCP marketplace is that it should be as open and competitive as possible -- anyone can serve, anyone can consume.
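The 70/20/10 split above can be sketched in integer arithmetic over the smallest CITE unit. The rule that rounding dust goes to the treasury is an assumption for illustration, not a documented protocol rule.

```python
# Sketch of settling one inference fee per the 70/20/10 split.

SPLIT_PERCENT = {"provider": 70, "validators": 20, "treasury": 10}

def settle_fee(fee_units: int) -> dict:
    """Split a fee (in smallest CITE units) among the three recipients."""
    provider = fee_units * SPLIT_PERCENT["provider"] // 100
    validators = fee_units * SPLIT_PERCENT["validators"] // 100
    treasury = fee_units - provider - validators  # absorbs rounding dust (assumed)
    return {"provider": provider, "validators": validators, "treasury": treasury}

shares = settle_fee(1_000_003)
```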

Attestation and Quality

Every inference response includes a cryptographic attestation that can be verified on-chain via the AttestationVerifier precompile. This creates an auditable trail of model performance and enables:

  • Reputation tracking -- Nodes that consistently produce accurate results build higher reputation scores.
  • Dispute resolution -- If a response is challenged, the attestation proves what model and input produced it.
  • Quality guarantees -- Applications can require minimum reputation thresholds for their inference requests.
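One way to picture the attestation trail is a digest binding model ID, input, and output, so a challenged response can be re-derived and checked. The exact fields and hashing scheme used by the AttestationVerifier precompile are not specified here; this is only an illustrative construction.

```python
# Illustrative attestation digest: SHA-256 over length-prefixed fields.
import hashlib

def attest(model_id: str, input_text: str, output_text: str) -> str:
    h = hashlib.sha256()
    for part in (model_id, input_text, output_text):
        data = part.encode()
        # Length-prefix each field so concatenation is unambiguous.
        h.update(len(data).to_bytes(4, "big"))
        h.update(data)
    return h.hexdigest()

a1 = attest("citrate-llama-70b", "hello", "world")
a2 = attest("citrate-llama-70b", "hello", "world")
a3 = attest("citrate-llama-70b", "hello", "WORLD")
```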

Further Reading