x402 Payment Protocol

I added x402 support to the inference serving layer so that non-blockchain applications can consume Citrate inference. A standard HTTP client sends a request to a model serving endpoint. If payment is required, the server returns HTTP 402 with an X-Payment-Details header. The client pays using SALT tokens and retries. The server verifies the payment on-chain and serves the response.

This means any application (a mobile app, a browser extension, a command-line tool) can access Citrate inference without knowing anything about blockchain. It just needs to handle HTTP 402 responses.

How It Works

1. The client sends an inference request to a model serving endpoint.
2. The server returns HTTP 402 with an X-Payment-Details header containing the SALT amount, recipient address, and payment contract address.
3. The client submits the SALT payment on-chain (or via a pre-authorized payment channel).
4. The client retries with an X-Payment-Proof header containing the transaction hash.
5. The server verifies the payment on-chain and serves the inference response.
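
To make the flow above concrete, here is a minimal sketch of the server-side decision. It is not the actual serving-layer code: the PaymentDetails field names, the SimpleResponse shape, and the verifyOnChain and runInference callbacks are all hypothetical stand-ins. Only the two header names and the 402 status come from the steps above.

```typescript
// Hypothetical shape of the payment details; field names are assumptions.
interface PaymentDetails {
  amount: string;          // SALT amount, kept as a string to avoid float rounding
  recipient: string;       // recipient address
  paymentContract: string; // payment contract address
}

// Simplified response object so the sketch needs no HTTP framework.
interface SimpleResponse {
  status: number;
  headers: Record<string, string>;
  body: string;
}

// Stand-in for the real on-chain lookup of a transaction hash.
type Verifier = (txHash: string, details: PaymentDetails) => boolean;

function handleInferenceRequest(
  requestHeaders: Record<string, string>,
  details: PaymentDetails,
  verifyOnChain: Verifier,
  runInference: () => string,
): SimpleResponse {
  // Real HTTP header names are case-insensitive; this lookup is not.
  const proof = requestHeaders["X-Payment-Proof"];

  // No proof, or a proof that fails verification: answer 402 and
  // describe the payment the client has to make.
  if (proof === undefined || !verifyOnChain(proof, details)) {
    return {
      status: 402,
      headers: { "X-Payment-Details": JSON.stringify(details) },
      body: "",
    };
  }

  // Payment verified: serve the inference response.
  return { status: 200, headers: {}, body: runInference() };
}
```

The first call from a client carries no proof and gets the 402 branch. The retry carries the transaction hash and, if verification passes, gets the inference result.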

Streaming Responses

The x402 protocol supports streaming inference responses. After payment verification, the server opens a Server-Sent Events (SSE) stream. Tokens are streamed as they are generated. The full response cost is charged upfront based on the model's declared per-token pricing.

POST /v1/inference/stream
X-Payment-Proof: 0xabc123...

data: {"token": "The", "index": 0}
data: {"token": " answer", "index": 1}
data: {"token": " is", "index": 2}
data: [DONE]
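
On the client side, consuming this stream comes down to reading the data: lines until the [DONE] sentinel. The sketch below parses a complete stream body in the format shown above. It assumes each event is a single data: line carrying a JSON object with token and index fields, and the function names are my own. A real client receives the stream in chunks and would have to buffer partial lines first.

```typescript
interface TokenEvent {
  token: string;
  index: number;
}

// Parse a complete SSE body into token events, stopping at [DONE].
function parseTokenStream(raw: string): TokenEvent[] {
  const events: TokenEvent[] = [];
  for (const line of raw.split(/\r?\n/)) {
    // Skip blank separator lines and any non-data SSE fields.
    if (!line.startsWith("data:")) continue;
    // trim() only strips whitespace around the payload, so leading
    // spaces inside token strings (" answer") are preserved.
    const payload = line.slice("data:".length).trim();
    if (payload === "[DONE]") break;
    events.push(JSON.parse(payload) as TokenEvent);
  }
  return events;
}

// Reassemble the generated text from the token events.
function joinTokens(events: TokenEvent[]): string {
  return events.map((e) => e.token).join("");
}
```

Applied to the three events in the example above, joinTokens returns "The answer is".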

Multi-Model Routing

Pay once, get routed to the best available model. When a client sends a request to the /v1/inference/routed endpoint, the MCP layer selects the optimal model based on the request parameters, current load, and model performance metrics. The payment covers routing overhead plus inference cost.
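
The selection logic itself isn't specified here, so the following is only an illustration of what load-aware routing can look like, not the MCP layer's actual algorithm. The ModelCandidate fields, the linear scoring formula, and the loadWeight parameter are all assumptions made for the sketch.

```typescript
// Hypothetical view of a model as seen by the router.
interface ModelCandidate {
  name: string;
  load: number;             // 0 (idle) to 1 (saturated)
  qualityScore: number;     // 0 to 1, higher is better
  supportsRequest: boolean; // whether the model can serve these request parameters
}

// Pick the eligible candidate with the best quality-versus-load trade-off.
// Returns undefined when no candidate can serve the request.
function selectModel(
  candidates: ModelCandidate[],
  loadWeight = 0.5,
): ModelCandidate | undefined {
  let best: ModelCandidate | undefined;
  let bestScore = -Infinity;
  for (const candidate of candidates) {
    if (!candidate.supportsRequest) continue;
    // Reward quality, penalize current load.
    const score =
      (1 - loadWeight) * candidate.qualityScore - loadWeight * candidate.load;
    if (score > bestScore) {
      best = candidate;
      bestScore = score;
    }
  }
  return best;
}
```

With the default weight, a slightly weaker model that is mostly idle beats a stronger model that is close to saturation.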

Batch Inference

Submit multiple inference requests with a single payment authorization. The batch endpoint accepts an array of prompts and returns an array of responses. Payment is the sum of the individual inference costs, less a 5% batch discount.

POST /v1/inference/batch
X-Payment-Proof: 0xabc123...
Content-Type: application/json

{
  "requests": [
    {"prompt": "Summarize this document...", "model": "citrate-7b"},
    {"prompt": "Translate to Spanish...", "model": "citrate-7b"}
  ]
}
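
The batch pricing rule is simple arithmetic, sketched below. It assumes costs are integers in the smallest SALT unit and that fractional results round down. The text above states neither, and the function name is hypothetical.

```typescript
const BATCH_DISCOUNT_PERCENT = 5;

// Sum the individual inference costs, then apply the 5% batch discount.
// Costs are integers in the smallest SALT unit (an assumption).
function batchCost(individualCosts: number[]): number {
  const total = individualCosts.reduce((sum, cost) => sum + cost, 0);
  // Multiply before dividing so the integer math stays exact,
  // then round down (the real rounding rule is an assumption).
  return Math.floor((total * (100 - BATCH_DISCOUNT_PERCENT)) / 100);
}
```

For two requests costing 1000 and 3000 units, the undiscounted total is 4000 and the batch price is 3800.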

Integration

Any HTTP client library can integrate x402 payments. The protocol is designed to be middleware-friendly:

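// Example: x402 middleware for fetch.
// submitPayment is application-specific: it pays the requested SALT amount
// and resolves to the transaction hash used as proof.
declare function submitPayment(details: unknown): Promise<string>;

async function x402Fetch(url: string, options: RequestInit = {}): Promise<Response> {
  let response = await fetch(url, options);

  if (response.status === 402) {
    const paymentDetails = response.headers.get("X-Payment-Details");
    if (paymentDetails === null) {
      throw new Error("402 response is missing the X-Payment-Details header");
    }
    const proof = await submitPayment(JSON.parse(paymentDetails));

    // Headers accepts every HeadersInit form (plain object, array of pairs,
    // or Headers instance), which an object spread would not handle.
    const headers = new Headers(options.headers);
    headers.set("X-Payment-Proof", proof);

    // Retry the original request once with the proof attached.
    response = await fetch(url, { ...options, headers });
  }

  return response;
}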
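
A middleware that pays automatically should check what it is being asked to pay. The sketch below validates the header and enforces a spending cap before any payment is submitted. The JSON field names (amount, recipient, paymentContract) and the maxAmount parameter are assumptions, since the header's exact schema isn't given here.

```typescript
// Hypothetical shape of the X-Payment-Details payload.
interface PaymentDetails {
  amount: string;
  recipient: string;
  paymentContract: string;
}

// Validate the header value and reject anything above the caller's cap.
function parsePaymentDetails(header: string | null, maxAmount: number): PaymentDetails {
  if (header === null) {
    throw new Error("missing X-Payment-Details header");
  }
  const parsed: unknown = JSON.parse(header);
  if (typeof parsed !== "object" || parsed === null) {
    throw new Error("payment details must be a JSON object");
  }
  const { amount, recipient, paymentContract } = parsed as Record<string, unknown>;
  if (
    typeof amount !== "string" ||
    typeof recipient !== "string" ||
    typeof paymentContract !== "string"
  ) {
    throw new Error("payment details are missing required fields");
  }
  // Number() is enough for a cap check; exact amounts stay in the string.
  const value = Number(amount);
  if (!Number.isFinite(value) || value <= 0 || value > maxAmount) {
    throw new Error(`refusing to pay amount "${amount}" (cap: ${maxAmount})`);
  }
  return { amount, recipient, paymentContract };
}
```

In the middleware above, a call like parsePaymentDetails(paymentDetails, 100) could take the place of the bare JSON.parse, so a misconfigured or malicious endpoint cannot trigger an arbitrarily large payment.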