x402 Payment Protocol
I added x402 support to the inference serving layer to make it possible for non-blockchain applications to consume Citrate inference. A standard HTTP client sends a request to a model serving endpoint. If payment is required, the server returns HTTP 402 with a payment details header. The client pays using SALT tokens and retries. The server verifies on-chain and serves the response.
This means any application... a mobile app, a browser extension, a command-line tool... can access Citrate inference without knowing anything about blockchain. It just needs to handle HTTP 402 responses.
How It Works
- Client sends inference request to a model serving endpoint
- Server returns HTTP 402 with a
X-Payment-Detailsheader containing the SALT amount, recipient address, and payment contract address - Client submits SALT payment on-chain (or via a pre-authorized payment channel)
- Client retries with
X-Payment-Proofheader containing the transaction hash - Server verifies payment on-chain and serves the inference response
Streaming Responses
The x402 protocol supports streaming inference responses. After payment verification, the server opens a Server-Sent Events (SSE) stream. Tokens are streamed as they are generated. The full response cost is charged upfront based on the model's declared per-token pricing.
POST /v1/inference/stream
X-Payment-Proof: 0xabc123...
data: {"token": "The", "index": 0}
data: {"token": " answer", "index": 1}
data: {"token": " is", "index": 2}
data: [DONE]
Multi-Model Routing
Pay once, get routed to the best available model. When a client sends a request to the /v1/inference/routed endpoint, the MCP layer selects the optimal model based on the request parameters, current load, and model performance metrics. The payment covers routing overhead plus inference cost.
Batch Inference
Submit multiple inference requests with a single payment authorization. The batch endpoint accepts an array of prompts and returns an array of responses. Payment is calculated as the sum of individual inference costs with a 5% batch discount.
POST /v1/inference/batch
X-Payment-Proof: 0xabc123...
Content-Type: application/json
{
"requests": [
{"prompt": "Summarize this document...", "model": "citrate-7b"},
{"prompt": "Translate to Spanish...", "model": "citrate-7b"}
]
}
Integration
Any HTTP client library can integrate x402 payments. The protocol is designed to be middleware-friendly:
// Example: x402 middleware for fetch
async function x402Fetch(url: string, options: RequestInit) {
let response = await fetch(url, options);
if (response.status === 402) {
const paymentDetails = response.headers.get("X-Payment-Details");
const proof = await submitPayment(JSON.parse(paymentDetails));
response = await fetch(url, {
...options,
headers: {
...options.headers,
"X-Payment-Proof": proof,
},
});
}
return response;
}