Using AI Precompiles
Citrate provides seven AI-native precompiled contracts that are embedded directly in the Lattice Virtual Machine. We designed these precompiles to execute AI operations at native speed with predictable gas costs, bypassing the overhead of equivalent Solidity implementations. Each precompile sits at a reserved address and exposes a Solidity-compatible interface.
Overview of Precompile Addresses
The seven AI precompiles occupy a reserved address range and are available on both testnet and mainnet from genesis:
| Address | Name | Role |
|---|---|---|
| 0x0100 | MODEL_DEPLOY | Register, discover, and manage AI models |
| 0x0101 | MODEL_INFERENCE | Dispatch single inference requests and receive results |
| 0x0102 | BATCH_INFERENCE | Execute batched inference with amortized gas costs |
| 0x0103 | MODEL_METADATA | Query on-chain model metadata (read-only) |
| 0x0104 | PROOF_VERIFY | Verify zero-knowledge proofs of inference results |
| 0x0105 | MODEL_BENCHMARK | Run standardized benchmarks against registered models |
| 0x0106 | MODEL_ENCRYPTION | Encrypt or decrypt model weights via secure enclave |
All precompile interfaces use standard ABI encoding. You can call them from Solidity, Vyper, or any language that produces EVM-compatible bytecode.
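Because the precompiles speak standard ABI encoding, you can also reach them with a raw low-level call instead of a typed interface. The sketch below (contract and variable names are illustrative) hits PROOF_VERIFY at 0x0104 via `staticcall`, encoding the call exactly as a typed interface would:

```solidity
// SPDX-License-Identifier: MIT
pragma solidity ^0.8.24;

/// Minimal sketch: calling a precompile with a raw staticcall and
/// standard ABI encoding, without declaring a Solidity interface.
contract PrecompileSmokeTest {
    address constant PROOF_VERIFY = address(uint160(0x0104));

    function checkProof(
        bytes calldata proof,
        bytes32 modelId,
        bytes32 inputHash,
        bytes32 outputHash
    ) external view returns (bool) {
        // Encode the call exactly as a typed interface would.
        (bool ok, bytes memory ret) = PROOF_VERIFY.staticcall(
            abi.encodeWithSignature(
                "verifyProof(bytes,bytes32,bytes32,bytes32)",
                proof, modelId, inputHash, outputHash
            )
        );
        require(ok, "precompile call failed");
        return abi.decode(ret, (bool));
    }
}
```

In practice the typed-interface pattern shown later in this page is easier to maintain; raw calls are mainly useful for quick smoke tests and for tooling that builds calldata dynamically.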
MODEL_DEPLOY (0x0100)
MODEL_DEPLOY is the canonical on-chain registry of all AI models available on the Citrate network. Model operators register their models here, and consumer contracts query MODEL_METADATA to discover available inference providers.
interface IModelDeploy {
/// Register a new model with metadata.
function deployModel(
string calldata name,
bytes32 modelHash, // SHA-256 of model weights
string calldata storageUri, // IPFS/Arweave URI for weights
string calldata format, // "gguf", "onnx", "safetensors", "mlx"
bytes calldata inputSchema,
bytes calldata outputSchema
) external returns (bytes32 modelId);
}
Gas costs: deployModel costs 100,000 gas plus 16 gas per input byte. Successfully deploying a model earns a 1 SALT bonus reward.
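Under the stated formula, registering a model with roughly 2 KiB of calldata costs about 100,000 + 16 × 2,048 ≈ 132,768 gas. A minimal registrar sketch (the `ModelRegistrar` contract and its names are illustrative, and empty schemas are passed only for brevity) looks like this:

```solidity
// SPDX-License-Identifier: MIT
pragma solidity ^0.8.24;

interface IModelDeploy {
    function deployModel(
        string calldata name,
        bytes32 modelHash,
        string calldata storageUri,
        string calldata format,
        bytes calldata inputSchema,
        bytes calldata outputSchema
    ) external returns (bytes32 modelId);
}

/// Illustrative registrar that records the IDs of models it has deployed.
contract ModelRegistrar {
    IModelDeploy constant deployer = IModelDeploy(address(uint160(0x0100)));
    bytes32[] public deployedIds;

    function register(
        string calldata name,
        bytes32 weightsHash,   // SHA-256 of the weights file
        string calldata uri    // IPFS/Arweave URI for the weights
    ) external returns (bytes32 id) {
        // Schemas are opaque to the registry; empty schemas shown for brevity.
        id = deployer.deployModel(name, weightsHash, uri, "gguf", "", "");
        deployedIds.push(id);
    }
}
```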
MODEL_INFERENCE (0x0101)
MODEL_INFERENCE is the dispatch layer for on-chain inference requests. When a contract calls runInference, the engine routes the request to the model host, manages the fee escrow, and returns the result synchronously or via callback.
interface IModelInference {
/// Execute a single inference call.
function runInference(
bytes32 modelId,
bytes calldata input,
bool generateProof // Whether to generate a ZK proof
) external returns (bytes memory output, bytes memory proof);
}
Under the hood, inference runs on llama.cpp with Metal GPU acceleration for GGUF models and on ONNX Runtime for ONNX models. GGUF is the native model format, but CoreML, MLX, TFLite, and PyTorch Mobile are also supported.
Gas costs: 5,000 base + 10 gas per input element. Add 200,000 gas if proof generation is requested.
BATCH_INFERENCE (0x0102)
BATCH_INFERENCE executes multiple inference calls in a single precompile invocation. Batching amortizes the model loading cost across multiple inputs and provides a 20% gas discount versus equivalent individual calls.
interface IBatchInference {
/// Execute batched inference calls.
function batchInference(
bytes32 modelId,
bytes[] calldata inputs,
bool generateProofs
) external returns (bytes[] memory outputs, bytes[] memory proofs);
}
Maximum batch size is 32. Gas cost: 5,000 base + 6 × batch_size × input_size.
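For example, a batch of 8 inputs of 100 bytes each costs 5,000 + 6 × 8 × 100 = 9,800 gas under this formula. A consumer sketch (the `BatchClassifier` contract is illustrative) that enforces the batch limit before dispatching:

```solidity
// SPDX-License-Identifier: MIT
pragma solidity ^0.8.24;

interface IBatchInference {
    function batchInference(
        bytes32 modelId,
        bytes[] calldata inputs,
        bool generateProofs
    ) external returns (bytes[] memory outputs, bytes[] memory proofs);
}

/// Illustrative consumer that classifies up to 32 inputs in one call.
contract BatchClassifier {
    IBatchInference constant batcher = IBatchInference(address(uint160(0x0102)));

    function classifyAll(bytes32 modelId, bytes[] calldata texts)
        external
        returns (bytes[] memory labels)
    {
        require(texts.length <= 32, "max batch size is 32");
        // Skip proof generation to keep the gas cost at the batched rate.
        (labels, ) = batcher.batchInference(modelId, texts, false);
    }
}
```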
MODEL_METADATA (0x0103)
MODEL_METADATA provides read-only access to on-chain model information. This is the discovery mechanism for finding available models and their capabilities.
interface IModelMetadata {
/// Query model metadata by ID.
function getModelMetadata(bytes32 modelId)
external view returns (
string memory name,
bytes32 modelHash,
string memory storageUri,
string memory format,
address owner,
uint256 deployBlock,
uint256 inferenceCount,
bytes memory inputSchema,
bytes memory outputSchema
);
}
Gas cost: 2,600 gas (the same as a cold account access under EIP-2929).
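Since `getModelMetadata` is a view function, contracts can use it to gate behavior cheaply before paying for inference. The sketch below (the `FormatGate` contract is illustrative) destructures only the `format` field and compares it by hash, the usual Solidity idiom for string equality:

```solidity
// SPDX-License-Identifier: MIT
pragma solidity ^0.8.24;

interface IModelMetadata {
    function getModelMetadata(bytes32 modelId)
        external view returns (
            string memory name,
            bytes32 modelHash,
            string memory storageUri,
            string memory format,
            address owner,
            uint256 deployBlock,
            uint256 inferenceCount,
            bytes memory inputSchema,
            bytes memory outputSchema
        );
}

/// Illustrative guard: only proceed for models in a known format.
contract FormatGate {
    IModelMetadata constant meta = IModelMetadata(address(uint160(0x0103)));

    function isGguf(bytes32 modelId) external view returns (bool) {
        // Bind only the fourth return value; ignore the rest.
        (, , , string memory format, , , , , ) = meta.getModelMetadata(modelId);
        return keccak256(bytes(format)) == keccak256(bytes("gguf"));
    }
}
```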
PROOF_VERIFY (0x0104)
PROOF_VERIFY allows on-chain verification of inference proofs. When a node fulfills an inference request with proof generation enabled, it produces a cryptographic attestation proving correct execution. This precompile verifies those proofs efficiently.
interface IProofVerify {
/// Verify a ZK inference proof.
function verifyProof(
bytes calldata proof,
bytes32 modelId,
bytes32 inputHash,
bytes32 outputHash
) external view returns (bool valid);
}
Gas cost: 3,000 base + 16 gas per proof byte.
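A common pattern is to request a proof with the inference and verify it before trusting the output. In the sketch below, the `VerifiedInference` contract is illustrative, and hashing the input and output with `keccak256` is an assumption; check the ABI reference for the exact hash the verifier expects:

```solidity
// SPDX-License-Identifier: MIT
pragma solidity ^0.8.24;

interface IModelInference {
    function runInference(bytes32 modelId, bytes calldata input, bool generateProof)
        external returns (bytes memory output, bytes memory proof);
}

interface IProofVerify {
    function verifyProof(
        bytes calldata proof,
        bytes32 modelId,
        bytes32 inputHash,
        bytes32 outputHash
    ) external view returns (bool valid);
}

/// Illustrative pattern: run inference with proof generation enabled,
/// then verify the proof on-chain before using the result.
contract VerifiedInference {
    IModelInference constant engine = IModelInference(address(uint160(0x0101)));
    IProofVerify constant verifier = IProofVerify(address(uint160(0x0104)));

    function runVerified(bytes32 modelId, bytes calldata input)
        external
        returns (bytes memory output)
    {
        bytes memory proof;
        (output, proof) = engine.runInference(modelId, input, true);
        // Assumption: the verifier binds the proof to keccak256 digests
        // of the raw input and output bytes.
        require(
            verifier.verifyProof(proof, modelId, keccak256(input), keccak256(output)),
            "inference proof invalid"
        );
    }
}
```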
MODEL_BENCHMARK (0x0105)
MODEL_BENCHMARK runs standardized performance benchmarks against registered models. Results are stored on-chain and used by the mentorship protocol to rank model providers by quality.
interface IModelBenchmark {
/// Benchmark a registered model.
function benchmarkModel(
bytes32 modelId,
bytes calldata benchmarkSuite
) external returns (
uint256 latencyMs,
uint256 throughput,
uint256 accuracy, // Basis points (0-10000)
bytes32 resultHash
);
}
Gas cost: 500,000 gas + model-size-dependent overhead.
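Because `accuracy` is reported in basis points, contracts can enforce quality floors directly. A sketch (the `AccuracyGate` contract and its threshold are illustrative):

```solidity
// SPDX-License-Identifier: MIT
pragma solidity ^0.8.24;

interface IModelBenchmark {
    function benchmarkModel(bytes32 modelId, bytes calldata benchmarkSuite)
        external returns (
            uint256 latencyMs,
            uint256 throughput,
            uint256 accuracy,   // Basis points (0-10000)
            bytes32 resultHash
        );
}

/// Illustrative gate that accepts a model only above an accuracy floor.
contract AccuracyGate {
    IModelBenchmark constant bench = IModelBenchmark(address(uint160(0x0105)));
    uint256 constant MIN_ACCURACY_BPS = 9_000; // 90.00%

    function meetsBar(bytes32 modelId, bytes calldata suite) external returns (bool) {
        (, , uint256 accuracy, ) = bench.benchmarkModel(modelId, suite);
        return accuracy >= MIN_ACCURACY_BPS;
    }
}
```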
MODEL_ENCRYPTION (0x0106)
MODEL_ENCRYPTION enables confidential model deployment by encrypting or decrypting model weights using the node's secure enclave.
interface IModelEncryption {
/// Encrypt or decrypt model weights.
function encryptModel(
bytes calldata modelWeights,
bytes32 encryptionKey,
bool isEncrypt
) external returns (bytes memory result);
}
Gas cost: 100,000 gas + 32 gas per input byte.
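The same entry point handles both directions, selected by the `isEncrypt` flag. A thin wrapper sketch (the `WeightCipher` contract is illustrative) makes the two directions explicit:

```solidity
// SPDX-License-Identifier: MIT
pragma solidity ^0.8.24;

interface IModelEncryption {
    function encryptModel(
        bytes calldata modelWeights,
        bytes32 encryptionKey,
        bool isEncrypt
    ) external returns (bytes memory result);
}

/// Illustrative helper wrapping the single encrypt/decrypt entry point.
contract WeightCipher {
    IModelEncryption constant enclave = IModelEncryption(address(uint160(0x0106)));

    function encrypt(bytes calldata weights, bytes32 key)
        external returns (bytes memory)
    {
        return enclave.encryptModel(weights, key, true);
    }

    function decrypt(bytes calldata blob, bytes32 key)
        external returns (bytes memory)
    {
        return enclave.encryptModel(blob, key, false);
    }
}
```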
Calling Precompiles from Your Contract
All seven precompiles are called using standard Solidity interface patterns. We recommend starting with a simple MODEL_METADATA read call to confirm your setup works, then moving on to inference. Define the interface, cast the precompile address, and call methods as you would any other contract:
// SPDX-License-Identifier: MIT
pragma solidity ^0.8.24;
interface IModelInference {
function runInference(
bytes32 modelId, bytes calldata input, bool generateProof
) external returns (bytes memory output, bytes memory proof);
}
contract SentimentAnalyzer {
IModelInference constant engine = IModelInference(address(uint160(0x0101)));
mapping(bytes32 => string) public results;
function analyze(bytes32 modelId, string calldata text) external {
(bytes memory output, ) = engine.runInference(
modelId,
abi.encode(text),
false
);
results[keccak256(abi.encode(text))] = abi.decode(output, (string));
}
}
Further Reading
- AI Precompiles (Core Concepts) -- architectural overview
- AI Precompile ABIs -- full ABI reference with gas formulas
- Registering a Model -- step-by-step model registration
- LoRA Adapters -- deep dive on adapter mechanics
- Verifiable Inference -- how results are attested