Phase 11: Synor Compute L2 - Full-Stack Compute Platform
Mission: Build a decentralized compute platform capable of AI/ML training, inference, OS hosting, and general-purpose high-performance computing.
Executive Summary
Synor Compute L2 extends beyond the current WASM-only Synor VM to provide:
- GPU Compute: AI/ML training and inference with CUDA/ROCm support
- Container Orchestration: Docker-compatible workloads with Kubernetes-style scheduling
- Persistent VMs: Long-running virtual machines for OS hosting
- Serverless Functions: Short-lived compute for API backends and event processing
- Edge Compute: Low-latency compute at network edge nodes
Architecture Overview
┌─────────────────────────────────────────────────────────────────────────────┐
│ SYNOR COMPUTE L2 │
├─────────────────────────────────────────────────────────────────────────────┤
│ │
│ ┌─────────────────────────────────────────────────────────────────────────┐ │
│ │ APPLICATION LAYER │ │
│ ├──────────────┬──────────────┬──────────────┬──────────────┬────────────┤ │
│ │ AI/ML │ Serverless │ Containers │ Persistent │ Edge │ │
│ │ Training │ Functions │ (Docker) │ VMs (Linux) │ Compute │ │
│ └──────────────┴──────────────┴──────────────┴──────────────┴────────────┘ │
│ │
│ ┌─────────────────────────────────────────────────────────────────────────┐ │
│ │ ORCHESTRATION LAYER │ │
│ ├──────────────┬──────────────┬──────────────┬──────────────┬────────────┤ │
│ │ Job │ Resource │ Network │ Storage │ Health │ │
│ │ Scheduler │ Manager │ Fabric │ Orchestrator│ Monitor │ │
│ └──────────────┴──────────────┴──────────────┴──────────────┴────────────┘ │
│ │
│ ┌─────────────────────────────────────────────────────────────────────────┐ │
│ │ COMPUTE RUNTIME LAYER │ │
│ ├──────────────┬──────────────┬──────────────┬──────────────┬────────────┤ │
│ │ GPU │ Container │ MicroVM │ WASM │ Native │ │
│ │ Runtime │ Runtime │ Runtime │ Runtime │ Runtime │ │
│ │ (CUDA/ROCm)│ (containerd)│ (Firecracker)│ (Wasmtime) │ (gVisor) │ │
│ └──────────────┴──────────────┴──────────────┴──────────────┴────────────┘ │
│ │
│ ┌─────────────────────────────────────────────────────────────────────────┐ │
│ │ INFRASTRUCTURE LAYER │ │
│ ├──────────────┬──────────────┬──────────────┬──────────────┬────────────┤ │
│ │ Node │ Network │ Distributed │ Consensus │ Billing │ │
│ │ Registry │ Overlay │ Storage │ (PoS+PoW) │ Metering │ │
│ └──────────────┴──────────────┴──────────────┴──────────────┴────────────┘ │
│ │
│ ┌─────────────────────────────────────────────────────────────────────────┐ │
│ │ SYNOR L1 BLOCKCHAIN (GHOSTDAG + DAG-RIDER) │ │
│ └─────────────────────────────────────────────────────────────────────────┘ │
│ │
└─────────────────────────────────────────────────────────────────────────────┘
Milestone 1: GPU Compute Foundation (AI/ML Training & Inference)
1.1 GPU Node Registration
// synor-compute/src/gpu/node.rs
/// GPU node capabilities
pub struct GpuNode {
/// Unique node ID
pub node_id: NodeId,
/// GPU specifications
pub gpus: Vec<GpuSpec>,
/// Total VRAM available (bytes)
pub total_vram: u64,
/// Available VRAM (bytes)
pub available_vram: u64,
/// CUDA compute capability (e.g., 8.6 for RTX 3090)
pub cuda_capability: Option<(u8, u8)>,
/// ROCm version (for AMD)
pub rocm_version: Option<String>,
/// Network bandwidth (Gbps)
pub bandwidth_gbps: u32,
/// Geographic region
pub region: Region,
/// Stake amount (for PoS validation)
pub stake: u64,
}
pub struct GpuSpec {
pub model: String, // "NVIDIA RTX 4090"
pub vram_gb: u32, // 24
pub tensor_cores: u32, // 512
pub cuda_cores: u32, // 16384
pub memory_bandwidth: u32, // 1008 GB/s
pub fp32_tflops: f32, // 82.6
pub fp16_tflops: f32, // 165.2
pub int8_tops: f32, // 330.4
}
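To make the registration data concrete, here is one way a scheduler might check whether a registered node can host a given job. The `NodeCaps`/`JobReqs` structs and the `meets_requirements` helper are simplified, hypothetical stand-ins for the fuller types above, not part of the spec:

```rust
/// Simplified view of a node's capabilities (hypothetical helper type).
pub struct NodeCaps {
    pub available_vram: u64,               // bytes
    pub cuda_capability: Option<(u8, u8)>, // (major, minor), e.g. (8, 6)
}

/// Simplified view of a job's hardware requirements (hypothetical).
pub struct JobReqs {
    pub min_vram_per_gpu: u64, // bytes
    pub cuda_capability_min: Option<(u8, u8)>,
}

/// A node qualifies if it has enough free VRAM and, when the job pins a
/// minimum CUDA compute capability, the node meets or exceeds it.
pub fn meets_requirements(node: &NodeCaps, job: &JobReqs) -> bool {
    if node.available_vram < job.min_vram_per_gpu {
        return false;
    }
    match (job.cuda_capability_min, node.cuda_capability) {
        (Some(min), Some(have)) => have >= min, // tuples compare (major, minor) lexicographically
        (Some(_), None) => false,               // job needs CUDA, node has none
        (None, _) => true,                      // job is capability-agnostic
    }
}
```

Comparing `(u8, u8)` tuples gives the intended ordering for free: major version first, minor as tie-breaker.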
1.2 AI/ML Job Specification
// synor-compute/src/ai/job.rs
/// AI/ML training job specification
pub struct TrainingJob {
/// Job ID
pub job_id: JobId,
/// Owner address
pub owner: Address,
/// Framework (PyTorch, TensorFlow, JAX)
pub framework: MlFramework,
/// Model specification
pub model: ModelSpec,
/// Dataset reference (Synor Storage CID)
pub dataset_cid: Cid,
/// Training configuration
pub config: TrainingConfig,
/// Resource requirements
pub resources: GpuResources,
/// Maximum budget (SYNOR tokens)
pub max_budget: u64,
/// Checkpoint interval (steps)
pub checkpoint_interval: u64,
}
pub struct GpuResources {
pub min_gpus: u32,
pub max_gpus: u32,
pub min_vram_per_gpu: u64,
pub cuda_capability_min: Option<(u8, u8)>,
pub distributed: bool, // Multi-node training
pub priority: JobPriority,
}
pub enum MlFramework {
PyTorch { version: String },
TensorFlow { version: String },
JAX { version: String },
ONNX,
Custom { image: String },
}
pub struct TrainingConfig {
pub epochs: u32,
pub batch_size: u32,
pub learning_rate: f32,
pub optimizer: String,
pub mixed_precision: bool,
pub gradient_accumulation: u32,
pub distributed_strategy: DistributedStrategy,
}
pub enum DistributedStrategy {
DataParallel,
ModelParallel,
PipelineParallel,
ZeRO { stage: u8 }, // DeepSpeed ZeRO stages 1-3
FSDP, // Fully Sharded Data Parallel
}
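One interaction between these fields is worth spelling out: under `DataParallel`, the effective global batch size is the per-device `batch_size` times `gradient_accumulation` times the number of GPUs. A minimal sketch of that arithmetic (the helper name is illustrative):

```rust
/// Effective global batch size under data-parallel training:
/// per-device batch * gradient accumulation steps * number of devices.
pub fn effective_batch_size(batch_size: u32, gradient_accumulation: u32, num_gpus: u32) -> u64 {
    batch_size as u64 * gradient_accumulation as u64 * num_gpus as u64
}
```

For example, a per-device batch of 32 with 4 accumulation steps across 8 GPUs yields a global batch of 1,024, which is the number a learning-rate schedule should be tuned against.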
1.3 Inference Service
// synor-compute/src/ai/inference.rs
/// Inference endpoint specification
pub struct InferenceEndpoint {
/// Endpoint ID
pub endpoint_id: EndpointId,
/// Model reference (Synor Storage CID)
pub model_cid: Cid,
/// Model format
pub format: ModelFormat,
/// Scaling configuration
pub scaling: AutoscaleConfig,
/// GPU requirements per replica
pub gpu_per_replica: GpuResources,
/// Request timeout
pub timeout_ms: u32,
/// Max batch size for batching inference
pub max_batch_size: u32,
/// Batching timeout
pub batch_timeout_ms: u32,
}
pub enum ModelFormat {
PyTorch,
ONNX,
TensorRT,
Triton,
vLLM, // For LLM serving
TGI, // Text Generation Inference
Custom,
}
pub struct AutoscaleConfig {
pub min_replicas: u32,
pub max_replicas: u32,
pub target_gpu_utilization: f32,
pub scale_up_threshold: f32,
pub scale_down_threshold: f32,
pub cooldown_seconds: u32,
}
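The `AutoscaleConfig` fields imply a standard target-tracking loop: scale out when observed GPU utilization exceeds `scale_up_threshold`, scale in below `scale_down_threshold`, always clamped to the replica bounds. A minimal single-step sketch (the function and the one-replica-at-a-time policy are illustrative assumptions):

```rust
/// Subset of the endpoint's autoscale settings needed for the decision.
pub struct AutoscaleConfig {
    pub min_replicas: u32,
    pub max_replicas: u32,
    pub scale_up_threshold: f32,
    pub scale_down_threshold: f32,
}

/// Decide the next replica count from observed GPU utilization (0.0..=1.0).
/// Moves one replica at a time; cooldown handling would sit around this.
pub fn next_replica_count(cfg: &AutoscaleConfig, current: u32, gpu_utilization: f32) -> u32 {
    let desired = if gpu_utilization > cfg.scale_up_threshold {
        current + 1
    } else if gpu_utilization < cfg.scale_down_threshold {
        current.saturating_sub(1)
    } else {
        current
    };
    desired.clamp(cfg.min_replicas, cfg.max_replicas)
}
```

The clamp is what makes `min_replicas` an availability floor and `max_replicas` a budget ceiling regardless of load.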
1.4 Pricing Model for GPU Compute
| Resource | Unit | Price (SYNOR/unit) |
|---|---|---|
| GPU (RTX 4090 equivalent) | hour | 0.50 |
| GPU (A100 80GB equivalent) | hour | 2.00 |
| GPU (H100 equivalent) | hour | 4.00 |
| VRAM | GB/hour | 0.01 |
| Network egress | GB | 0.05 |
| Storage (hot, NVMe) | GB/month | 0.10 |
| Inference requests | 1M tokens | 0.10 |
Milestone 2: Container Orchestration (Docker/Kubernetes-Compatible)
2.1 Container Runtime
// synor-compute/src/container/runtime.rs
/// Container specification (OCI-compatible)
pub struct ContainerSpec {
/// Image reference
pub image: ImageRef,
/// Resource limits
pub resources: ContainerResources,
/// Environment variables
pub env: HashMap<String, String>,
/// Volume mounts
pub volumes: Vec<VolumeMount>,
/// Network configuration
pub network: NetworkConfig,
/// Security context
pub security: SecurityContext,
/// Health check
pub health_check: Option<HealthCheck>,
}
pub struct ContainerResources {
pub cpu_cores: f32, // 0.5, 1.0, 2.0, etc.
pub memory_mb: u64,
pub gpu: Option<GpuAllocation>,
pub ephemeral_storage_gb: u32,
pub network_bandwidth_mbps: u32,
}
pub struct GpuAllocation {
pub count: u32,
pub vram_mb: u64,
pub shared: bool, // Allow GPU sharing via MPS/MIG
}
2.2 Service Mesh & Networking
// synor-compute/src/network/mesh.rs
/// Service definition for container orchestration
pub struct Service {
pub service_id: ServiceId,
pub name: String,
pub containers: Vec<ContainerSpec>,
pub replicas: ReplicaConfig,
pub load_balancer: LoadBalancerConfig,
pub service_mesh: ServiceMeshConfig,
}
pub struct ServiceMeshConfig {
pub mtls_enabled: bool,
pub traffic_policy: TrafficPolicy,
pub circuit_breaker: CircuitBreakerConfig,
pub retry_policy: RetryPolicy,
pub rate_limit: Option<RateLimitConfig>,
}
pub struct LoadBalancerConfig {
pub algorithm: LoadBalancerAlgorithm,
pub health_check: HealthCheck,
pub sticky_sessions: bool,
pub ssl_termination: SslTermination,
}
pub enum LoadBalancerAlgorithm {
RoundRobin,
LeastConnections,
WeightedRoundRobin { weights: Vec<u32> },
IPHash,
Random,
}
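For the `WeightedRoundRobin` variant, one simple realization expands the weights into a repeating schedule and steps through it with a request counter. A hypothetical, stateless sketch:

```rust
/// Pick the backend index for the `n`-th request under weighted round-robin:
/// backend `i` receives `weights[i]` consecutive slots per cycle.
pub fn weighted_rr_pick(weights: &[u32], n: u64) -> Option<usize> {
    let total: u64 = weights.iter().map(|&w| w as u64).sum();
    if total == 0 {
        return None; // no backends, or all weights zero
    }
    let mut slot = n % total;
    for (i, &w) in weights.iter().enumerate() {
        if slot < w as u64 {
            return Some(i);
        }
        slot -= w as u64;
    }
    None // unreachable: slot is always less than the weight sum
}
```

Production balancers typically use a smooth variant that interleaves backends rather than serving each one's slots consecutively, but the cycle lengths are the same.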
2.3 Container Pricing
| Resource | Unit | Price (SYNOR/unit) |
|---|---|---|
| CPU | core/hour | 0.02 |
| Memory | GB/hour | 0.005 |
| Ephemeral storage | GB/hour | 0.001 |
| Network ingress | GB | FREE |
| Network egress | GB | 0.05 |
| Load balancer | hour | 0.01 |
| Static IP | month | 2.00 |
Milestone 3: Persistent Virtual Machines (OS Hosting)
3.1 MicroVM Architecture (Firecracker-based)
// synor-compute/src/vm/microvm.rs
/// Virtual machine specification
pub struct VmSpec {
/// VM ID
pub vm_id: VmId,
/// Owner address
pub owner: Address,
/// VM size
pub size: VmSize,
/// Boot image
pub image: VmImage,
/// Persistent volumes
pub volumes: Vec<PersistentVolume>,
/// Network configuration
pub network: VmNetworkConfig,
/// SSH keys for access
pub ssh_keys: Vec<SshPublicKey>,
/// Cloud-init user data
pub user_data: Option<String>,
}
pub struct VmSize {
pub vcpus: u32,
pub memory_gb: u32,
pub gpu: Option<GpuPassthrough>,
pub network_bandwidth_gbps: u32,
}
pub struct GpuPassthrough {
pub count: u32,
pub model: GpuModel,
pub vram_gb: u32,
}
pub enum VmImage {
/// Pre-built images
Marketplace { image_id: String, version: String },
/// Custom image from Synor Storage
Custom { cid: Cid, format: ImageFormat },
/// Standard OS images
Ubuntu { version: String },
Debian { version: String },
AlmaLinux { version: String },
Windows { version: String, license: WindowsLicense },
}
pub struct PersistentVolume {
pub volume_id: VolumeId,
pub size_gb: u32,
pub volume_type: VolumeType,
pub mount_path: String,
pub encrypted: bool,
}
pub enum VolumeType {
/// High-performance NVMe SSD
NvmeSsd { iops: u32, throughput_mbps: u32 },
/// Standard SSD
Ssd,
/// HDD for archival
Hdd,
/// Distributed storage (Synor Storage L2)
Distributed { replication: u8 },
}
3.2 VM Lifecycle Management
// synor-compute/src/vm/lifecycle.rs
pub enum VmState {
Pending,
Provisioning,
Running,
Stopping,
Stopped,
Hibernating,
Hibernated,
Migrating,
Failed,
Terminated,
}
pub struct VmManager {
/// Active VMs
vms: HashMap<VmId, VmInstance>,
/// Node assignments
node_assignments: HashMap<VmId, NodeId>,
/// Live migration coordinator
migration_coordinator: MigrationCoordinator,
}
impl VmManager {
/// Start a new VM
pub async fn create(&self, spec: VmSpec) -> Result<VmId, VmError>;
/// Stop a VM (preserves state)
pub async fn stop(&self, vm_id: &VmId) -> Result<(), VmError>;
/// Start a stopped VM
pub async fn start(&self, vm_id: &VmId) -> Result<(), VmError>;
/// Hibernate VM to storage (saves memory state)
pub async fn hibernate(&self, vm_id: &VmId) -> Result<(), VmError>;
/// Live migrate VM to another node
pub async fn migrate(&self, vm_id: &VmId, target_node: NodeId) -> Result<(), VmError>;
/// Resize VM (requires restart)
pub async fn resize(&self, vm_id: &VmId, new_size: VmSize) -> Result<(), VmError>;
/// Snapshot VM state
pub async fn snapshot(&self, vm_id: &VmId) -> Result<SnapshotId, VmError>;
/// Terminate and delete VM
pub async fn terminate(&self, vm_id: &VmId) -> Result<(), VmError>;
}
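The lifecycle states imply a transition graph the `VmManager` has to enforce: only a `Stopped` VM can be started, only a `Running` VM can be hibernated or migrated, and so on. A minimal validity check, with the enum restated so the sketch is self-contained and the edge set chosen as one plausible reading of the states rather than a fixed spec:

```rust
#[derive(Clone, Copy, PartialEq, Debug)]
pub enum VmState {
    Pending, Provisioning, Running, Stopping, Stopped,
    Hibernating, Hibernated, Migrating, Failed, Terminated,
}

/// Whether `from -> to` is a legal lifecycle transition
/// (edge set is an assumption, not fixed by the spec above).
pub fn is_valid_transition(from: VmState, to: VmState) -> bool {
    use VmState::*;
    matches!(
        (from, to),
        (Pending, Provisioning)
            | (Provisioning, Running) | (Provisioning, Failed)
            | (Running, Stopping) | (Running, Hibernating) | (Running, Migrating) | (Running, Failed)
            | (Stopping, Stopped)
            | (Stopped, Running) | (Stopped, Terminated)
            | (Hibernating, Hibernated)
            | (Hibernated, Running)
            | (Migrating, Running) | (Migrating, Failed)
            | (Failed, Terminated)
    )
}
```

Each `VmManager` method would call a check like this before acting, so a `migrate` request against a `Hibernated` VM fails fast instead of reaching the node.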
3.3 VM Pricing
| VM Type | vCPUs | Memory | Storage | GPU | Price (SYNOR/month) |
|---|---|---|---|---|---|
| micro | 1 | 1 GB | 20 GB SSD | - | 5 |
| small | 2 | 4 GB | 50 GB SSD | - | 15 |
| medium | 4 | 8 GB | 100 GB SSD | - | 30 |
| large | 8 | 32 GB | 200 GB SSD | - | 80 |
| xlarge | 16 | 64 GB | 500 GB NVMe | - | 200 |
| gpu-small | 8 | 32 GB | 200 GB NVMe | 1x RTX 4090 | 400 |
| gpu-medium | 16 | 64 GB | 500 GB NVMe | 2x RTX 4090 | 750 |
| gpu-large | 32 | 128 GB | 1 TB NVMe | 4x A100 80GB | 2500 |
| gpu-xlarge | 64 | 256 GB | 2 TB NVMe | 8x H100 | 8000 |
Milestone 4: Serverless Functions (FaaS)
4.1 Function Specification
// synor-compute/src/serverless/function.rs
/// Serverless function definition
pub struct Function {
pub function_id: FunctionId,
pub owner: Address,
pub name: String,
pub runtime: FunctionRuntime,
pub handler: String,
pub code: FunctionCode,
pub resources: FunctionResources,
pub triggers: Vec<FunctionTrigger>,
pub environment: HashMap<String, String>,
pub timeout_ms: u32,
pub concurrency: ConcurrencyConfig,
}
pub enum FunctionRuntime {
Node20,
Node22,
Python311,
Python312,
Rust,
Go122,
Java21,
Dotnet8,
Ruby33,
Custom { image: String },
}
pub struct FunctionCode {
/// Source code CID in Synor Storage
pub cid: Cid,
/// Entry point file
pub entry_point: String,
/// Dependencies (package.json, requirements.txt, etc.)
pub dependencies: Option<Cid>,
}
pub struct FunctionResources {
pub memory_mb: u32, // 128, 256, 512, 1024, 2048, 4096, 8192
pub cpu_allocation: f32, // Proportional to memory
pub ephemeral_storage_mb: u32,
pub gpu: Option<GpuAllocation>,
}
pub enum FunctionTrigger {
/// HTTP endpoint
Http { path: String, methods: Vec<HttpMethod> },
/// Scheduled execution (cron)
Schedule { cron: String },
/// Event from message queue
Queue { queue_name: String },
/// Storage events
Storage { bucket: String, events: Vec<StorageEvent> },
/// Blockchain events
Blockchain { contract: Address, events: Vec<String> },
/// Webhook
Webhook { url: String },
}
4.2 Cold Start Optimization
// synor-compute/src/serverless/warmup.rs
/// Function warmup strategies
pub struct WarmupConfig {
/// Minimum warm instances
pub min_instances: u32,
/// Provisioned concurrency
pub provisioned_concurrency: u32,
/// Warmup schedule
pub warmup_schedule: Option<String>,
/// Snapshot-based cold start (SnapStart)
pub snapstart_enabled: bool,
}
pub struct ColdStartOptimizer {
/// Pre-warmed function pools
pools: HashMap<FunctionRuntime, WarmPool>,
/// Snapshot cache
snapshots: LruCache<FunctionId, FunctionSnapshot>,
/// Prediction model for scaling
predictor: ScalingPredictor,
}
impl ColdStartOptimizer {
/// Get a warm instance or create one
pub async fn get_instance(&self, function: &Function) -> Result<FunctionInstance, Error> {
// Try snapshot restore first (< 100ms)
if let Some(snapshot) = self.snapshots.get(&function.function_id) {
return self.restore_from_snapshot(snapshot).await;
}
// Try warm pool (< 50ms)
if let Some(instance) = self.pools.get(&function.runtime).and_then(|pool| pool.get_warm()) {
return Ok(instance);
}
// Cold start (1-5s depending on runtime)
self.cold_start(function).await
}
}
4.3 Serverless Pricing
| Resource | Unit | Price (SYNOR) |
|---|---|---|
| Invocations | 1M requests | 0.20 |
| Duration | GB-second | 0.00001 |
| Provisioned concurrency | GB-hour | 0.01 |
| HTTP Gateway | 1M requests | 0.10 |
| Event bridge | 1M events | 0.50 |
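Duration is billed in GB-seconds (allocated memory in GB times execution time in seconds), so a 1,024 MB function running for 200 ms consumes 0.2 GB-seconds. A sketch of the metering arithmetic using the table rates (function names are illustrative):

```rust
/// GB-seconds for one invocation: (memory_mb / 1024) * (duration_ms / 1000).
pub fn gb_seconds(memory_mb: u32, duration_ms: u32) -> f64 {
    (memory_mb as f64 / 1024.0) * (duration_ms as f64 / 1000.0)
}

/// Total cost in SYNOR for a batch of identical invocations:
/// 0.20 per 1M requests plus 0.00001 per GB-second, per the table.
pub fn invocation_cost(invocations: u64, memory_mb: u32, avg_duration_ms: u32) -> f64 {
    let request_cost = invocations as f64 / 1_000_000.0 * 0.20;
    let duration_cost = invocations as f64 * gb_seconds(memory_mb, avg_duration_ms) * 0.00001;
    request_cost + duration_cost
}
```

One million such invocations cost 0.20 (requests) + 2.00 (duration) = 2.20 SYNOR, so for short-running functions the duration term still dominates.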
Milestone 5: Edge Compute
5.1 Edge Node Architecture
// synor-compute/src/edge/node.rs
/// Edge compute node
pub struct EdgeNode {
pub node_id: NodeId,
pub location: GeoLocation,
pub capabilities: EdgeCapabilities,
pub latency_zones: Vec<LatencyZone>,
pub resources: EdgeResources,
}
pub struct EdgeCapabilities {
pub wasm_runtime: bool,
pub container_runtime: bool,
pub gpu_inference: bool,
pub video_transcoding: bool,
pub cdn_cache: bool,
}
pub struct EdgeResources {
pub cpu_cores: u32,
pub memory_gb: u32,
pub storage_gb: u32,
pub gpu: Option<EdgeGpu>,
pub bandwidth_gbps: u32,
}
/// Edge function for low-latency compute
pub struct EdgeFunction {
pub function_id: FunctionId,
pub code: WasmModule,
pub memory_limit: u32,
pub timeout_ms: u32,
pub allowed_regions: Vec<Region>,
}
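Edge routing ultimately reduces to picking, among the nodes allowed to run a function, the one with the lowest measured latency to the caller that still has capacity. A hypothetical selection helper (the `Candidate` type is a simplified stand-in for the node structs above):

```rust
/// A candidate edge node with its measured round-trip latency to the caller.
pub struct Candidate {
    pub node_id: u64,
    pub latency_ms: u32,
    pub has_capacity: bool,
}

/// Pick the lowest-latency node that still has capacity, if any.
pub fn pick_edge_node(candidates: &[Candidate]) -> Option<u64> {
    candidates
        .iter()
        .filter(|c| c.has_capacity)
        .min_by_key(|c| c.latency_ms)
        .map(|c| c.node_id)
}
```

When no edge node qualifies, the router would fall back to a regional node, trading latency for availability.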
5.2 Edge Use Cases
// synor-compute/src/edge/usecases.rs
/// CDN with compute at edge
pub struct EdgeCdn {
/// Origin servers
origins: Vec<Origin>,
/// Cache rules
cache_rules: Vec<CacheRule>,
/// Edge workers for request/response transformation
workers: Vec<EdgeWorker>,
}
/// Real-time inference at edge
pub struct EdgeInference {
/// Model optimized for edge (quantized, pruned)
model_id: ModelId,
/// Inference runtime (TensorRT, ONNX Runtime)
runtime: EdgeInferenceRuntime,
/// Max batch size
max_batch: u32,
/// Target latency
target_latency_ms: u32,
}
/// Video processing at edge
pub struct EdgeVideoProcessor {
/// Transcoding profiles
profiles: Vec<TranscodingProfile>,
/// Real-time streaming
live_streaming: bool,
/// Adaptive bitrate
abr_enabled: bool,
}
5.3 Edge Pricing
| Resource | Unit | Price (SYNOR) |
|---|---|---|
| Edge function invocations | 1M | 0.50 |
| Edge function duration | GB-second | 0.00002 |
| Edge bandwidth | GB | 0.08 |
| Edge cache storage | GB/month | 0.02 |
| Video transcoding | minute | 0.02 |
Milestone 6: Node Provider Economics
6.1 Provider Registration
// synor-compute/src/provider/registration.rs
/// Compute provider registration
pub struct ProviderRegistration {
pub provider_id: ProviderId,
pub owner: Address,
/// Stake required to become provider
pub stake: u64,
/// Hardware specifications
pub hardware: HardwareManifest,
/// Network connectivity
pub network: NetworkManifest,
/// Geographic location
pub location: GeoLocation,
/// Availability SLA commitment
pub sla: SlaCommitment,
}
pub struct HardwareManifest {
pub cpus: Vec<CpuSpec>,
pub memory_total_gb: u64,
pub gpus: Vec<GpuSpec>,
pub storage: Vec<StorageSpec>,
pub verified: bool, // Hardware attestation passed
}
pub struct SlaCommitment {
pub uptime_percent: f32, // 99.9, 99.99, etc.
pub response_time_ms: u32,
pub data_durability: f32,
pub penalty_rate: f32, // Penalty for SLA violation
}
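The `penalty_rate` ties the SLA to provider payouts: when measured uptime falls short of the commitment, a proportional fraction of the period's earnings is withheld. One plausible formula, stated here as an assumption since the spec above does not fix the curve:

```rust
/// Penalty in SYNOR for one billing period: `penalty_rate` of earnings per
/// full percentage point of uptime below the commitment, capped at the
/// period's earnings. (The linear curve is an assumption.)
pub fn sla_penalty(earnings: f64, committed_uptime: f32, actual_uptime: f32, penalty_rate: f32) -> f64 {
    let shortfall_points = (committed_uptime - actual_uptime).max(0.0) as f64;
    (earnings * shortfall_points * penalty_rate as f64).min(earnings)
}
```

So a provider committing 99.9% uptime who delivers 98.9% with a 5% penalty rate forfeits roughly 5% of that period's earnings, and overdelivering costs nothing.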
6.2 Provider Revenue Model
| Revenue Source | Provider Share | Protocol Share |
|---|---|---|
| Compute fees | 85% | 15% |
| Storage fees | 80% | 20% |
| Network fees | 75% | 25% |
| SLA bonuses | 100% | 0% |
| Staking rewards | 100% | 0% |
6.3 Slashing Conditions
| Violation | Penalty |
|---|---|
| Downtime > committed SLA | 1% stake per hour |
| Data loss | 10% stake + compensation |
| Malicious behavior | 100% stake |
| False hardware attestation | 50% stake |
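The slashing amounts follow mechanically from the stake and the violation type. A sketch of that mapping (the enum and function names are illustrative; the data-loss compensation owed to affected users would be computed separately):

```rust
/// Violation types from the slashing table (names are illustrative).
pub enum Violation {
    /// Hours of downtime beyond the committed SLA.
    Downtime { hours: u64 },
    DataLoss,
    MaliciousBehavior,
    FalseAttestation,
}

/// Stake slashed per the table: 1% per downtime hour (capped at the full
/// stake), 10% for data loss, 100% for malice, 50% for false attestation.
pub fn slash_amount(stake: u64, violation: &Violation) -> u64 {
    match violation {
        Violation::Downtime { hours } => (stake / 100).saturating_mul(*hours).min(stake),
        Violation::DataLoss => stake / 10,          // plus user compensation, handled elsewhere
        Violation::MaliciousBehavior => stake,
        Violation::FalseAttestation => stake / 2,
    }
}
```

The cap on the downtime case matters: without `.min(stake)`, an outage longer than 100 hours would slash more stake than the provider posted.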
Implementation Timeline
Phase 11.1: Foundation (Weeks 1-4)
Phase 11.2: GPU Compute (Weeks 5-8)
Phase 11.3: Container Orchestration (Weeks 9-12)
Phase 11.4: Persistent VMs (Weeks 13-16)
Phase 11.5: Serverless (Weeks 17-20)
Phase 11.6: Edge Compute (Weeks 21-24)
Security Considerations
Isolation Levels
| Workload Type | Isolation Technology | Security Level |
|---|---|---|
| WASM | Wasmtime sandbox | High |
| Serverless | gVisor + seccomp | High |
| Containers | gVisor or Kata | Medium-High |
| VMs | Firecracker MicroVM | High |
| GPU | NVIDIA MIG/MPS | Medium |
Network Security
- All inter-node traffic encrypted (WireGuard)
- mTLS for service-to-service communication
- Network policies for workload isolation
- DDoS protection at edge
Data Security
- Encryption at rest (AES-256)
- Encryption in transit (TLS 1.3)
- Confidential computing support (AMD SEV, Intel SGX)
- Secure key management (HSM integration)
API Examples
Deploy AI Training Job
synor compute train create \
--framework pytorch \
--model-config ./model.yaml \
--dataset synor://datasets/imagenet \
--gpus 8 \
--gpu-type h100 \
--distributed ddp \
--epochs 100 \
--checkpoint-interval 1000 \
--max-budget 1000
Deploy Inference Endpoint
synor compute inference deploy \
--model synor://models/llama-70b \
--format vllm \
--min-replicas 2 \
--max-replicas 10 \
--gpu-per-replica 2 \
--target-utilization 0.7
Create Persistent VM
synor compute vm create \
--name my-dev-server \
--image ubuntu:22.04 \
--size gpu-small \
--volume 100gb:nvme:/data \
--ssh-key ~/.ssh/id_ed25519.pub \
--region us-east
Deploy Container Service
synor compute service deploy \
--name my-api \
--image my-registry/my-api:latest \
--replicas 3 \
--cpu 2 \
--memory 4gb \
--port 8080 \
--health-check /health \
--autoscale 2-10
Deploy Serverless Function
synor compute function deploy \
--name process-image \
--runtime python312 \
--handler main.handler \
--code ./function \
--memory 1024 \
--timeout 30000 \
--trigger http:/api/process
Comparison with Existing Synor VM
| Feature | Current Synor VM | Synor Compute L2 |
|---|---|---|
| Runtime | WASM only | WASM, Container, MicroVM |
| Timeout | 30 seconds | Unlimited (VMs) |
| Memory | 16 MB max | Up to 256 GB |
| GPU | ❌ | ✅ Full CUDA/ROCm |
| Networking | ❌ | ✅ Full TCP/UDP |
| File I/O | ❌ | ✅ Persistent volumes |
| Threading | ❌ | ✅ Multi-threaded |
| AI/ML | ❌ | ✅ Training + Inference |
| OS Hosting | ❌ | ✅ Full Linux/Windows |
Next Steps
- Milestone 1: Implement GPU node registration and attestation
- Milestone 2: Build basic job scheduler with resource allocation
- Milestone 3: Integrate containerd for container workloads
- Milestone 4: Add Firecracker for MicroVM support
- Milestone 5: Implement serverless function runtime
- Milestone 6: Deploy edge nodes and CDN integration
This plan transforms Synor from a smart contract platform into a full-stack decentralized cloud provider capable of competing with AWS/GCP/Azure while maintaining decentralization and censorship resistance.