# Phase 11: Synor Compute L2 - Full-Stack Compute Platform

> **Mission**: Build a decentralized compute platform capable of AI/ML training, inference, OS hosting, and general-purpose high-performance computing.

---

## Executive Summary

Synor Compute L2 extends beyond the current WASM-only Synor VM to provide:

- **GPU Compute**: AI/ML training and inference with CUDA/ROCm support
- **Container Orchestration**: Docker-compatible workloads with Kubernetes-style scheduling
- **Persistent VMs**: Long-running virtual machines for OS hosting
- **Serverless Functions**: Short-lived compute for API backends and event processing
- **Edge Compute**: Low-latency compute at network edge nodes

---

## Architecture Overview

```
┌─────────────────────────────────────────────────────────────────────────────┐
│                              SYNOR COMPUTE L2                               │
├─────────────────────────────────────────────────────────────────────────────┤
│                                                                             │
│ ┌─────────────────────────────────────────────────────────────────────────┐ │
│ │                            APPLICATION LAYER                            │ │
│ ├──────────────┬──────────────┬──────────────┬──────────────┬────────────┤ │
│ │    AI/ML     │  Serverless  │  Containers  │  Persistent  │    Edge    │ │
│ │   Training   │  Functions   │   (Docker)   │ VMs (Linux)  │  Compute   │ │
│ └──────────────┴──────────────┴──────────────┴──────────────┴────────────┘ │
│                                                                             │
│ ┌─────────────────────────────────────────────────────────────────────────┐ │
│ │                           ORCHESTRATION LAYER                           │ │
│ ├──────────────┬──────────────┬──────────────┬──────────────┬────────────┤ │
│ │     Job      │   Resource   │   Network    │   Storage    │   Health   │ │
│ │  Scheduler   │   Manager    │    Fabric    │ Orchestrator │  Monitor   │ │
│ └──────────────┴──────────────┴──────────────┴──────────────┴────────────┘ │
│                                                                             │
│ ┌─────────────────────────────────────────────────────────────────────────┐ │
│ │                          COMPUTE RUNTIME LAYER                          │ │
│ ├──────────────┬──────────────┬──────────────┬──────────────┬────────────┤ │
│ │     GPU      │  Container   │   MicroVM    │     WASM     │   Native   │ │
│ │   Runtime    │   Runtime    │   Runtime    │   Runtime    │  Runtime   │ │
│ │ (CUDA/ROCm)  │ (containerd) │(Firecracker) │  (Wasmtime)  │  (gVisor)  │ │
│ └──────────────┴──────────────┴──────────────┴──────────────┴────────────┘ │
│                                                                             │
│ ┌─────────────────────────────────────────────────────────────────────────┐ │
│ │                          INFRASTRUCTURE LAYER                           │ │
│ ├──────────────┬──────────────┬──────────────┬──────────────┬────────────┤ │
│ │     Node     │   Network    │ Distributed  │  Consensus   │  Billing   │ │
│ │   Registry   │   Overlay    │   Storage    │  (PoS+PoW)   │  Metering  │ │
│ └──────────────┴──────────────┴──────────────┴──────────────┴────────────┘ │
│                                                                             │
│ ┌─────────────────────────────────────────────────────────────────────────┐ │
│ │               SYNOR L1 BLOCKCHAIN (GHOSTDAG + DAG-RIDER)                │ │
│ └─────────────────────────────────────────────────────────────────────────┘ │
│                                                                             │
└─────────────────────────────────────────────────────────────────────────────┘
```

---

## Milestone 1: GPU Compute Foundation (AI/ML Training & Inference)

### 1.1 GPU Node Registration

```rust
// synor-compute/src/gpu/node.rs

/// GPU node capabilities
pub struct GpuNode {
    /// Unique node ID
    pub node_id: NodeId,
    /// GPU specifications
    pub gpus: Vec<GpuSpec>,
    /// Total VRAM available (bytes)
    pub total_vram: u64,
    /// Available VRAM (bytes)
    pub available_vram: u64,
    /// CUDA compute capability (e.g., 8.6 for RTX 3090)
    pub cuda_capability: Option<(u8, u8)>,
    /// ROCm version (for AMD)
    pub rocm_version: Option<String>,
    /// Network bandwidth (Gbps)
    pub bandwidth_gbps: u32,
    /// Geographic region
    pub region: Region,
    /// Stake amount (for PoS validation)
    pub stake: u64,
}

pub struct GpuSpec {
    pub model: String,         // "NVIDIA RTX 4090"
    pub vram_gb: u32,          // 24
    pub tensor_cores: u32,     // 512
    pub cuda_cores: u32,       // 16384
    pub memory_bandwidth: u32, // 1008 GB/s
    pub fp32_tflops: f32,      // 82.6
    pub fp16_tflops: f32,      // 165.2
    pub int8_tops: f32,        // 330.4
}
```
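
As a quick illustration of how the registry might derive node-level figures such as `GpuNode::total_vram` from the per-GPU specs, here is a minimal sketch; the trimmed-down `GpuSpec` and the `node_totals` helper are illustrative, not part of the crate API:

```rust
/// Trimmed GpuSpec with only the fields this sketch needs.
pub struct GpuSpec {
    pub model: String,
    pub vram_gb: u32,
    pub fp16_tflops: f32,
}

/// Aggregate total VRAM (bytes) and FP16 throughput across a node's GPUs.
pub fn node_totals(gpus: &[GpuSpec]) -> (u64, f32) {
    let vram_bytes: u64 = gpus
        .iter()
        .map(|g| g.vram_gb as u64 * 1024 * 1024 * 1024) // GiB -> bytes
        .sum();
    let fp16_tflops: f32 = gpus.iter().map(|g| g.fp16_tflops).sum();
    (vram_bytes, fp16_tflops)
}
```

A dual-RTX-4090 node, for example, would report 48 GiB of VRAM and roughly 330 FP16 TFLOPS.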

### 1.2 AI/ML Job Specification

```rust
// synor-compute/src/ai/job.rs

/// AI/ML training job specification
pub struct TrainingJob {
    /// Job ID
    pub job_id: JobId,
    /// Owner address
    pub owner: Address,
    /// Framework (PyTorch, TensorFlow, JAX)
    pub framework: MlFramework,
    /// Model specification
    pub model: ModelSpec,
    /// Dataset reference (Synor Storage CID)
    pub dataset_cid: Cid,
    /// Training configuration
    pub config: TrainingConfig,
    /// Resource requirements
    pub resources: GpuResources,
    /// Maximum budget (SYNOR tokens)
    pub max_budget: u64,
    /// Checkpoint interval (steps)
    pub checkpoint_interval: u64,
}

pub struct GpuResources {
    pub min_gpus: u32,
    pub max_gpus: u32,
    pub min_vram_per_gpu: u64,
    pub cuda_capability_min: Option<(u8, u8)>,
    pub distributed: bool, // Multi-node training
    pub priority: JobPriority,
}

pub enum MlFramework {
    PyTorch { version: String },
    TensorFlow { version: String },
    JAX { version: String },
    ONNX,
    Custom { image: String },
}

pub struct TrainingConfig {
    pub epochs: u32,
    pub batch_size: u32,
    pub learning_rate: f32,
    pub optimizer: String,
    pub mixed_precision: bool,
    pub gradient_accumulation: u32,
    pub distributed_strategy: DistributedStrategy,
}

pub enum DistributedStrategy {
    DataParallel,
    ModelParallel,
    PipelineParallel,
    ZeRO { stage: u8 }, // DeepSpeed ZeRO stages 1-3
    FSDP,               // Fully Sharded Data Parallel
}
```
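
Under `DataParallel` training, the effective global batch size follows directly from the config fields above: per-GPU batch size times GPU count times gradient-accumulation steps. A one-line sketch (the `global_batch_size` helper is illustrative, not part of the crate):

```rust
/// Effective global batch size for data-parallel training:
/// per-GPU batch * number of GPUs * gradient accumulation steps.
pub fn global_batch_size(per_gpu_batch: u32, num_gpus: u32, grad_accum: u32) -> u64 {
    per_gpu_batch as u64 * num_gpus as u64 * grad_accum as u64
}
```

For example, `batch_size: 32` on 8 GPUs with `gradient_accumulation: 4` yields a global batch of 1024.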

### 1.3 Inference Service

```rust
// synor-compute/src/ai/inference.rs

/// Inference endpoint specification
pub struct InferenceEndpoint {
    /// Endpoint ID
    pub endpoint_id: EndpointId,
    /// Model reference (Synor Storage CID)
    pub model_cid: Cid,
    /// Model format
    pub format: ModelFormat,
    /// Scaling configuration
    pub scaling: AutoscaleConfig,
    /// GPU requirements per replica
    pub gpu_per_replica: GpuResources,
    /// Request timeout
    pub timeout_ms: u32,
    /// Max batch size for batched inference
    pub max_batch_size: u32,
    /// Batching timeout
    pub batch_timeout_ms: u32,
}

pub enum ModelFormat {
    PyTorch,
    ONNX,
    TensorRT,
    Triton,
    vLLM, // For LLM serving
    TGI,  // Text Generation Inference
    Custom,
}

pub struct AutoscaleConfig {
    pub min_replicas: u32,
    pub max_replicas: u32,
    pub target_gpu_utilization: f32,
    pub scale_up_threshold: f32,
    pub scale_down_threshold: f32,
    pub cooldown_seconds: u32,
}
```
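
A minimal sketch of how the `AutoscaleConfig` thresholds could drive replica counts: add a replica above `scale_up_threshold`, remove one below `scale_down_threshold`, and clamp to the configured bounds. The `desired_replicas` helper is hypothetical, not the scheduler's actual algorithm:

```rust
/// One autoscaling step: returns the replica count for the next interval
/// given current GPU utilization and the AutoscaleConfig thresholds.
pub fn desired_replicas(
    current: u32,
    gpu_utilization: f32,
    min_replicas: u32,
    max_replicas: u32,
    scale_up_threshold: f32,
    scale_down_threshold: f32,
) -> u32 {
    let target = if gpu_utilization > scale_up_threshold {
        current + 1 // overloaded: add a replica
    } else if gpu_utilization < scale_down_threshold {
        current.saturating_sub(1) // idle: remove a replica
    } else {
        current // within band: hold
    };
    target.clamp(min_replicas, max_replicas)
}
```

In practice `cooldown_seconds` would gate how often this step is applied, so a brief utilization spike does not cause replica churn.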

### 1.4 Pricing Model for GPU Compute

| Resource | Unit | Price (SYNOR/unit) |
|----------|------|-------------------|
| GPU (RTX 4090 equivalent) | hour | 0.50 |
| GPU (A100 80GB equivalent) | hour | 2.00 |
| GPU (H100 equivalent) | hour | 4.00 |
| VRAM | GB/hour | 0.01 |
| Network egress | GB | 0.05 |
| Storage (hot, NVMe) | GB/month | 0.10 |
| Inference requests | 1M tokens | 0.10 |
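
Billing against this table is a straight product of GPU count, hours, and the per-GPU-hour rate; for example, an 8×H100 job running 100 hours costs 8 × 100 × 4.00 = 3,200 SYNOR. A sketch (the `gpu_job_cost` helper is illustrative):

```rust
/// Cost of a GPU job in SYNOR: gpus * hours * per-GPU-hour rate
/// (rates taken from the pricing table above).
pub fn gpu_job_cost(gpus: u32, hours: f64, price_per_gpu_hour: f64) -> f64 {
    gpus as f64 * hours * price_per_gpu_hour
}
```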

---

## Milestone 2: Container Orchestration (Docker/Kubernetes-Compatible)

### 2.1 Container Runtime

```rust
// synor-compute/src/container/runtime.rs

/// Container specification (OCI-compatible)
pub struct ContainerSpec {
    /// Image reference
    pub image: ImageRef,
    /// Resource limits
    pub resources: ContainerResources,
    /// Environment variables
    pub env: HashMap<String, String>,
    /// Volume mounts
    pub volumes: Vec<VolumeMount>,
    /// Network configuration
    pub network: NetworkConfig,
    /// Security context
    pub security: SecurityContext,
    /// Health check
    pub health_check: Option<HealthCheck>,
}

pub struct ContainerResources {
    pub cpu_cores: f32, // 0.5, 1.0, 2.0, etc.
    pub memory_mb: u64,
    pub gpu: Option<GpuAllocation>,
    pub ephemeral_storage_gb: u32,
    pub network_bandwidth_mbps: u32,
}

pub struct GpuAllocation {
    pub count: u32,
    pub vram_mb: u64,
    pub shared: bool, // Allow GPU sharing via MPS/MIG
}
```

### 2.2 Service Mesh & Networking

```rust
// synor-compute/src/network/mesh.rs

/// Service definition for container orchestration
pub struct Service {
    pub service_id: ServiceId,
    pub name: String,
    pub containers: Vec<ContainerSpec>,
    pub replicas: ReplicaConfig,
    pub load_balancer: LoadBalancerConfig,
    pub service_mesh: ServiceMeshConfig,
}

pub struct ServiceMeshConfig {
    pub mtls_enabled: bool,
    pub traffic_policy: TrafficPolicy,
    pub circuit_breaker: CircuitBreakerConfig,
    pub retry_policy: RetryPolicy,
    pub rate_limit: Option<RateLimitConfig>,
}

pub struct LoadBalancerConfig {
    pub algorithm: LoadBalancerAlgorithm,
    pub health_check: HealthCheck,
    pub sticky_sessions: bool,
    pub ssl_termination: SslTermination,
}

pub enum LoadBalancerAlgorithm {
    RoundRobin,
    LeastConnections,
    WeightedRoundRobin { weights: Vec<u32> },
    IPHash,
    Random,
}
```
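
A compact sketch of `WeightedRoundRobin` selection, of which `RoundRobin` is the unit-weight special case: backend `i` is picked `weights[i]` times per cycle. The stateless, counter-based form below is illustrative, not the mesh's actual implementation:

```rust
/// Pick a backend index for request number `counter` under weighted
/// round-robin: each cycle of `sum(weights)` requests visits backend i
/// exactly weights[i] times.
pub fn weighted_pick(weights: &[u32], counter: u64) -> usize {
    let total: u64 = weights.iter().map(|&w| w as u64).sum();
    assert!(total > 0, "at least one backend must have nonzero weight");
    let mut slot = counter % total;
    for (i, &w) in weights.iter().enumerate() {
        if slot < w as u64 {
            return i;
        }
        slot -= w as u64;
    }
    unreachable!("slot is always within the total weight")
}
```

With weights `[2, 1]`, requests cycle through backends as 0, 0, 1, 0, 0, 1, …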

### 2.3 Container Pricing

| Resource | Unit | Price (SYNOR/unit) |
|----------|------|-------------------|
| CPU | core/hour | 0.02 |
| Memory | GB/hour | 0.005 |
| Ephemeral storage | GB/hour | 0.001 |
| Network ingress | GB | FREE |
| Network egress | GB | 0.05 |
| Load balancer | hour | 0.01 |
| Static IP | month | 2.00 |
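
Per-replica compute cost under this table is `cpu_cores × 0.02 + memory_gb × 0.005` per hour, before storage and egress. A sketch (the `replica_hour_cost` helper is illustrative):

```rust
/// Hourly compute cost of one container replica in SYNOR,
/// using the CPU (0.02/core-hour) and memory (0.005/GB-hour) rates above.
pub fn replica_hour_cost(cpu_cores: f64, memory_gb: f64) -> f64 {
    cpu_cores * 0.02 + memory_gb * 0.005
}
```

A 2-core / 4 GB replica therefore costs 0.06 SYNOR per hour, or about 43.8 SYNOR for a 730-hour month.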

---

## Milestone 3: Persistent Virtual Machines (OS Hosting)

### 3.1 MicroVM Architecture (Firecracker-based)

```rust
// synor-compute/src/vm/microvm.rs

/// Virtual machine specification
pub struct VmSpec {
    /// VM ID
    pub vm_id: VmId,
    /// Owner address
    pub owner: Address,
    /// VM size
    pub size: VmSize,
    /// Boot image
    pub image: VmImage,
    /// Persistent volumes
    pub volumes: Vec<PersistentVolume>,
    /// Network configuration
    pub network: VmNetworkConfig,
    /// SSH keys for access
    pub ssh_keys: Vec<SshPublicKey>,
    /// Cloud-init user data
    pub user_data: Option<String>,
}

pub struct VmSize {
    pub vcpus: u32,
    pub memory_gb: u32,
    pub gpu: Option<GpuPassthrough>,
    pub network_bandwidth_gbps: u32,
}

pub struct GpuPassthrough {
    pub count: u32,
    pub model: GpuModel,
    pub vram_gb: u32,
}

pub enum VmImage {
    /// Pre-built images
    Marketplace { image_id: String, version: String },
    /// Custom image from Synor Storage
    Custom { cid: Cid, format: ImageFormat },
    /// Standard OS images
    Ubuntu { version: String },
    Debian { version: String },
    AlmaLinux { version: String },
    Windows { version: String, license: WindowsLicense },
}

pub struct PersistentVolume {
    pub volume_id: VolumeId,
    pub size_gb: u32,
    pub volume_type: VolumeType,
    pub mount_path: String,
    pub encrypted: bool,
}

pub enum VolumeType {
    /// High-performance NVMe SSD
    NvmeSsd { iops: u32, throughput_mbps: u32 },
    /// Standard SSD
    Ssd,
    /// HDD for archival
    Hdd,
    /// Distributed storage (Synor Storage L2)
    Distributed { replication: u8 },
}
```

### 3.2 VM Lifecycle Management

```rust
// synor-compute/src/vm/lifecycle.rs

pub enum VmState {
    Pending,
    Provisioning,
    Running,
    Stopping,
    Stopped,
    Hibernating,
    Hibernated,
    Migrating,
    Failed,
    Terminated,
}

pub struct VmManager {
    /// Active VMs
    vms: HashMap<VmId, VmInstance>,
    /// Node assignments
    node_assignments: HashMap<VmId, NodeId>,
    /// Live migration coordinator
    migration_coordinator: MigrationCoordinator,
}

impl VmManager {
    /// Start a new VM
    pub async fn create(&self, spec: VmSpec) -> Result<VmId, VmError>;

    /// Stop a VM (preserves state)
    pub async fn stop(&self, vm_id: &VmId) -> Result<(), VmError>;

    /// Start a stopped VM
    pub async fn start(&self, vm_id: &VmId) -> Result<(), VmError>;

    /// Hibernate VM to storage (saves memory state)
    pub async fn hibernate(&self, vm_id: &VmId) -> Result<(), VmError>;

    /// Live migrate VM to another node
    pub async fn migrate(&self, vm_id: &VmId, target_node: NodeId) -> Result<(), VmError>;

    /// Resize VM (requires restart)
    pub async fn resize(&self, vm_id: &VmId, new_size: VmSize) -> Result<(), VmError>;

    /// Snapshot VM state
    pub async fn snapshot(&self, vm_id: &VmId) -> Result<SnapshotId, VmError>;

    /// Terminate and delete VM
    pub async fn terminate(&self, vm_id: &VmId) -> Result<(), VmError>;
}
```
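
The lifecycle methods above imply a state machine over `VmState`, and the manager can reject illegal operations up front with a transition guard. A sketch — the allowed pairs are an assumption read off the method list (e.g. restarting a stopped VM goes back through `Provisioning` here), and `VmState` is redeclared locally so the example is self-contained:

```rust
/// Local copy of the VmState enum for a self-contained sketch.
#[derive(Clone, Copy, PartialEq, Debug)]
pub enum VmState {
    Pending, Provisioning, Running, Stopping, Stopped,
    Hibernating, Hibernated, Migrating, Failed, Terminated,
}

/// True if the lifecycle allows moving from `from` to `to`.
/// Any state may be terminated; everything else follows the method list.
pub fn can_transition(from: VmState, to: VmState) -> bool {
    use VmState::*;
    matches!(
        (from, to),
        (Pending, Provisioning)
            | (Provisioning, Running)
            | (Provisioning, Failed)
            | (Running, Stopping)
            | (Running, Hibernating)
            | (Running, Migrating)
            | (Stopping, Stopped)
            | (Stopped, Provisioning)   // start() re-provisions
            | (Hibernating, Hibernated)
            | (Hibernated, Provisioning)
            | (Migrating, Running)
            | (_, Terminated)
    )
}
```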

### 3.3 VM Pricing

| VM Type | vCPUs | Memory | Storage | GPU | Price (SYNOR/month) |
|---------|-------|--------|---------|-----|---------------------|
| micro | 1 | 1 GB | 20 GB SSD | - | 5 |
| small | 2 | 4 GB | 50 GB SSD | - | 15 |
| medium | 4 | 8 GB | 100 GB SSD | - | 30 |
| large | 8 | 32 GB | 200 GB SSD | - | 80 |
| xlarge | 16 | 64 GB | 500 GB NVMe | - | 200 |
| gpu-small | 8 | 32 GB | 200 GB NVMe | 1x RTX 4090 | 400 |
| gpu-medium | 16 | 64 GB | 500 GB NVMe | 2x RTX 4090 | 750 |
| gpu-large | 32 | 128 GB | 1 TB NVMe | 4x A100 80GB | 2500 |
| gpu-xlarge | 64 | 256 GB | 2 TB NVMe | 8x H100 | 8000 |

---

## Milestone 4: Serverless Functions (FaaS)

### 4.1 Function Specification

```rust
// synor-compute/src/serverless/function.rs

/// Serverless function definition
pub struct Function {
    pub function_id: FunctionId,
    pub owner: Address,
    pub name: String,
    pub runtime: FunctionRuntime,
    pub handler: String,
    pub code: FunctionCode,
    pub resources: FunctionResources,
    pub triggers: Vec<FunctionTrigger>,
    pub environment: HashMap<String, String>,
    pub timeout_ms: u32,
    pub concurrency: ConcurrencyConfig,
}

pub enum FunctionRuntime {
    Node20,
    Node22,
    Python311,
    Python312,
    Rust,
    Go122,
    Java21,
    Dotnet8,
    Ruby33,
    Custom { image: String },
}

pub struct FunctionCode {
    /// Source code CID in Synor Storage
    pub cid: Cid,
    /// Entry point file
    pub entry_point: String,
    /// Dependencies (package.json, requirements.txt, etc.)
    pub dependencies: Option<Cid>,
}

pub struct FunctionResources {
    pub memory_mb: u32,      // 128, 256, 512, 1024, 2048, 4096, 8192
    pub cpu_allocation: f32, // Proportional to memory
    pub ephemeral_storage_mb: u32,
    pub gpu: Option<GpuAllocation>,
}

pub enum FunctionTrigger {
    /// HTTP endpoint
    Http { path: String, methods: Vec<HttpMethod> },
    /// Scheduled execution (cron)
    Schedule { cron: String },
    /// Event from message queue
    Queue { queue_name: String },
    /// Storage events
    Storage { bucket: String, events: Vec<StorageEvent> },
    /// Blockchain events
    Blockchain { contract: Address, events: Vec<String> },
    /// Webhook
    Webhook { url: String },
}
```
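
`FunctionResources::cpu_allocation` is described as proportional to memory. A sketch of that rule, assuming a hypothetical 1 vCPU per 2048 MB ratio — the ratio is illustrative, not a documented constant of the platform:

```rust
/// CPU share derived from allocated memory, assuming (hypothetically)
/// 1 full vCPU per 2048 MB. So 1024 MB -> 0.5 vCPU, 8192 MB -> 4 vCPUs.
pub fn cpu_for_memory(memory_mb: u32) -> f32 {
    memory_mb as f32 / 2048.0
}
```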

### 4.2 Cold Start Optimization

```rust
// synor-compute/src/serverless/warmup.rs

/// Function warmup strategies
pub struct WarmupConfig {
    /// Minimum warm instances
    pub min_instances: u32,
    /// Provisioned concurrency
    pub provisioned_concurrency: u32,
    /// Warmup schedule
    pub warmup_schedule: Option<String>,
    /// Snapshot-based cold start (SnapStart)
    pub snapstart_enabled: bool,
}

pub struct ColdStartOptimizer {
    /// Pre-warmed function pools
    pools: HashMap<FunctionRuntime, WarmPool>,
    /// Snapshot cache
    snapshots: LruCache<FunctionId, FunctionSnapshot>,
    /// Prediction model for scaling
    predictor: ScalingPredictor,
}

impl ColdStartOptimizer {
    /// Get a warm instance or create one
    pub async fn get_instance(&self, function: &Function) -> Result<FunctionInstance, Error> {
        // Try snapshot restore first (< 100ms)
        if let Some(snapshot) = self.snapshots.get(&function.function_id) {
            return self.restore_from_snapshot(snapshot).await;
        }

        // Try warm pool (< 50ms); `pools.get` yields an Option, so chain
        // rather than use `?` in a function returning Result
        if let Some(instance) = self.pools.get(&function.runtime).and_then(|pool| pool.get_warm()) {
            return Ok(instance);
        }

        // Cold start (1-5s depending on runtime)
        self.cold_start(function).await
    }
}
```

### 4.3 Serverless Pricing

| Resource | Unit | Price (SYNOR) |
|----------|------|---------------|
| Invocations | 1M requests | 0.20 |
| Duration | GB-second | 0.00001 |
| Provisioned concurrency | GB-hour | 0.01 |
| HTTP Gateway | 1M requests | 0.10 |
| Event bridge | 1M events | 0.50 |
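
A worked example of this table: 1M invocations at 1 GB memory and 200 ms each consume 200,000 GB-seconds, so the bill is 0.20 (requests) + 2.00 (duration) = 2.20 SYNOR. A sketch, ignoring the gateway and event-bridge line items (the `serverless_cost` helper is illustrative):

```rust
/// Request fee plus duration fee for serverless invocations, using the
/// rates above: 0.20 SYNOR per 1M requests, 0.00001 SYNOR per GB-second.
pub fn serverless_cost(million_invocations: f64, memory_gb: f64, seconds_each: f64) -> f64 {
    let request_fee = million_invocations * 0.20;
    let gb_seconds = million_invocations * 1_000_000.0 * memory_gb * seconds_each;
    request_fee + gb_seconds * 0.00001
}
```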

---

## Milestone 5: Edge Compute

### 5.1 Edge Node Architecture

```rust
// synor-compute/src/edge/node.rs

/// Edge compute node
pub struct EdgeNode {
    pub node_id: NodeId,
    pub location: GeoLocation,
    pub capabilities: EdgeCapabilities,
    pub latency_zones: Vec<LatencyZone>,
    pub resources: EdgeResources,
}

pub struct EdgeCapabilities {
    pub wasm_runtime: bool,
    pub container_runtime: bool,
    pub gpu_inference: bool,
    pub video_transcoding: bool,
    pub cdn_cache: bool,
}

pub struct EdgeResources {
    pub cpu_cores: u32,
    pub memory_gb: u32,
    pub storage_gb: u32,
    pub gpu: Option<EdgeGpu>,
    pub bandwidth_gbps: u32,
}

/// Edge function for low-latency compute
pub struct EdgeFunction {
    pub function_id: FunctionId,
    pub code: WasmModule,
    pub memory_limit: u32,
    pub timeout_ms: u32,
    pub allowed_regions: Vec<Region>,
}
```

### 5.2 Edge Use Cases

```rust
// synor-compute/src/edge/usecases.rs

/// CDN with compute at edge
pub struct EdgeCdn {
    /// Origin servers
    origins: Vec<Origin>,
    /// Cache rules
    cache_rules: Vec<CacheRule>,
    /// Edge workers for request/response transformation
    workers: Vec<EdgeWorker>,
}

/// Real-time inference at edge
pub struct EdgeInference {
    /// Model optimized for edge (quantized, pruned)
    model_id: ModelId,
    /// Inference runtime (TensorRT, ONNX Runtime)
    runtime: EdgeInferenceRuntime,
    /// Max batch size
    max_batch: u32,
    /// Target latency
    target_latency_ms: u32,
}

/// Video processing at edge
pub struct EdgeVideoProcessor {
    /// Transcoding profiles
    profiles: Vec<TranscodingProfile>,
    /// Real-time streaming
    live_streaming: bool,
    /// Adaptive bitrate
    abr_enabled: bool,
}
```

### 5.3 Edge Pricing

| Resource | Unit | Price (SYNOR) |
|----------|------|---------------|
| Edge function invocations | 1M | 0.50 |
| Edge function duration | GB-second | 0.00002 |
| Edge bandwidth | GB | 0.08 |
| Edge cache storage | GB/month | 0.02 |
| Video transcoding | minute | 0.02 |

---

## Milestone 6: Node Provider Economics

### 6.1 Provider Registration

```rust
// synor-compute/src/provider/registration.rs

/// Compute provider registration
pub struct ProviderRegistration {
    pub provider_id: ProviderId,
    pub owner: Address,
    /// Stake required to become a provider
    pub stake: u64,
    /// Hardware specifications
    pub hardware: HardwareManifest,
    /// Network connectivity
    pub network: NetworkManifest,
    /// Geographic location
    pub location: GeoLocation,
    /// Availability SLA commitment
    pub sla: SlaCommitment,
}

pub struct HardwareManifest {
    pub cpus: Vec<CpuSpec>,
    pub memory_total_gb: u64,
    pub gpus: Vec<GpuSpec>,
    pub storage: Vec<StorageSpec>,
    pub verified: bool, // Hardware attestation passed
}

pub struct SlaCommitment {
    pub uptime_percent: f32, // 99.9, 99.99, etc.
    pub response_time_ms: u32,
    pub data_durability: f32,
    pub penalty_rate: f32,   // Penalty for SLA violation
}
```

### 6.2 Provider Revenue Model

| Revenue Source | Provider Share | Protocol Share |
|----------------|----------------|----------------|
| Compute fees | 85% | 15% |
| Storage fees | 80% | 20% |
| Network fees | 75% | 25% |
| SLA bonuses | 100% | 0% |
| Staking rewards | 100% | 0% |
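
Settlement under this table is a simple basis-point split between provider and protocol; a sketch using the 85/15 compute-fee row (the `split_fee` helper is illustrative, and integer division rounds the provider share down):

```rust
/// Split a fee between provider and protocol. `provider_bps` is the
/// provider's share in basis points (8500 = 85%); the remainder goes
/// to the protocol, so the two shares always sum to `fee`.
pub fn split_fee(fee: u64, provider_bps: u64) -> (u64, u64) {
    let provider = fee * provider_bps / 10_000;
    (provider, fee - provider)
}
```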

### 6.3 Slashing Conditions

| Violation | Penalty |
|-----------|---------|
| Downtime > committed SLA | 1% stake per hour |
| Data loss | 10% stake + compensation |
| Malicious behavior | 100% stake |
| False hardware attestation | 50% stake |
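
A sketch of the first row — 1% of stake per hour of SLA-violating downtime, capped at the full stake so the penalty can never exceed what was posted (the `downtime_slash` helper is illustrative):

```rust
/// Slash 1% of stake per hour of downtime beyond the SLA, capped at
/// the full stake amount.
pub fn downtime_slash(stake: u64, downtime_hours: u64) -> u64 {
    (stake / 100 * downtime_hours).min(stake)
}
```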

---

## Implementation Timeline

### Phase 11.1: Foundation (Weeks 1-4)

- [ ] Node registration and hardware attestation
- [ ] Basic job scheduler
- [ ] WASM runtime integration (existing)
- [ ] Container runtime (containerd)
- [ ] Network overlay (WireGuard mesh)

### Phase 11.2: GPU Compute (Weeks 5-8)

- [ ] GPU node registration
- [ ] NVIDIA driver integration
- [ ] CUDA runtime support
- [ ] Basic ML job execution
- [ ] Model storage integration

### Phase 11.3: Container Orchestration (Weeks 9-12)

- [ ] OCI image support
- [ ] Service deployment
- [ ] Load balancing
- [ ] Auto-scaling
- [ ] Service mesh (mTLS)

### Phase 11.4: Persistent VMs (Weeks 13-16)

- [ ] MicroVM runtime (Firecracker)
- [ ] VM lifecycle management
- [ ] Persistent storage
- [ ] Live migration
- [ ] Snapshot/restore

### Phase 11.5: Serverless (Weeks 17-20)

- [ ] Function deployment
- [ ] Cold start optimization
- [ ] Event triggers
- [ ] API gateway
- [ ] Monitoring/logging

### Phase 11.6: Edge Compute (Weeks 21-24)

- [ ] Edge node registration
- [ ] Edge function runtime
- [ ] CDN integration
- [ ] Edge inference
- [ ] Global anycast

---

## Security Considerations

### Isolation Levels

| Workload Type | Isolation Technology | Security Level |
|---------------|---------------------|----------------|
| WASM | Wasmtime sandbox | High |
| Serverless | gVisor + seccomp | High |
| Containers | gVisor or Kata | Medium-High |
| VMs | Firecracker MicroVM | High |
| GPU | NVIDIA MIG/MPS | Medium |

### Network Security

- All inter-node traffic encrypted (WireGuard)
- mTLS for service-to-service communication
- Network policies for workload isolation
- DDoS protection at edge

### Data Security

- Encryption at rest (AES-256)
- Encryption in transit (TLS 1.3)
- Confidential computing support (AMD SEV, Intel SGX)
- Secure key management (HSM integration)

---

## API Examples

### Deploy AI Training Job

```bash
synor compute train create \
  --framework pytorch \
  --model-config ./model.yaml \
  --dataset synor://datasets/imagenet \
  --gpus 8 \
  --gpu-type h100 \
  --distributed ddp \
  --epochs 100 \
  --checkpoint-interval 1000 \
  --max-budget 1000
```

### Deploy Inference Endpoint

```bash
synor compute inference deploy \
  --model synor://models/llama-70b \
  --format vllm \
  --min-replicas 2 \
  --max-replicas 10 \
  --gpu-per-replica 2 \
  --target-utilization 0.7
```

### Create Persistent VM

```bash
synor compute vm create \
  --name my-dev-server \
  --image ubuntu:22.04 \
  --size gpu-small \
  --volume 100gb:nvme:/data \
  --ssh-key ~/.ssh/id_ed25519.pub \
  --region us-east
```

### Deploy Container Service

```bash
synor compute service deploy \
  --name my-api \
  --image my-registry/my-api:latest \
  --replicas 3 \
  --cpu 2 \
  --memory 4gb \
  --port 8080 \
  --health-check /health \
  --autoscale 2-10
```

### Deploy Serverless Function

```bash
synor compute function deploy \
  --name process-image \
  --runtime python312 \
  --handler main.handler \
  --code ./function \
  --memory 1024 \
  --timeout 30000 \
  --trigger http:/api/process
```

---

## Comparison with Existing Synor VM

| Feature | Current Synor VM | Synor Compute L2 |
|---------|------------------|------------------|
| Runtime | WASM only | WASM, Container, MicroVM |
| Timeout | 30 seconds | Unlimited (VMs) |
| Memory | 16 MB max | Up to 256 GB |
| GPU | ❌ | ✅ Full CUDA/ROCm |
| Networking | ❌ | ✅ Full TCP/UDP |
| File I/O | ❌ | ✅ Persistent volumes |
| Threading | ❌ | ✅ Multi-threaded |
| AI/ML | ❌ | ✅ Training + Inference |
| OS Hosting | ❌ | ✅ Full Linux/Windows |

---

## Next Steps

1. **Milestone 1**: Implement GPU node registration and attestation
2. **Milestone 2**: Build basic job scheduler with resource allocation
3. **Milestone 3**: Integrate containerd for container workloads
4. **Milestone 4**: Add Firecracker for MicroVM support
5. **Milestone 5**: Implement serverless function runtime
6. **Milestone 6**: Deploy edge nodes and CDN integration

This plan transforms Synor from a smart contract platform into a full-stack decentralized cloud provider capable of competing with AWS/GCP/Azure while maintaining decentralization and censorship resistance.