# Phase 11: Synor Compute L2 - Full-Stack Compute Platform

> **Mission**: Build a decentralized compute platform capable of AI/ML training, inference, OS hosting, and general-purpose high-performance computing.

---

## Executive Summary

Synor Compute L2 extends beyond the current WASM-only Synor VM to provide:

- **GPU Compute**: AI/ML training and inference with CUDA/ROCm support
- **Container Orchestration**: Docker-compatible workloads with Kubernetes-style scheduling
- **Persistent VMs**: Long-running virtual machines for OS hosting
- **Serverless Functions**: Short-lived compute for API backends and event processing
- **Edge Compute**: Low-latency compute at network edge nodes

---

## Architecture Overview

```
┌─────────────────────────────────────────────────────────────────────────────┐
│                              SYNOR COMPUTE L2                               │
├─────────────────────────────────────────────────────────────────────────────┤
│                                                                             │
│ ┌─────────────────────────────────────────────────────────────────────────┐ │
│ │                            APPLICATION LAYER                            │ │
│ ├──────────────┬──────────────┬──────────────┬──────────────┬────────────┤ │
│ │    AI/ML     │  Serverless  │  Containers  │  Persistent  │    Edge    │ │
│ │   Training   │  Functions   │   (Docker)   │ VMs (Linux)  │  Compute   │ │
│ └──────────────┴──────────────┴──────────────┴──────────────┴────────────┘ │
│                                                                             │
│ ┌─────────────────────────────────────────────────────────────────────────┐ │
│ │                           ORCHESTRATION LAYER                           │ │
│ ├──────────────┬──────────────┬──────────────┬──────────────┬────────────┤ │
│ │     Job      │   Resource   │   Network    │   Storage    │   Health   │ │
│ │  Scheduler   │   Manager    │    Fabric    │ Orchestrator │  Monitor   │ │
│ └──────────────┴──────────────┴──────────────┴──────────────┴────────────┘ │
│                                                                             │
│ ┌─────────────────────────────────────────────────────────────────────────┐ │
│ │                          COMPUTE RUNTIME LAYER                          │ │
│ ├──────────────┬──────────────┬──────────────┬──────────────┬────────────┤ │
│ │     GPU      │  Container   │   MicroVM    │     WASM     │   Native   │ │
│ │   Runtime    │   Runtime    │   Runtime    │   Runtime    │  Runtime   │ │
│ │ (CUDA/ROCm)  │ (containerd) │(Firecracker) │  (Wasmtime)  │  (gVisor)  │ │
│ └──────────────┴──────────────┴──────────────┴──────────────┴────────────┘ │
│                                                                             │
│ ┌─────────────────────────────────────────────────────────────────────────┐ │
│ │                          INFRASTRUCTURE LAYER                           │ │
│ ├──────────────┬──────────────┬──────────────┬──────────────┬────────────┤ │
│ │     Node     │   Network    │ Distributed  │  Consensus   │  Billing   │ │
│ │   Registry   │   Overlay    │   Storage    │  (PoS+PoW)   │  Metering  │ │
│ └──────────────┴──────────────┴──────────────┴──────────────┴────────────┘ │
│                                                                             │
│ ┌─────────────────────────────────────────────────────────────────────────┐ │
│ │               SYNOR L1 BLOCKCHAIN (GHOSTDAG + DAG-RIDER)                │ │
│ └─────────────────────────────────────────────────────────────────────────┘ │
│                                                                             │
└─────────────────────────────────────────────────────────────────────────────┘
```

---

## Milestone 1: GPU Compute Foundation (AI/ML Training & Inference)

### 1.1 GPU Node Registration

```rust
// synor-compute/src/gpu/node.rs

/// GPU node capabilities
pub struct GpuNode {
    /// Unique node ID
    pub node_id: NodeId,
    /// GPU specifications
    pub gpus: Vec<GpuSpec>,
    /// Total VRAM available (bytes)
    pub total_vram: u64,
    /// Available VRAM (bytes)
    pub available_vram: u64,
    /// CUDA compute capability (e.g., 8.6 for RTX 3090)
    pub cuda_capability: Option<(u8, u8)>,
    /// ROCm version (for AMD)
    pub rocm_version: Option<String>,
    /// Network bandwidth (Gbps)
    pub bandwidth_gbps: u32,
    /// Geographic region
    pub region: Region,
    /// Stake amount (for PoS validation)
    pub stake: u64,
}

pub struct GpuSpec {
    pub model: String,         // "NVIDIA RTX 4090"
    pub vram_gb: u32,          // 24
    pub tensor_cores: u32,     // 512
    pub cuda_cores: u32,       // 16384
    pub memory_bandwidth: u32, // 1008 GB/s
    pub fp32_tflops: f32,      // 82.6
    pub fp16_tflops: f32,      // 165.2
    pub int8_tops: f32,        // 330.4
}
```
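
As a quick illustration of how the registry might derive node-level figures such as `GpuNode::total_vram` from the per-GPU specs, here is a minimal sketch; the trimmed-down `GpuSpec` and the `node_totals` helper are illustrative, not part of the crate API:

```rust
/// Trimmed GpuSpec with only the fields this sketch needs.
pub struct GpuSpec {
    pub model: String,
    pub vram_gb: u32,
    pub fp16_tflops: f32,
}

/// Aggregate total VRAM (bytes) and FP16 throughput across a node's GPUs.
pub fn node_totals(gpus: &[GpuSpec]) -> (u64, f32) {
    let vram_bytes: u64 = gpus
        .iter()
        .map(|g| g.vram_gb as u64 * 1024 * 1024 * 1024) // GiB -> bytes
        .sum();
    let fp16_tflops: f32 = gpus.iter().map(|g| g.fp16_tflops).sum();
    (vram_bytes, fp16_tflops)
}
```

A dual-RTX-4090 node, for example, would report 48 GiB of VRAM and roughly 330 FP16 TFLOPS.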

### 1.2 AI/ML Job Specification

```rust
// synor-compute/src/ai/job.rs

/// AI/ML training job specification
pub struct TrainingJob {
    /// Job ID
    pub job_id: JobId,
    /// Owner address
    pub owner: Address,
    /// Framework (PyTorch, TensorFlow, JAX)
    pub framework: MlFramework,
    /// Model specification
    pub model: ModelSpec,
    /// Dataset reference (Synor Storage CID)
    pub dataset_cid: Cid,
    /// Training configuration
    pub config: TrainingConfig,
    /// Resource requirements
    pub resources: GpuResources,
    /// Maximum budget (SYNOR tokens)
    pub max_budget: u64,
    /// Checkpoint interval (steps)
    pub checkpoint_interval: u64,
}

pub struct GpuResources {
    pub min_gpus: u32,
    pub max_gpus: u32,
    pub min_vram_per_gpu: u64,
    pub cuda_capability_min: Option<(u8, u8)>,
    pub distributed: bool, // Multi-node training
    pub priority: JobPriority,
}

pub enum MlFramework {
    PyTorch { version: String },
    TensorFlow { version: String },
    JAX { version: String },
    ONNX,
    Custom { image: String },
}

pub struct TrainingConfig {
    pub epochs: u32,
    pub batch_size: u32,
    pub learning_rate: f32,
    pub optimizer: String,
    pub mixed_precision: bool,
    pub gradient_accumulation: u32,
    pub distributed_strategy: DistributedStrategy,
}

pub enum DistributedStrategy {
    DataParallel,
    ModelParallel,
    PipelineParallel,
    ZeRO { stage: u8 }, // DeepSpeed ZeRO stages 1-3
    FSDP,               // Fully Sharded Data Parallel
}
```
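
Under `DataParallel` training, the effective global batch size follows directly from the config fields above: per-GPU batch size times GPU count times gradient-accumulation steps. A one-line sketch (the `global_batch_size` helper is illustrative, not part of the crate):

```rust
/// Effective global batch size for data-parallel training:
/// per-GPU batch * number of GPUs * gradient accumulation steps.
pub fn global_batch_size(per_gpu_batch: u32, num_gpus: u32, grad_accum: u32) -> u64 {
    per_gpu_batch as u64 * num_gpus as u64 * grad_accum as u64
}
```

For example, `batch_size: 32` on 8 GPUs with `gradient_accumulation: 4` yields a global batch of 1024.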

### 1.3 Inference Service

```rust
// synor-compute/src/ai/inference.rs

/// Inference endpoint specification
pub struct InferenceEndpoint {
    /// Endpoint ID
    pub endpoint_id: EndpointId,
    /// Model reference (Synor Storage CID)
    pub model_cid: Cid,
    /// Model format
    pub format: ModelFormat,
    /// Scaling configuration
    pub scaling: AutoscaleConfig,
    /// GPU requirements per replica
    pub gpu_per_replica: GpuResources,
    /// Request timeout
    pub timeout_ms: u32,
    /// Max batch size for batched inference
    pub max_batch_size: u32,
    /// Batching timeout
    pub batch_timeout_ms: u32,
}

pub enum ModelFormat {
    PyTorch,
    ONNX,
    TensorRT,
    Triton,
    vLLM, // For LLM serving
    TGI,  // Text Generation Inference
    Custom,
}

pub struct AutoscaleConfig {
    pub min_replicas: u32,
    pub max_replicas: u32,
    pub target_gpu_utilization: f32,
    pub scale_up_threshold: f32,
    pub scale_down_threshold: f32,
    pub cooldown_seconds: u32,
}
```
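
A minimal sketch of how the `AutoscaleConfig` thresholds could drive replica counts: add a replica above `scale_up_threshold`, remove one below `scale_down_threshold`, and clamp to the configured bounds. The `desired_replicas` helper is hypothetical, not the scheduler's actual algorithm:

```rust
/// One autoscaling step: returns the replica count for the next interval
/// given current GPU utilization and the AutoscaleConfig thresholds.
pub fn desired_replicas(
    current: u32,
    gpu_utilization: f32,
    min_replicas: u32,
    max_replicas: u32,
    scale_up_threshold: f32,
    scale_down_threshold: f32,
) -> u32 {
    let target = if gpu_utilization > scale_up_threshold {
        current + 1 // overloaded: add a replica
    } else if gpu_utilization < scale_down_threshold {
        current.saturating_sub(1) // idle: remove a replica
    } else {
        current // within band: hold
    };
    target.clamp(min_replicas, max_replicas)
}
```

In practice `cooldown_seconds` would gate how often this step is applied, so a brief utilization spike does not cause replica churn.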

### 1.4 Pricing Model for GPU Compute

| Resource | Unit | Price (SYNOR/unit) |
|----------|------|-------------------|
| GPU (RTX 4090 equivalent) | hour | 0.50 |
| GPU (A100 80GB equivalent) | hour | 2.00 |
| GPU (H100 equivalent) | hour | 4.00 |
| VRAM | GB/hour | 0.01 |
| Network egress | GB | 0.05 |
| Storage (hot, NVMe) | GB/month | 0.10 |
| Inference requests | 1M tokens | 0.10 |
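
Billing against this table is a straight product of GPU count, hours, and the per-GPU-hour rate; for example, an 8×H100 job running 100 hours costs 8 × 100 × 4.00 = 3,200 SYNOR. A sketch (the `gpu_job_cost` helper is illustrative):

```rust
/// Cost of a GPU job in SYNOR: gpus * hours * per-GPU-hour rate
/// (rates taken from the pricing table above).
pub fn gpu_job_cost(gpus: u32, hours: f64, price_per_gpu_hour: f64) -> f64 {
    gpus as f64 * hours * price_per_gpu_hour
}
```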

---

## Milestone 2: Container Orchestration (Docker/Kubernetes-Compatible)

### 2.1 Container Runtime

```rust
// synor-compute/src/container/runtime.rs

/// Container specification (OCI-compatible)
pub struct ContainerSpec {
    /// Image reference
    pub image: ImageRef,
    /// Resource limits
    pub resources: ContainerResources,
    /// Environment variables
    pub env: HashMap<String, String>,
    /// Volume mounts
    pub volumes: Vec<VolumeMount>,
    /// Network configuration
    pub network: NetworkConfig,
    /// Security context
    pub security: SecurityContext,
    /// Health check
    pub health_check: Option<HealthCheck>,
}

pub struct ContainerResources {
    pub cpu_cores: f32, // 0.5, 1.0, 2.0, etc.
    pub memory_mb: u64,
    pub gpu: Option<GpuAllocation>,
    pub ephemeral_storage_gb: u32,
    pub network_bandwidth_mbps: u32,
}

pub struct GpuAllocation {
    pub count: u32,
    pub vram_mb: u64,
    pub shared: bool, // Allow GPU sharing via MPS/MIG
}
```

### 2.2 Service Mesh & Networking

```rust
// synor-compute/src/network/mesh.rs

/// Service definition for container orchestration
pub struct Service {
    pub service_id: ServiceId,
    pub name: String,
    pub containers: Vec<ContainerSpec>,
    pub replicas: ReplicaConfig,
    pub load_balancer: LoadBalancerConfig,
    pub service_mesh: ServiceMeshConfig,
}

pub struct ServiceMeshConfig {
    pub mtls_enabled: bool,
    pub traffic_policy: TrafficPolicy,
    pub circuit_breaker: CircuitBreakerConfig,
    pub retry_policy: RetryPolicy,
    pub rate_limit: Option<RateLimitConfig>,
}

pub struct LoadBalancerConfig {
    pub algorithm: LoadBalancerAlgorithm,
    pub health_check: HealthCheck,
    pub sticky_sessions: bool,
    pub ssl_termination: SslTermination,
}

pub enum LoadBalancerAlgorithm {
    RoundRobin,
    LeastConnections,
    WeightedRoundRobin { weights: Vec<u32> },
    IPHash,
    Random,
}
```
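
A compact sketch of `WeightedRoundRobin` selection, of which `RoundRobin` is the unit-weight special case: backend `i` is picked `weights[i]` times per cycle. The stateless, counter-based form below is illustrative, not the mesh's actual implementation:

```rust
/// Pick a backend index for request number `counter` under weighted
/// round-robin: each cycle of `sum(weights)` requests visits backend i
/// exactly weights[i] times.
pub fn weighted_pick(weights: &[u32], counter: u64) -> usize {
    let total: u64 = weights.iter().map(|&w| w as u64).sum();
    assert!(total > 0, "at least one backend must have nonzero weight");
    let mut slot = counter % total;
    for (i, &w) in weights.iter().enumerate() {
        if slot < w as u64 {
            return i;
        }
        slot -= w as u64;
    }
    unreachable!("slot is always within the total weight")
}
```

With weights `[2, 1]`, requests cycle through backends as 0, 0, 1, 0, 0, 1, …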

### 2.3 Container Pricing

| Resource | Unit | Price (SYNOR/unit) |
|----------|------|-------------------|
| CPU | core/hour | 0.02 |
| Memory | GB/hour | 0.005 |
| Ephemeral storage | GB/hour | 0.001 |
| Network ingress | GB | FREE |
| Network egress | GB | 0.05 |
| Load balancer | hour | 0.01 |
| Static IP | month | 2.00 |
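
Per-replica compute cost under this table is `cpu_cores × 0.02 + memory_gb × 0.005` per hour, before storage and egress. A sketch (the `replica_hour_cost` helper is illustrative):

```rust
/// Hourly compute cost of one container replica in SYNOR,
/// using the CPU (0.02/core-hour) and memory (0.005/GB-hour) rates above.
pub fn replica_hour_cost(cpu_cores: f64, memory_gb: f64) -> f64 {
    cpu_cores * 0.02 + memory_gb * 0.005
}
```

A 2-core / 4 GB replica therefore costs 0.06 SYNOR per hour, or about 43.8 SYNOR for a 730-hour month.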

---

## Milestone 3: Persistent Virtual Machines (OS Hosting)

### 3.1 MicroVM Architecture (Firecracker-based)

```rust
// synor-compute/src/vm/microvm.rs

/// Virtual machine specification
pub struct VmSpec {
    /// VM ID
    pub vm_id: VmId,
    /// Owner address
    pub owner: Address,
    /// VM size
    pub size: VmSize,
    /// Boot image
    pub image: VmImage,
    /// Persistent volumes
    pub volumes: Vec<PersistentVolume>,
    /// Network configuration
    pub network: VmNetworkConfig,
    /// SSH keys for access
    pub ssh_keys: Vec<SshPublicKey>,
    /// Cloud-init user data
    pub user_data: Option<String>,
}

pub struct VmSize {
    pub vcpus: u32,
    pub memory_gb: u32,
    pub gpu: Option<GpuPassthrough>,
    pub network_bandwidth_gbps: u32,
}

pub struct GpuPassthrough {
    pub count: u32,
    pub model: GpuModel,
    pub vram_gb: u32,
}

pub enum VmImage {
    /// Pre-built images
    Marketplace { image_id: String, version: String },
    /// Custom image from Synor Storage
    Custom { cid: Cid, format: ImageFormat },
    /// Standard OS images
    Ubuntu { version: String },
    Debian { version: String },
    AlmaLinux { version: String },
    Windows { version: String, license: WindowsLicense },
}

pub struct PersistentVolume {
    pub volume_id: VolumeId,
    pub size_gb: u32,
    pub volume_type: VolumeType,
    pub mount_path: String,
    pub encrypted: bool,
}

pub enum VolumeType {
    /// High-performance NVMe SSD
    NvmeSsd { iops: u32, throughput_mbps: u32 },
    /// Standard SSD
    Ssd,
    /// HDD for archival
    Hdd,
    /// Distributed storage (Synor Storage L2)
    Distributed { replication: u8 },
}
```

### 3.2 VM Lifecycle Management

```rust
// synor-compute/src/vm/lifecycle.rs

pub enum VmState {
    Pending,
    Provisioning,
    Running,
    Stopping,
    Stopped,
    Hibernating,
    Hibernated,
    Migrating,
    Failed,
    Terminated,
}

pub struct VmManager {
    /// Active VMs
    vms: HashMap<VmId, VmInstance>,
    /// Node assignments
    node_assignments: HashMap<VmId, NodeId>,
    /// Live migration coordinator
    migration_coordinator: MigrationCoordinator,
}

impl VmManager {
    /// Start a new VM
    pub async fn create(&self, spec: VmSpec) -> Result<VmId, VmError>;

    /// Stop a VM (preserves state)
    pub async fn stop(&self, vm_id: &VmId) -> Result<(), VmError>;

    /// Start a stopped VM
    pub async fn start(&self, vm_id: &VmId) -> Result<(), VmError>;

    /// Hibernate VM to storage (saves memory state)
    pub async fn hibernate(&self, vm_id: &VmId) -> Result<(), VmError>;

    /// Live migrate VM to another node
    pub async fn migrate(&self, vm_id: &VmId, target_node: NodeId) -> Result<(), VmError>;

    /// Resize VM (requires restart)
    pub async fn resize(&self, vm_id: &VmId, new_size: VmSize) -> Result<(), VmError>;

    /// Snapshot VM state
    pub async fn snapshot(&self, vm_id: &VmId) -> Result<SnapshotId, VmError>;

    /// Terminate and delete VM
    pub async fn terminate(&self, vm_id: &VmId) -> Result<(), VmError>;
}
```
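
The lifecycle methods above imply a state machine over `VmState`, and the manager can reject illegal operations up front with a transition guard. A sketch — the allowed pairs are an assumption read off the method list (e.g. restarting a stopped VM goes back through `Provisioning` here), and `VmState` is redeclared locally so the example is self-contained:

```rust
/// Local copy of the VmState enum for a self-contained sketch.
#[derive(Clone, Copy, PartialEq, Debug)]
pub enum VmState {
    Pending, Provisioning, Running, Stopping, Stopped,
    Hibernating, Hibernated, Migrating, Failed, Terminated,
}

/// True if the lifecycle allows moving from `from` to `to`.
/// Any state may be terminated; everything else follows the method list.
pub fn can_transition(from: VmState, to: VmState) -> bool {
    use VmState::*;
    matches!(
        (from, to),
        (Pending, Provisioning)
            | (Provisioning, Running)
            | (Provisioning, Failed)
            | (Running, Stopping)
            | (Running, Hibernating)
            | (Running, Migrating)
            | (Stopping, Stopped)
            | (Stopped, Provisioning)   // start() re-provisions
            | (Hibernating, Hibernated)
            | (Hibernated, Provisioning)
            | (Migrating, Running)
            | (_, Terminated)
    )
}
```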

### 3.3 VM Pricing

| VM Type | vCPUs | Memory | Storage | GPU | Price (SYNOR/month) |
|---------|-------|--------|---------|-----|---------------------|
| micro | 1 | 1 GB | 20 GB SSD | - | 5 |
| small | 2 | 4 GB | 50 GB SSD | - | 15 |
| medium | 4 | 8 GB | 100 GB SSD | - | 30 |
| large | 8 | 32 GB | 200 GB SSD | - | 80 |
| xlarge | 16 | 64 GB | 500 GB NVMe | - | 200 |
| gpu-small | 8 | 32 GB | 200 GB NVMe | 1x RTX 4090 | 400 |
| gpu-medium | 16 | 64 GB | 500 GB NVMe | 2x RTX 4090 | 750 |
| gpu-large | 32 | 128 GB | 1 TB NVMe | 4x A100 80GB | 2500 |
| gpu-xlarge | 64 | 256 GB | 2 TB NVMe | 8x H100 | 8000 |

---

## Milestone 4: Serverless Functions (FaaS)

### 4.1 Function Specification

```rust
// synor-compute/src/serverless/function.rs

/// Serverless function definition
pub struct Function {
    pub function_id: FunctionId,
    pub owner: Address,
    pub name: String,
    pub runtime: FunctionRuntime,
    pub handler: String,
    pub code: FunctionCode,
    pub resources: FunctionResources,
    pub triggers: Vec<FunctionTrigger>,
    pub environment: HashMap<String, String>,
    pub timeout_ms: u32,
    pub concurrency: ConcurrencyConfig,
}

pub enum FunctionRuntime {
    Node20,
    Node22,
    Python311,
    Python312,
    Rust,
    Go122,
    Java21,
    Dotnet8,
    Ruby33,
    Custom { image: String },
}

pub struct FunctionCode {
    /// Source code CID in Synor Storage
    pub cid: Cid,
    /// Entry point file
    pub entry_point: String,
    /// Dependencies (package.json, requirements.txt, etc.)
    pub dependencies: Option<Cid>,
}

pub struct FunctionResources {
    pub memory_mb: u32,      // 128, 256, 512, 1024, 2048, 4096, 8192
    pub cpu_allocation: f32, // Proportional to memory
    pub ephemeral_storage_mb: u32,
    pub gpu: Option<GpuAllocation>,
}

pub enum FunctionTrigger {
    /// HTTP endpoint
    Http { path: String, methods: Vec<HttpMethod> },
    /// Scheduled execution (cron)
    Schedule { cron: String },
    /// Event from message queue
    Queue { queue_name: String },
    /// Storage events
    Storage { bucket: String, events: Vec<StorageEvent> },
    /// Blockchain events
    Blockchain { contract: Address, events: Vec<String> },
    /// Webhook
    Webhook { url: String },
}
```
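
`FunctionResources::cpu_allocation` is described as proportional to memory. A sketch of that rule, assuming a hypothetical 1 vCPU per 2048 MB ratio — the ratio is illustrative, not a documented constant of the platform:

```rust
/// CPU share derived from allocated memory, assuming (hypothetically)
/// 1 full vCPU per 2048 MB. So 1024 MB -> 0.5 vCPU, 8192 MB -> 4 vCPUs.
pub fn cpu_for_memory(memory_mb: u32) -> f32 {
    memory_mb as f32 / 2048.0
}
```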

### 4.2 Cold Start Optimization

```rust
// synor-compute/src/serverless/warmup.rs

/// Function warmup strategies
pub struct WarmupConfig {
    /// Minimum warm instances
    pub min_instances: u32,
    /// Provisioned concurrency
    pub provisioned_concurrency: u32,
    /// Warmup schedule
    pub warmup_schedule: Option<String>,
    /// Snapshot-based cold start (SnapStart)
    pub snapstart_enabled: bool,
}

pub struct ColdStartOptimizer {
    /// Pre-warmed function pools
    pools: HashMap<FunctionRuntime, WarmPool>,
    /// Snapshot cache
    snapshots: LruCache<FunctionId, FunctionSnapshot>,
    /// Prediction model for scaling
    predictor: ScalingPredictor,
}

impl ColdStartOptimizer {
    /// Get a warm instance or create one
    pub async fn get_instance(&self, function: &Function) -> Result<FunctionInstance, Error> {
        // Try snapshot restore first (< 100ms)
        if let Some(snapshot) = self.snapshots.get(&function.function_id) {
            return self.restore_from_snapshot(snapshot).await;
        }

        // Try warm pool (< 50ms); `pools.get` yields an Option, so chain
        // rather than use `?` in a function returning Result
        if let Some(instance) = self.pools.get(&function.runtime).and_then(|pool| pool.get_warm()) {
            return Ok(instance);
        }

        // Cold start (1-5s depending on runtime)
        self.cold_start(function).await
    }
}
```

### 4.3 Serverless Pricing

| Resource | Unit | Price (SYNOR) |
|----------|------|---------------|
| Invocations | 1M requests | 0.20 |
| Duration | GB-second | 0.00001 |
| Provisioned concurrency | GB-hour | 0.01 |
| HTTP Gateway | 1M requests | 0.10 |
| Event bridge | 1M events | 0.50 |
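
A worked example of this table: 1M invocations at 1 GB memory and 200 ms each consume 200,000 GB-seconds, so the bill is 0.20 (requests) + 2.00 (duration) = 2.20 SYNOR. A sketch, ignoring the gateway and event-bridge line items (the `serverless_cost` helper is illustrative):

```rust
/// Request fee plus duration fee for serverless invocations, using the
/// rates above: 0.20 SYNOR per 1M requests, 0.00001 SYNOR per GB-second.
pub fn serverless_cost(million_invocations: f64, memory_gb: f64, seconds_each: f64) -> f64 {
    let request_fee = million_invocations * 0.20;
    let gb_seconds = million_invocations * 1_000_000.0 * memory_gb * seconds_each;
    request_fee + gb_seconds * 0.00001
}
```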

---

## Milestone 5: Edge Compute

### 5.1 Edge Node Architecture

```rust
// synor-compute/src/edge/node.rs

/// Edge compute node
pub struct EdgeNode {
    pub node_id: NodeId,
    pub location: GeoLocation,
    pub capabilities: EdgeCapabilities,
    pub latency_zones: Vec<LatencyZone>,
    pub resources: EdgeResources,
}

pub struct EdgeCapabilities {
    pub wasm_runtime: bool,
    pub container_runtime: bool,
    pub gpu_inference: bool,
    pub video_transcoding: bool,
    pub cdn_cache: bool,
}

pub struct EdgeResources {
    pub cpu_cores: u32,
    pub memory_gb: u32,
    pub storage_gb: u32,
    pub gpu: Option<EdgeGpu>,
    pub bandwidth_gbps: u32,
}

/// Edge function for low-latency compute
pub struct EdgeFunction {
    pub function_id: FunctionId,
    pub code: WasmModule,
    pub memory_limit: u32,
    pub timeout_ms: u32,
    pub allowed_regions: Vec<Region>,
}
```

### 5.2 Edge Use Cases

```rust
// synor-compute/src/edge/usecases.rs

/// CDN with compute at edge
pub struct EdgeCdn {
    /// Origin servers
    origins: Vec<Origin>,
    /// Cache rules
    cache_rules: Vec<CacheRule>,
    /// Edge workers for request/response transformation
    workers: Vec<EdgeWorker>,
}

/// Real-time inference at edge
pub struct EdgeInference {
    /// Model optimized for edge (quantized, pruned)
    model_id: ModelId,
    /// Inference runtime (TensorRT, ONNX Runtime)
    runtime: EdgeInferenceRuntime,
    /// Max batch size
    max_batch: u32,
    /// Target latency
    target_latency_ms: u32,
}

/// Video processing at edge
pub struct EdgeVideoProcessor {
    /// Transcoding profiles
    profiles: Vec<TranscodingProfile>,
    /// Real-time streaming
    live_streaming: bool,
    /// Adaptive bitrate
    abr_enabled: bool,
}
```

### 5.3 Edge Pricing

| Resource | Unit | Price (SYNOR) |
|----------|------|---------------|
| Edge function invocations | 1M | 0.50 |
| Edge function duration | GB-second | 0.00002 |
| Edge bandwidth | GB | 0.08 |
| Edge cache storage | GB/month | 0.02 |
| Video transcoding | minute | 0.02 |

---

## Milestone 6: Node Provider Economics

### 6.1 Provider Registration

```rust
// synor-compute/src/provider/registration.rs

/// Compute provider registration
pub struct ProviderRegistration {
    pub provider_id: ProviderId,
    pub owner: Address,
    /// Stake required to become a provider
    pub stake: u64,
    /// Hardware specifications
    pub hardware: HardwareManifest,
    /// Network connectivity
    pub network: NetworkManifest,
    /// Geographic location
    pub location: GeoLocation,
    /// Availability SLA commitment
    pub sla: SlaCommitment,
}

pub struct HardwareManifest {
    pub cpus: Vec<CpuSpec>,
    pub memory_total_gb: u64,
    pub gpus: Vec<GpuSpec>,
    pub storage: Vec<StorageSpec>,
    pub verified: bool, // Hardware attestation passed
}

pub struct SlaCommitment {
    pub uptime_percent: f32, // 99.9, 99.99, etc.
    pub response_time_ms: u32,
    pub data_durability: f32,
    pub penalty_rate: f32,   // Penalty for SLA violation
}
```

### 6.2 Provider Revenue Model

| Revenue Source | Provider Share | Protocol Share |
|----------------|----------------|----------------|
| Compute fees | 85% | 15% |
| Storage fees | 80% | 20% |
| Network fees | 75% | 25% |
| SLA bonuses | 100% | 0% |
| Staking rewards | 100% | 0% |
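
Settlement under this table is a simple basis-point split between provider and protocol; a sketch using the 85/15 compute-fee row (the `split_fee` helper is illustrative, and integer division rounds the provider share down):

```rust
/// Split a fee between provider and protocol. `provider_bps` is the
/// provider's share in basis points (8500 = 85%); the remainder goes
/// to the protocol, so the two shares always sum to `fee`.
pub fn split_fee(fee: u64, provider_bps: u64) -> (u64, u64) {
    let provider = fee * provider_bps / 10_000;
    (provider, fee - provider)
}
```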

### 6.3 Slashing Conditions

| Violation | Penalty |
|-----------|---------|
| Downtime > committed SLA | 1% stake per hour |
| Data loss | 10% stake + compensation |
| Malicious behavior | 100% stake |
| False hardware attestation | 50% stake |
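
A sketch of the first row — 1% of stake per hour of SLA-violating downtime, capped at the full stake so the penalty can never exceed what was posted (the `downtime_slash` helper is illustrative):

```rust
/// Slash 1% of stake per hour of downtime beyond the SLA, capped at
/// the full stake amount.
pub fn downtime_slash(stake: u64, downtime_hours: u64) -> u64 {
    (stake / 100 * downtime_hours).min(stake)
}
```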

---

## Implementation Timeline

### Phase 11.1: Foundation (Weeks 1-4)

- [ ] Node registration and hardware attestation
- [ ] Basic job scheduler
- [ ] WASM runtime integration (existing)
- [ ] Container runtime (containerd)
- [ ] Network overlay (WireGuard mesh)

### Phase 11.2: GPU Compute (Weeks 5-8)

- [ ] GPU node registration
- [ ] NVIDIA driver integration
- [ ] CUDA runtime support
- [ ] Basic ML job execution
- [ ] Model storage integration

### Phase 11.3: Container Orchestration (Weeks 9-12)

- [ ] OCI image support
- [ ] Service deployment
- [ ] Load balancing
- [ ] Auto-scaling
- [ ] Service mesh (mTLS)

### Phase 11.4: Persistent VMs (Weeks 13-16)

- [ ] MicroVM runtime (Firecracker)
- [ ] VM lifecycle management
- [ ] Persistent storage
- [ ] Live migration
- [ ] Snapshot/restore

### Phase 11.5: Serverless (Weeks 17-20)

- [ ] Function deployment
- [ ] Cold start optimization
- [ ] Event triggers
- [ ] API gateway
- [ ] Monitoring/logging

### Phase 11.6: Edge Compute (Weeks 21-24)

- [ ] Edge node registration
- [ ] Edge function runtime
- [ ] CDN integration
- [ ] Edge inference
- [ ] Global anycast

---

## Security Considerations

### Isolation Levels

| Workload Type | Isolation Technology | Security Level |
|---------------|---------------------|----------------|
| WASM | Wasmtime sandbox | High |
| Serverless | gVisor + seccomp | High |
| Containers | gVisor or Kata | Medium-High |
| VMs | Firecracker MicroVM | High |
| GPU | NVIDIA MIG/MPS | Medium |

### Network Security

- All inter-node traffic encrypted (WireGuard)
- mTLS for service-to-service communication
- Network policies for workload isolation
- DDoS protection at edge

### Data Security

- Encryption at rest (AES-256)
- Encryption in transit (TLS 1.3)
- Confidential computing support (AMD SEV, Intel SGX)
- Secure key management (HSM integration)

---

## API Examples

### Deploy AI Training Job

```bash
synor compute train create \
  --framework pytorch \
  --model-config ./model.yaml \
  --dataset synor://datasets/imagenet \
  --gpus 8 \
  --gpu-type h100 \
  --distributed ddp \
  --epochs 100 \
  --checkpoint-interval 1000 \
  --max-budget 1000
```

### Deploy Inference Endpoint

```bash
synor compute inference deploy \
  --model synor://models/llama-70b \
  --format vllm \
  --min-replicas 2 \
  --max-replicas 10 \
  --gpu-per-replica 2 \
  --target-utilization 0.7
```

### Create Persistent VM

```bash
synor compute vm create \
  --name my-dev-server \
  --image ubuntu:22.04 \
  --size gpu-small \
  --volume 100gb:nvme:/data \
  --ssh-key ~/.ssh/id_ed25519.pub \
  --region us-east
```

### Deploy Container Service

```bash
synor compute service deploy \
  --name my-api \
  --image my-registry/my-api:latest \
  --replicas 3 \
  --cpu 2 \
  --memory 4gb \
  --port 8080 \
  --health-check /health \
  --autoscale 2-10
```

### Deploy Serverless Function

```bash
synor compute function deploy \
  --name process-image \
  --runtime python312 \
  --handler main.handler \
  --code ./function \
  --memory 1024 \
  --timeout 30000 \
  --trigger http:/api/process
```

---

## Comparison with Existing Synor VM

| Feature | Current Synor VM | Synor Compute L2 |
|---------|------------------|------------------|
| Runtime | WASM only | WASM, Container, MicroVM |
| Timeout | 30 seconds | Unlimited (VMs) |
| Memory | 16 MB max | Up to 256 GB |
| GPU | ❌ | ✅ Full CUDA/ROCm |
| Networking | ❌ | ✅ Full TCP/UDP |
| File I/O | ❌ | ✅ Persistent volumes |
| Threading | ❌ | ✅ Multi-threaded |
| AI/ML | ❌ | ✅ Training + Inference |
| OS Hosting | ❌ | ✅ Full Linux/Windows |

---

## Next Steps

1. **Milestone 1**: Implement GPU node registration and attestation
2. **Milestone 2**: Build basic job scheduler with resource allocation
3. **Milestone 3**: Integrate containerd for container workloads
4. **Milestone 4**: Add Firecracker for MicroVM support
5. **Milestone 5**: Implement serverless function runtime
6. **Milestone 6**: Deploy edge nodes and CDN integration

This plan transforms Synor from a smart contract platform into a full-stack decentralized cloud provider capable of competing with AWS/GCP/Azure while maintaining decentralization and censorship resistance.