# Synor Compute SDK for Python

Access distributed heterogeneous compute at a 90% cost reduction.

## Installation

```bash
pip install synor-compute
# or
poetry add synor-compute
```

## Quick Start

```python
import asyncio

from synor_compute import SynorCompute, Tensor


async def main():
    client = SynorCompute('your-api-key')

    # Matrix multiplication on GPU
    a = Tensor.random((512, 512))
    b = Tensor.random((512, 512))

    result = await client.matmul(a, b, precision='fp16', processor='gpu')

    print(f"Execution time: {result.execution_time_ms}ms")
    print(f"Cost: ${result.cost}")


asyncio.run(main())
```

## NumPy Integration

```python
import numpy as np

from synor_compute import Tensor

# Create from NumPy
arr = np.random.randn(100, 100).astype(np.float32)
tensor = Tensor.from_numpy(arr)

# Convert back to NumPy
result_np = tensor.numpy()
```

## Tensor Operations

```python
# Create tensors
zeros = Tensor.zeros((3, 3))
ones = Tensor.ones((2, 2))
random = Tensor.random((10, 10))
randn = Tensor.randn((100,))  # Normal distribution

# Operations
tensor = Tensor.random((100, 100))
reshaped = tensor.reshape((50, 200))
transposed = tensor.T

# Math operations
mean = tensor.mean()
std = tensor.std()
```

## Matrix Operations

```python
# Matrix multiplication
result = await client.matmul(
    a, b,
    precision='fp16',
    processor='gpu',
    strategy='speed'
)

# 2D Convolution
conv = await client.conv2d(
    input_tensor, kernel,
    stride=(1, 1),
    padding=(1, 1)
)

# Flash Attention
attention = await client.attention(
    query, key, value,
    num_heads=8,
    flash=True
)
```

## LLM Inference

```python
# Single response
response = await client.inference(
    'llama-3-70b',
    'Explain quantum computing',
    max_tokens=512,
    temperature=0.7
)
print(response.result)

# Streaming response
async for chunk in client.inference_stream('llama-3-70b', 'Write a poem'):
    print(chunk, end='', flush=True)
```

## Configuration

```python
from synor_compute import SynorCompute, Config

config = Config(
    api_key='your-api-key',
    base_url='https://api.synor.io/compute/v1',
    default_processor='gpu',
    default_precision='fp16',
    default_strategy='balanced',
    timeout=30.0,
    debug=False
)

client = SynorCompute(config)
```

## Synchronous API

For non-async contexts:

```python
from synor_compute import SynorComputeSync

client = SynorComputeSync('your-api-key')
result = client.matmul(a, b)  # Blocking call
```

## Job Management

```python
# Submit async job
job = await client.submit_job('matmul', {'a': a, 'b': b})

# Poll for status
status = await client.get_job_status(job.job_id)

# Wait for completion
result = await client.wait_for_job(job.job_id, timeout=60.0)

# Cancel job
await client.cancel_job(job.job_id)
```

## Error Handling

```python
from synor_compute import SynorError

try:
    result = await client.matmul(a, b)
except SynorError as e:
    print(f"API Error: {e.message} (status: {e.status_code})")
```

## Type Hints

Full type hint support:

```python
from synor_compute.types import (
    ProcessorType,
    Precision,
    BalancingStrategy,
    JobStatus,
    MatMulOptions,
    InferenceOptions,
    JobResult
)
```

## Testing

```bash
pytest
# or
python -m pytest tests/
```

## License

MIT