Add English README and cross-link both documentation versions

2026-01-29 02:15:54 +01:00
parent 6c85aaf7a1
commit 4cd777d07f
2 changed files with 464 additions and 1 deletions
--- a/README_EN.md
+++ b/README_EN.md
@@ -0,0 +1,459 @@
+# Whisper API
+
+[![License: MIT](https://img.shields.io/badge/License-MIT-yellow.svg)](https://opensource.org/licenses/MIT)
+
+A local Whisper API with GPU acceleration and a web admin interface for audio transcription. The API is OpenAI-compatible and supports multiple models.
+
+[🇩🇪 Deutsche Version](README.md) | **🇺🇸 English Version**
+
+## Features
+
+- **OpenAI-compatible API** - Drop-in replacement for the OpenAI Whisper API
+- **GPU Accelerated** - Uses NVIDIA GPUs (CUDA) for fast transcription
+- **CPU Fallback** - Automatic switch to CPU when no GPU is available
+- **Multi-Model Support** - Supports all Whisper models (tiny to large-v3)
+- **Model Management** - Download, switch and delete models via Admin Panel
+- **Default: large-v3** - Best quality with your RTX 3090
+- **Web Admin Interface** - API key management, model management and statistics at `/admin`
+- **API Key Authentication** - Secure access control (Environment + Database)
+- **Cross-Platform** - Docker-based, runs on Windows and Linux
+- **Automatic Cleanup** - Logs are deleted automatically after 30 days (configurable via `LOG_RETENTION_DAYS`)
+- **Persistent Storage** - Models and data in Docker volumes
+
+## Architecture
+
+```
+┌─────────────────┐     ┌──────────────────┐     ┌─────────────────┐
+│   Client/App    │────▶│   FastAPI App    │────▶│  Whisper GPU    │
+│  (Clawdbot etc) │     │   (Port 8000)    │     │  (large-v3)     │
+└─────────────────┘     └──────────────────┘     └─────────────────┘
+                               │
+                               ▼
+                         ┌──────────────────┐
+                         │  /admin Panel    │
+                         │  - Key Mgmt      │
+                         │  - Models        │
+                         │  - Dashboard     │
+                         └──────────────────┘
+```
+
+## Quick Start
+
+### Prerequisites
+
+- Docker Desktop (Windows) or Docker + docker-compose (Linux)
+- NVIDIA GPU with CUDA support (e.g. RTX 3090); optional, since a CPU fallback is available
+- NVIDIA Container Toolkit installed (for GPU support)
+
+### Installation
+
+1. **Clone repository:**
+```bash
+git clone https://gitea.ragtag.rocks/b0rborad/whisper-api.git
+cd whisper-api
+```
+
+2. **Configure environment variables:**
+```bash
+cp .env.example .env
+# Edit .env to your needs
+```
+
+3. **Start Docker container:**
+```bash
+docker-compose up -d
+```
+
+4. **First start:**
+   - The `large-v3` model (~3 GB) is downloaded automatically
+   - This may take 5-10 minutes, depending on your connection
+   - Check status: `docker-compose logs -f`
+
+### Verification
+
+```bash
+# Health check
+curl http://localhost:8000/health
+
+# List models (requires an API key)
+curl http://localhost:8000/v1/models \
+  -H "Authorization: Bearer sk-your-api-key"
+```
+
+## API Documentation
+
+### Authentication
+
+All API endpoints (except `/health` and `/admin`) require an API key:
+
+```text
+Authorization: Bearer sk-your-api-key-here
+```
+
+### Endpoints
+
+#### POST /v1/audio/transcriptions
+
+Transcribes an audio file.
+
+**Request:**
+```bash
+curl -X POST http://localhost:8000/v1/audio/transcriptions \
+  -H "Authorization: Bearer sk-your-api-key" \
+  -F "file=@/path/to/audio.mp3" \
+  -F "model=large-v3" \
+  -F "language=de" \
+  -F "response_format=json"
+```
+
+**Response:**
+```json
+{
+  "text": "Hello World, this is a test."
+}
+```
+
+#### POST /v1/audio/transcriptions (with Timestamps)
+
+**Request:**
+```bash
+curl -X POST http://localhost:8000/v1/audio/transcriptions \
+  -H "Authorization: Bearer sk-your-api-key" \
+  -F "file=@audio.mp3" \
+  -F "timestamp_granularities[]=word" \
+  -F "response_format=verbose_json"
+```
+
+**Response:**
+```json
+{
+  "text": "Hello World",
+  "segments": [
+    {
+      "id": 0,
+      "start": 0.0,
+      "end": 1.5,
+      "text": "Hello World",
+      "words": [
+        {"word": "Hello", "start": 0.0, "end": 0.5},
+        {"word": "World", "start": 0.6, "end": 1.2}
+      ]
+    }
+  ]
+}
+```
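+
+The segment timings in a `verbose_json` response can be turned into subtitles. The following is a minimal sketch, assuming the response has the shape shown above; the helper names `format_timestamp` and `segments_to_srt` are illustrative and not part of this project.
+
+```python
+def format_timestamp(seconds: float) -> str:
+    """Render seconds as an SRT timestamp (HH:MM:SS,mmm)."""
+    total_ms = round(seconds * 1000)
+    hours, rest = divmod(total_ms, 3_600_000)
+    minutes, rest = divmod(rest, 60_000)
+    secs, ms = divmod(rest, 1000)
+    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"
+
+
+def segments_to_srt(payload: dict) -> str:
+    """Build SRT text from the `segments` list of a verbose_json response."""
+    blocks = []
+    for index, segment in enumerate(payload.get("segments", []), start=1):
+        start = format_timestamp(segment["start"])
+        end = format_timestamp(segment["end"])
+        # One SRT cue: running number, time range, then the text
+        blocks.append(f"{index}\n{start} --> {end}\n{segment['text'].strip()}\n")
+    return "\n".join(blocks)
+```
+
+Passing the parsed JSON body to `segments_to_srt` yields one cue per segment; the per-word entries under `words` could be handled the same way for finer-grained cues.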
+
+#### GET /v1/models
+
+List available models.
+
+#### GET /v1/available-models
+
+List all available Whisper models with download status.
+
+**Response:**
+```json
+{
+  "models": [
+    {
+      "name": "large-v3",
+      "size": "2.88 GB",
+      "description": "Best accuracy",
+      "is_downloaded": true,
+      "is_active": true
+    }
+  ]
+}
+```
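+
+A client can use this response to decide which models are ready without triggering a download. A small sketch, assuming the field names shown above (`is_downloaded`, `is_active`); the function names are hypothetical.
+
+```python
+from typing import List, Optional
+
+
+def downloaded_models(payload: dict) -> List[str]:
+    """Return the names of all models that are already on disk."""
+    return [m["name"] for m in payload.get("models", []) if m.get("is_downloaded")]
+
+
+def active_model(payload: dict) -> Optional[str]:
+    """Return the name of the currently active model, or None if there is none."""
+    for model in payload.get("models", []):
+        if model.get("is_active"):
+            return model["name"]
+    return None
+```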
+
+#### GET /v1/model-status
+
+Returns the download and load status of the current model.
+
+**Response:**
+```json
+{
+  "name": "large-v3",
+  "loaded": true,
+  "is_downloading": false,
+  "download_percentage": 100,
+  "status_message": "Model loaded successfully"
+}
+```
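+
+Because the first model download can take several minutes, a client may want to wait until the model is loaded before sending audio. The sketch below polls this endpoint using only the standard library. It assumes the response fields shown above; the names `fetch_status_http` and `wait_for_model`, the default timeout and the polling interval are illustrative choices, not part of the API.
+
+```python
+import json
+import time
+import urllib.request
+from typing import Callable
+
+
+def fetch_status_http(base_url: str = "http://localhost:8000",
+                      api_key: str = "sk-your-api-key") -> dict:
+    """Fetch /v1/model-status and return the parsed JSON body."""
+    request = urllib.request.Request(
+        f"{base_url}/v1/model-status",
+        headers={"Authorization": f"Bearer {api_key}"},
+    )
+    with urllib.request.urlopen(request, timeout=10) as response:
+        return json.load(response)
+
+
+def wait_for_model(fetch_status: Callable[[], dict],
+                   timeout: float = 900.0,
+                   interval: float = 5.0,
+                   sleep: Callable[[float], None] = time.sleep,
+                   clock: Callable[[], float] = time.monotonic) -> dict:
+    """Poll until the model reports loaded, or raise TimeoutError."""
+    deadline = clock() + timeout
+    while True:
+        status = fetch_status()
+        if status.get("loaded") and not status.get("is_downloading"):
+            return status
+        if clock() >= deadline:
+            raise TimeoutError(f"model not loaded after {timeout} seconds")
+        sleep(interval)
+```
+
+A typical call would be `wait_for_model(fetch_status_http)`. Passing the fetch function as a parameter keeps the polling loop independent of the HTTP client.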
+
+#### POST /v1/switch-model
+
+Switch to a different model.
+
+**Request:**
+```bash
+curl -X POST http://localhost:8000/v1/switch-model \
+  -H "Authorization: Bearer sk-your-api-key" \
+  -F "model=base"
+```
+
+#### POST /v1/reload-model
+
+Re-download the current model.
+
+#### DELETE /v1/delete-model/{model_name}
+
+Delete a downloaded model.
+
+#### GET /health
+
+Health check with GPU and model status.
+
+**Response:**
+```json
+{
+  "status": "healthy",
+  "model": "large-v3",
+  "gpu": {
+    "available": true,
+    "name": "NVIDIA GeForce RTX 3090",
+    "vram_used_gb": 2.1,
+    "vram_total_gb": 24.0
+  },
+  "model_status": {
+    "loaded": true,
+    "is_downloading": false,
+    "download_percentage": 100
+  }
+}
+```
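+
+For monitoring, the health payload can be condensed into a single status line. This is a sketch under the assumption that the response matches the example above; `describe_health` is a hypothetical helper.
+
+```python
+def describe_health(payload: dict) -> str:
+    """Summarize a /health response as one human-readable line."""
+    gpu = payload.get("gpu") or {}
+    if gpu.get("available"):
+        device = (
+            f"{gpu.get('name', 'GPU')} "
+            f"({gpu.get('vram_used_gb', 0):.1f}/{gpu.get('vram_total_gb', 0):.1f} GB VRAM)"
+        )
+    else:
+        # No usable GPU reported: the API runs in CPU mode
+        device = "CPU"
+    model_status = payload.get("model_status") or {}
+    if model_status.get("loaded"):
+        state = "loaded"
+    else:
+        state = f"loading ({model_status.get('download_percentage', 0)}% downloaded)"
+    return f"{payload.get('status', 'unknown')}: {payload.get('model', '?')} {state} on {device}"
+```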
+
+## Admin Interface
+
+The web interface is accessible at: `http://localhost:8000/admin`
+
+### Login
+
+- **Username:** `admin` (default; set `ADMIN_USER` in `.env`)
+- **Password:** `-whisper12510-` (default; set `ADMIN_PASSWORD` in `.env`)
+
+### Features
+
+- **Dashboard:** Usage overview, performance statistics and model download status
+- **API Keys:** Create, deactivate and delete keys
+- **Models:**
+  - Manage all Whisper models (tiny, base, small, medium, large-v1, large-v2, large-v3)
+  - Download, activate and delete models
+  - **CPU/GPU Mode Toggle**
+  - Reload model
+- **Logs:** Detailed transcription logs with filter
+
+## Configuration
+
+### .env.example
+
+```bash
+# Server
+PORT=8000
+HOST=0.0.0.0
+
+# Whisper
+WHISPER_MODEL=large-v3
+# Use 'cuda' for GPU mode or 'cpu' for CPU mode
+WHISPER_DEVICE=cuda
+WHISPER_COMPUTE_TYPE=float16
+
+# Authentication
+# Multiple API keys separated by comma
+API_KEYS=sk-your-first-key,sk-your-second-key
+ADMIN_USER=admin
+ADMIN_PASSWORD=-whisper12510-
+
+# Data retention (days)
+LOG_RETENTION_DAYS=30
+
+# Optional: Sentry for error tracking
+# SENTRY_DSN=https://...
+```
+
+### Docker-Compose Customization
+
+```yaml
+services:
+  whisper-api:
+    # ...
+    environment:
+      - PORT=8000  # Changeable
+      - WHISPER_MODEL=large-v3
+      - WHISPER_DEVICE=cuda  # or 'cpu' for CPU mode
+    volumes:
+      - whisper_models:/app/models    # Persists models (Named Volume)
+      - whisper_data:/app/data        # SQLite database
+      - whisper_uploads:/app/uploads  # Temporary uploads
+    deploy:
+      resources:
+        reservations:
+          devices:
+            - driver: nvidia
+              count: all
+              capabilities: [gpu]
+
+volumes:
+  whisper_models:
+  whisper_data:
+  whisper_uploads:
+```
+
+## Migration to Linux
+
+The Docker configuration is platform-independent. For Linux:
+
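+1. **Install the NVIDIA Container Toolkit:**
+```bash
+# Ubuntu/Debian
+curl -fsSL https://nvidia.github.io/libnvidia-container/gpgkey | sudo gpg --dearmor -o /usr/share/keyrings/nvidia-container-toolkit-keyring.gpg
+curl -s -L https://nvidia.github.io/libnvidia-container/stable/deb/nvidia-container-toolkit.list | \
+  sed 's#deb https://#deb [signed-by=/usr/share/keyrings/nvidia-container-toolkit-keyring.gpg] https://#g' | \
+  sudo tee /etc/apt/sources.list.d/nvidia-container-toolkit.list
+
+sudo apt-get update
+sudo apt-get install -y nvidia-container-toolkit
+sudo nvidia-ctk runtime configure --runtime=docker
+sudo systemctl restart docker
+```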
+
+2. **Clone and start project:**
+```bash
+git clone https://gitea.ragtag.rocks/b0rborad/whisper-api.git
+cd whisper-api
+docker-compose up -d
+```
+
+3. **Verify GPU passthrough:**
+```bash
+docker run --rm --gpus all ubuntu nvidia-smi
+```
+
+## Available Models
+
+| Model | Size | Description | Speed | Accuracy |
+|-------|------|-------------|-------|----------|
+| **tiny** | 39 MB | Fastest, lowest quality | Very fast | Low |
+| **base** | 74 MB | Good for testing | Fast | Medium |
+| **small** | 244 MB | Balance speed/quality | Medium | Good |
+| **medium** | 769 MB | Good accuracy | Slow | Very good |
+| **large-v2** | 2.87 GB | Higher accuracy | Very slow | Excellent |
+| **large-v3** | 2.88 GB | Best accuracy (Default) | Very slow | Excellent |
+
+**Recommendations:**
+- **Development/Testing:** `base` or `small`
+- **Production:** `large-v3` (with RTX 3090)
+- **CPU Mode:** `small` or `medium`
+
+## Performance
+
+With RTX 3090 and large-v3:
+- **1 minute audio:** ~3-5 seconds processing time
+- **VRAM usage:** ~10 GB
+- **Batch processing:** Possible for parallel requests
+
+With CPU and small:
+- **1 minute audio:** ~30-60 seconds processing time
+- **RAM usage:** ~1 GB
+
+## Integration with Clawdbot
+
+For integration into a Clawdbot skill:
+
+```python
+import requests
+
+API_URL = "http://localhost:8000/v1/audio/transcriptions"
+API_KEY = "sk-your-api-key"
+
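+def transcribe_audio(audio_path):
+    """Send an audio file to the Whisper API and return the transcribed text."""
+    with open(audio_path, "rb") as f:
+        response = requests.post(
+            API_URL,
+            headers={"Authorization": f"Bearer {API_KEY}"},
+            files={"file": f},
+            data={"language": "de"},
+            timeout=300,  # long audio can take a while to transcribe
+        )
+    # Raise on 4xx/5xx (e.g. invalid API key) instead of failing on a missing "text" key
+    response.raise_for_status()
+    return response.json()["text"]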
+```
+
+## Troubleshooting
+
+### GPU not recognized / Automatic CPU Fallback
+
+If no GPU is detected, the API automatically switches to CPU mode:
+
+```bash
+# Check NVIDIA Container Toolkit
+docker run --rm --gpus all ubuntu nvidia-smi
+
+# Check logs - should show "GPU not available, falling back to CPU mode"
+docker-compose logs whisper-api
+```
+
+**Manual switch:** Via Admin Panel (`/admin/models`) or API:
+```bash
+curl -X POST http://localhost:8000/v1/switch-device \
+  -H "Authorization: Bearer sk-your-api-key" \
+  -F "device=cpu"
+```
+
+### Model Download Status Display
+
+- **Dashboard:** Shows download progress in real-time
+- **API:** `GET /v1/model-status` for current status
+- **Logs:** `docker-compose logs -f` shows download progress
+
+### Slow Model Download
+
+```bash
+# In Admin Panel under Models select a smaller model (e.g. base, small)
+# Or via API:
+curl -X POST http://localhost:8000/v1/switch-model \
+  -H "Authorization: Bearer sk-your-api-key" \
+  -F "model=base"
+```
+
+### Port already in use
+
+```bash
+# Change port in .env, then recreate the container with: docker-compose up -d
+PORT=8001
+```
+
+## Backup
+
+Important data (Docker named volumes plus the configuration file):
+- `whisper_data` - SQLite database (API keys, logs)
+- `whisper_models` - Downloaded Whisper models
+- `./.env` - Configuration
+
+```bash
+# Create backup (volume names carry the Compose project prefix, here "whisper-api_")
+docker run --rm -v whisper-api_whisper_data:/data -v whisper-api_whisper_models:/models -v "$(pwd)":/backup alpine sh -c "tar czf /backup/whisper-api-backup.tar.gz -C / data models"
+
+# Or full backup: the same archive plus a copy of .env
+cp .env .env.backup
+docker run --rm -v whisper-api_whisper_data:/data -v whisper-api_whisper_models:/models -v "$(pwd)":/backup alpine tar czf /backup/whisper-api-full-backup.tar.gz -C / data models
+```
+
+### Restore Backup
+
+```bash
+# Extract backup into the volumes (stop the stack first with: docker-compose down)
+docker run --rm -v whisper-api_whisper_data:/data -v whisper-api_whisper_models:/models -v "$(pwd)":/backup alpine sh -c "cd / && tar xzf /backup/whisper-api-backup.tar.gz"
+```
+
+## License
+
+MIT License - See LICENSE file
+
+## Support
+
+For issues:
+1. Check logs: `docker-compose logs -f`
+2. Health check: `curl http://localhost:8000/health`
+3. Open an issue on Gitea
+
+---
+
+**Created for:** b0rborad @ ragtag.rocks  
+**Hardware:** Dual RTX 3090 Setup  
+**Purpose:** Clawdbot Skill Integration