From 4cd777d07fd6e7b6f3edd4a7505766f9a8aae69b Mon Sep 17 00:00:00 2001
From: Dominic Ballenthin
Date: Thu, 29 Jan 2026 02:15:54 +0100
Subject: [PATCH] Add English README and cross-link both documentation versions

---
 README.md    |   6 +-
 README_EN.md | 459 +++++++++++++++++++++++++++++++++++++++++++++++++++
 2 files changed, 464 insertions(+), 1 deletion(-)
 create mode 100644 README_EN.md

diff --git a/README.md b/README.md
index a0162be..5847d36 100644
--- a/README.md
+++ b/README.md
@@ -1,6 +1,10 @@
 # Whisper API
 
-Eine lokale Whisper-API mit GPU-Beschleunigung und Web-Admin-Interface für die Transkription von Audio-Dateien.
+[![License: MIT](https://img.shields.io/badge/License-MIT-yellow.svg)](https://opensource.org/licenses/MIT)
+
+Eine lokale Whisper-API mit GPU-Beschleunigung und Web-Admin-Interface für die Transkription von Audio-Dateien. OpenAI-kompatible API mit Multi-Model-Support.
+
+**🇩🇪 Deutsche Version** | [🇺🇸 English Version](README_EN.md)
 
 ## Features
 
diff --git a/README_EN.md b/README_EN.md
new file mode 100644
index 0000000..016d008
--- /dev/null
+++ b/README_EN.md
@@ -0,0 +1,459 @@
+# Whisper API
+
+[![License: MIT](https://img.shields.io/badge/License-MIT-yellow.svg)](https://opensource.org/licenses/MIT)
+
+A local Whisper API with GPU acceleration and web admin interface for audio transcription. OpenAI-compatible API with multi-model support.
+
+[🇩🇪 Deutsche Version](README.md) | **🇺🇸 English Version**
+
+## Features
+
+- **OpenAI-compatible API** - Drop-in replacement for the OpenAI Whisper API
+- **GPU Accelerated** - Uses NVIDIA GPUs (CUDA) for fast transcription
+- **CPU Fallback** - Automatically switches to CPU when no GPU is available
+- **Multi-Model Support** - Supports all Whisper models (tiny to large-v3)
+- **Model Management** - Download, switch and delete models via the Admin Panel
+- **Default: large-v3** - Best quality on an RTX 3090
+- **Web Admin Interface** - API key management, model management and statistics at `/admin`
+- **API Key Authentication** - Secure access control (keys from environment variables and the database)
+- **Cross-Platform** - Docker-based, runs on Windows and Linux
+- **Automatic Cleanup** - Logs are deleted automatically after 30 days
+- **Persistent Storage** - Models and data are stored in Docker volumes
+
+## Architecture
+
+```
+┌─────────────────┐     ┌──────────────────┐     ┌─────────────────┐
+│   Client/App    │────▶│   FastAPI App    │────▶│   Whisper GPU   │
+│ (Clawdbot etc)  │     │   (Port 8000)    │     │   (large-v3)    │
+└─────────────────┘     └──────────────────┘     └─────────────────┘
+                                 │
+                                 ▼
+                        ┌──────────────────┐
+                        │  /admin Panel    │
+                        │  - Key Mgmt      │
+                        │  - Models        │
+                        │  - Dashboard     │
+                        └──────────────────┘
+```
+
+## Quick Start
+
+### Prerequisites
+
+- Docker Desktop (Windows) or Docker + docker-compose (Linux)
+- NVIDIA GPU with CUDA support (e.g. RTX 3090) - optional, CPU fallback available
+- NVIDIA Container Toolkit installed (for GPU support)
+
+### Installation
+
+1. **Clone repository:**
+```bash
+git clone https://gitea.ragtag.rocks/b0rborad/whisper-api.git
+cd whisper-api
+```
+
+2. **Configure environment variables:**
+```bash
+cp .env.example .env
+# Adjust .env as needed
+```
+
+3. **Start Docker container:**
+```bash
+docker-compose up -d
+```
+
+4. **First start:**
+   - The `large-v3` model (~3 GB) is downloaded automatically
+   - This may take 5-10 minutes
+   - Check status: `docker-compose logs -f`
+
+### Verification
+
+```bash
+# Health check (no API key needed)
+curl http://localhost:8000/health
+
+# API info (requires an API key)
+curl http://localhost:8000/v1/models \
+  -H "Authorization: Bearer sk-your-api-key"
+```
+
+## API Documentation
+
+### Authentication
+
+All API endpoints (except `/health` and `/admin`) require an API key:
+
+```bash
+Authorization: Bearer sk-your-api-key-here
+```
+
+### Endpoints
+
+#### POST /v1/audio/transcriptions
+
+Transcribes an audio file.
+
+**Request:**
+```bash
+# curl sets the multipart/form-data Content-Type (with boundary) automatically for -F
+curl -X POST http://localhost:8000/v1/audio/transcriptions \
+  -H "Authorization: Bearer sk-your-api-key" \
+  -F "file=@/path/to/audio.mp3" \
+  -F "model=large-v3" \
+  -F "language=de" \
+  -F "response_format=json"
+```
+
+**Response:**
+```json
+{
+  "text": "Hello World, this is a test."
+}
+```
+
+#### POST /v1/audio/transcriptions (with Timestamps)
+
+**Request:**
+```bash
+curl -X POST http://localhost:8000/v1/audio/transcriptions \
+  -H "Authorization: Bearer sk-your-api-key" \
+  -F "file=@audio.mp3" \
+  -F "timestamp_granularities[]=word" \
+  -F "response_format=verbose_json"
+```
+
+**Response:**
+```json
+{
+  "text": "Hello World",
+  "segments": [
+    {
+      "id": 0,
+      "start": 0.0,
+      "end": 1.5,
+      "text": "Hello World",
+      "words": [
+        {"word": "Hello", "start": 0.0, "end": 0.5},
+        {"word": "World", "start": 0.6, "end": 1.2}
+      ]
+    }
+  ]
+}
+```
+
+#### GET /v1/models
+
+Lists the available models.
+
+#### GET /v1/available-models
+
+Lists all available Whisper models along with their download status.
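+
+A minimal Python sketch for calling this endpoint and reading its response follows. It is an illustration under stated assumptions, not part of the service: it assumes the server runs at `http://localhost:8000`, that this endpoint needs the same `Authorization: Bearer` header as the other API endpoints, and that the payload has the shape of the response example. The helper names `fetch_available_models` and `summarize_models` are hypothetical, and it uses the standard library's `urllib.request` instead of `requests` so no extra package is needed.
+
+```python
+from __future__ import annotations
+
+import json
+import urllib.request
+
+# Assumed defaults, taken from the examples in this README
+BASE_URL = "http://localhost:8000"
+API_KEY = "sk-your-api-key"
+
+
+def fetch_available_models(base_url: str = BASE_URL, api_key: str = API_KEY) -> dict:
+    """Call GET /v1/available-models (hypothetical helper) and decode the JSON body."""
+    request = urllib.request.Request(
+        f"{base_url}/v1/available-models",
+        headers={"Authorization": f"Bearer {api_key}"},  # assumed to be required
+    )
+    with urllib.request.urlopen(request, timeout=10) as response:
+        return json.load(response)
+
+
+def summarize_models(payload: dict) -> tuple[list[str], str | None]:
+    """Return (names of downloaded models, name of the active model or None)."""
+    models = payload.get("models", [])
+    downloaded = [m["name"] for m in models if m.get("is_downloaded")]
+    active = next((m["name"] for m in models if m.get("is_active")), None)
+    return downloaded, active
+```
+
+For example, `summarize_models(fetch_available_models())` would return `(["large-v3"], "large-v3")` for a server that answers with the response below. Keeping the parsing separate from the HTTP call makes it easy to reuse in a Clawdbot skill or to test without a running server.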
+
+**Response:**
+```json
+{
+  "models": [
+    {
+      "name": "large-v3",
+      "size": "2.88 GB",
+      "description": "Best accuracy",
+      "is_downloaded": true,
+      "is_active": true
+    }
+  ]
+}
+```
+
+#### GET /v1/model-status
+
+Returns the current load and download status of the active model.
+
+**Response:**
+```json
+{
+  "name": "large-v3",
+  "loaded": true,
+  "is_downloading": false,
+  "download_percentage": 100,
+  "status_message": "Model loaded successfully"
+}
+```
+
+#### POST /v1/switch-model
+
+Switches to a different model.
+
+**Request:**
+```bash
+curl -X POST http://localhost:8000/v1/switch-model \
+  -H "Authorization: Bearer sk-your-api-key" \
+  -F "model=base"
+```
+
+#### POST /v1/reload-model
+
+Re-downloads the current model.
+
+#### DELETE /v1/delete-model/{model_name}
+
+Deletes a downloaded model.
+
+#### GET /health
+
+Health check reporting GPU and model status.
+
+**Response:**
+```json
+{
+  "status": "healthy",
+  "model": "large-v3",
+  "gpu": {
+    "available": true,
+    "name": "NVIDIA GeForce RTX 3090",
+    "vram_used_gb": 2.1,
+    "vram_total_gb": 24.0
+  },
+  "model_status": {
+    "loaded": true,
+    "is_downloading": false,
+    "download_percentage": 100
+  }
+}
+```
+
+## Admin Interface
+
+The web interface is available at `http://localhost:8000/admin`.
+
+### Login
+
+- **Username:** `admin` (configurable in `.env`)
+- **Password:** `-whisper12510-` (configurable in `.env`)
+
+### Features
+
+- **Dashboard:** Overview of usage and performance statistics, plus **Model Download Status**
+- **API Keys:** Create, deactivate and delete keys
+- **Models:**
+  - Manage all Whisper models (tiny, base, small, medium, large-v1, large-v2, large-v3)
+  - Download, activate and delete models
+  - **CPU/GPU Mode Toggle**
+  - Reload model
+- **Logs:** Detailed transcription logs with filters
+
+## Configuration
+
+### .env.example
+
+```bash
+# Server
+PORT=8000
+HOST=0.0.0.0
+
+# Whisper
+WHISPER_MODEL=large-v3
+WHISPER_DEVICE=cuda  # or 'cpu' for CPU mode
+WHISPER_COMPUTE_TYPE=float16
+
+# Authentication
+# Multiple API keys separated by commas
+API_KEYS=sk-your-first-key,sk-your-second-key
+ADMIN_USER=admin
+ADMIN_PASSWORD=-whisper12510-
+
+# Data retention (days)
+LOG_RETENTION_DAYS=30
+
+# Optional: Sentry for error tracking
+# SENTRY_DSN=https://...
+```
+
+### Docker-Compose Customization
+
+```yaml
+services:
+  whisper-api:
+    # ...
+    environment:
+      - PORT=8000  # Changeable
+      - WHISPER_MODEL=large-v3
+      - WHISPER_DEVICE=cuda  # or 'cpu' for CPU mode
+    volumes:
+      - whisper_models:/app/models    # Persists models (named volume)
+      - whisper_data:/app/data        # SQLite database
+      - whisper_uploads:/app/uploads  # Temporary uploads
+    deploy:
+      resources:
+        reservations:
+          devices:
+            - driver: nvidia
+              count: all
+              capabilities: [gpu]
+
+volumes:
+  whisper_models:
+  whisper_data:
+  whisper_uploads:
+```
+
+## Migration to Linux
+
+The Docker configuration is platform-independent. For Linux:
+
+1. **Install NVIDIA Docker:**
+```bash
+# Ubuntu/Debian
+distribution=$(. /etc/os-release;echo $ID$VERSION_ID)
+curl -s -L https://nvidia.github.io/nvidia-docker/gpgkey | sudo apt-key add -
+curl -s -L https://nvidia.github.io/nvidia-docker/$distribution/nvidia-docker.list | sudo tee /etc/apt/sources.list.d/nvidia-docker.list
+
+sudo apt-get update
+sudo apt-get install -y nvidia-docker2
+sudo systemctl restart docker
+```
+
+2. **Clone and start project:**
+```bash
+git clone https://gitea.ragtag.rocks/b0rborad/whisper-api.git
+cd whisper-api
+docker-compose up -d
+```
+
+3. **Verify GPU passthrough:**
+```bash
+# nvidia/cuda images only publish full version tags such as 12.0.0-base-ubuntu22.04
+docker run --rm --gpus all nvidia/cuda:12.0.0-base-ubuntu22.04 nvidia-smi
+```
+
+## Available Models
+
+| Model | Size | Description | Speed | Accuracy |
+|-------|------|-------------|-------|----------|
+| **tiny** | 39 MB | Fastest, lowest quality | Very fast | Low |
+| **base** | 74 MB | Good for testing | Fast | Medium |
+| **small** | 244 MB | Balance of speed and quality | Medium | Good |
+| **medium** | 769 MB | Good accuracy | Slow | Very good |
+| **large-v2** | 2.87 GB | Higher accuracy | Very slow | Excellent |
+| **large-v3** | 2.88 GB | Best accuracy (Default) | Very slow | Excellent |
+
+**Recommendations:**
+- **Development/Testing:** `base` or `small`
+- **Production:** `large-v3` (with RTX 3090)
+- **CPU Mode:** `small` or `medium`
+
+## Performance
+
+With RTX 3090 and large-v3:
+- **1 minute of audio:** ~3-5 seconds processing time
+- **VRAM usage:** ~10 GB
+- **Batch processing:** Possible for parallel requests
+
+With CPU and small:
+- **1 minute of audio:** ~30-60 seconds processing time
+- **RAM usage:** ~1 GB
+
+## Integration with Clawdbot
+
+For integration into a Clawdbot skill:
+
+```python
+import requests
+
+API_URL = "http://localhost:8000/v1/audio/transcriptions"
+API_KEY = "sk-your-api-key"
+
+
+def transcribe_audio(audio_path):
+    """Send an audio file to the API and return the transcribed text."""
+    with open(audio_path, "rb") as f:
+        response = requests.post(
+            API_URL,
+            headers={"Authorization": f"Bearer {API_KEY}"},
+            files={"file": f},
+            data={"language": "de"},
+            timeout=300,  # long recordings can take a while, especially on CPU
+        )
+    response.raise_for_status()  # surface 401/5xx errors instead of a KeyError
+    return response.json()["text"]
+```
+
+## Troubleshooting
+
+### GPU not recognized / Automatic CPU Fallback
+
+If no GPU is detected, the API automatically switches to CPU mode:
+
+```bash
+# Check NVIDIA Container Toolkit
+docker run --rm --gpus all nvidia/cuda:12.0.0-base-ubuntu22.04 nvidia-smi
+
+# Check logs - should show "GPU not available, falling back to CPU mode"
+docker-compose logs whisper-api
+```
+
+**Manual switch:** Via the Admin Panel (`/admin/models`) or the API:
+```bash
+curl -X POST http://localhost:8000/v1/switch-device \
+  -H "Authorization: Bearer sk-your-api-key" \
+  -F "device=cpu"
+```
+
+### Model Download Status Display
+
+- **Dashboard:** Shows download progress in real time
+- **API:** `GET /v1/model-status` returns the current status
+- **Logs:** `docker-compose logs -f` shows download progress
+
+### Slow Model Download
+
+```bash
+# In the Admin Panel, select a smaller model under Models (e.g. base, small)
+# Or via the API:
+curl -X POST http://localhost:8000/v1/switch-model \
+  -H "Authorization: Bearer sk-your-api-key" \
+  -F "model=base"
+```
+
+### Port already in use
+
+```bash
+# Change the port in .env
+PORT=8001
+```
+
+## Backup
+
+Important data (Docker named volumes and configuration):
+- `whisper_data` - SQLite database (API keys, logs)
+- `whisper_models` - Downloaded Whisper models
+- `./.env` - Configuration (a regular file, not a volume)
+
+```bash
+# Volume names are prefixed with the Compose project name (here: whisper-api)
+# Create backup
+docker run --rm -v whisper-api_whisper_data:/data -v whisper-api_whisper_models:/models -v "$(pwd)":/backup alpine sh -c "tar czf /backup/whisper-api-backup.tar.gz -C / data models"
+
+# Or: keep a copy of .env alongside a full archive of both volumes
+cp .env .env.backup
+docker run --rm -v whisper-api_whisper_data:/data -v whisper-api_whisper_models:/models -v "$(pwd)":/backup alpine tar czf /backup/whisper-api-full-backup.tar.gz -C / data models
+```
+
+### Restore Backup
+
+```bash
+# Extract backup into the volumes
+docker run --rm -v whisper-api_whisper_data:/data -v whisper-api_whisper_models:/models -v "$(pwd)":/backup alpine sh -c "cd / && tar xzf /backup/whisper-api-backup.tar.gz"
+```
+
+## License
+
+MIT License - See LICENSE file
+
+## Support
+
+For issues:
+1. Check logs: `docker-compose logs -f`
+2. Health check: `curl http://localhost:8000/health`
+3. Open an issue on Gitea
+
+---
+
+**Created for:** b0rborad @ ragtag.rocks
+**Hardware:** Dual RTX 3090 Setup
+**Purpose:** Clawdbot Skill Integration