# Whisper API

[![License: MIT](https://img.shields.io/badge/License-MIT-yellow.svg)](https://opensource.org/licenses/MIT)

A local Whisper API with GPU acceleration and a web admin interface for audio transcription. OpenAI-compatible API with multi-model support.

[🇩🇪 Deutsche Version](README.md) | **🇺🇸 English Version**

## Features

- **OpenAI-compatible API** - Drop-in replacement for the OpenAI Whisper API
- **GPU Accelerated** - Uses NVIDIA GPUs (CUDA) for fast transcription
- **CPU Fallback** - Automatically switches to CPU when no GPU is available
- **Multi-Model Support** - Supports all Whisper models (tiny to large-v3)
- **Model Management** - Download, switch and delete models via the Admin Panel
- **Default: large-v3** - Best quality on an RTX 3090
- **Web Admin Interface** - API key management, model management and statistics at `/admin`
- **API Key Authentication** - Secure access control (keys from `.env` and from the database)
- **Cross-Platform** - Docker-based, runs on Windows and Linux
- **Automatic Cleanup** - Logs are deleted automatically after 30 days (`LOG_RETENTION_DAYS`)
- **Persistent Storage** - Models and data are kept in Docker volumes

## Architecture

```
┌─────────────────┐     ┌──────────────────┐     ┌─────────────────┐
│   Client/App    │────▶│   FastAPI App    │────▶│   Whisper GPU   │
│ (Clawdbot etc)  │     │   (Port 8000)    │     │   (large-v3)    │
└─────────────────┘     └──────────────────┘     └─────────────────┘
                                 │
                                 ▼
                        ┌──────────────────┐
                        │   /admin Panel   │
                        │  - Key Mgmt      │
                        │  - Models        │
                        │  - Dashboard     │
                        └──────────────────┘
```

## Quick Start

### Prerequisites

- Docker Desktop (Windows) or Docker + docker-compose (Linux)
- NVIDIA GPU with CUDA support (e.g. RTX 3090); optional, CPU fallback is available
- NVIDIA Container Toolkit installed (for GPU support)

### Installation

1. **Clone repository:**
   ```bash
   git clone https://gitea.ragtag.rocks/b0rborad/whisper-api.git
   cd whisper-api
   ```

2. **Configure environment variables:**
   ```bash
   cp .env.example .env
   # Edit .env to your needs
   ```

3. **Start Docker container:**
   ```bash
   docker-compose up -d
   ```

4. **First start:**
   - The `large-v3` model (~3 GB) is downloaded automatically
   - This may take 5-10 minutes
   - Check status: `docker-compose logs -f`

### Verification

```bash
# Health check (no API key required)
curl http://localhost:8000/health

# List models (requires an API key)
curl http://localhost:8000/v1/models \
  -H "Authorization: Bearer sk-your-api-key"
```

## API Documentation

### Authentication

All API endpoints (except `/health` and `/admin`) require an API key in the `Authorization` header:

```http
Authorization: Bearer sk-your-api-key-here
```

### Endpoints

#### POST /v1/audio/transcriptions

Transcribes an audio file.

**Request:**
```bash
curl -X POST http://localhost:8000/v1/audio/transcriptions \
  -H "Authorization: Bearer sk-your-api-key" \
  -F "file=@/path/to/audio.mp3" \
  -F "model=large-v3" \
  -F "language=de" \
  -F "response_format=json"
```

**Response:**
```json
{
  "text": "Hello World, this is a test."
}
```

#### POST /v1/audio/transcriptions (with Timestamps)

**Request:**
```bash
curl -X POST http://localhost:8000/v1/audio/transcriptions \
  -H "Authorization: Bearer sk-your-api-key" \
  -F "file=@audio.mp3" \
  -F "timestamp_granularities[]=word" \
  -F "response_format=verbose_json"
```

**Response:**
```json
{
  "text": "Hello World",
  "segments": [
    {
      "id": 0,
      "start": 0.0,
      "end": 1.5,
      "text": "Hello World",
      "words": [
        {"word": "Hello", "start": 0.0, "end": 0.5},
        {"word": "World", "start": 0.6, "end": 1.2}
      ]
    }
  ]
}
```

#### GET /v1/models

Lists the available models.

#### GET /v1/available-models

Lists all available Whisper models with their download status.
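A minimal Python sketch of a client for this endpoint, using only the standard library. It assumes the server runs on the default `http://localhost:8000` and relies only on the `name`, `is_downloaded` and `is_active` fields of the example response; `fetch_available_models` and `summarize_models` are hypothetical helper names, not part of this project.

```python
import json
import urllib.request

# Assumption: default host/port from .env (PORT=8000).
BASE_URL = "http://localhost:8000"


def fetch_available_models(api_key: str, base_url: str = BASE_URL) -> dict:
    """Hypothetical helper: GET /v1/available-models with a Bearer token."""
    request = urllib.request.Request(
        f"{base_url}/v1/available-models",
        headers={"Authorization": f"Bearer {api_key}"},
    )
    # urlopen raises urllib.error.HTTPError on 401/4xx/5xx responses.
    with urllib.request.urlopen(request, timeout=30) as response:
        return json.load(response)


def summarize_models(payload: dict) -> dict:
    """Hypothetical helper: split the "models" list into downloaded names
    and the single active model (None if no model is marked active)."""
    models = payload.get("models", [])
    return {
        "downloaded": [m["name"] for m in models if m.get("is_downloaded")],
        "active": next((m["name"] for m in models if m.get("is_active")), None),
    }
```

For example, `summarize_models(fetch_available_models("sk-your-api-key"))` lists the downloaded model names and the currently active one. Keeping the parsing separate from the HTTP call makes it testable without a running server.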
**Response:**
```json
{
  "models": [
    {
      "name": "large-v3",
      "size": "2.88 GB",
      "description": "Best accuracy",
      "is_downloaded": true,
      "is_active": true
    }
  ]
}
```

#### GET /v1/model-status

Returns the current load and download status of the active model.

**Response:**
```json
{
  "name": "large-v3",
  "loaded": true,
  "is_downloading": false,
  "download_percentage": 100,
  "status_message": "Model loaded successfully"
}
```

#### POST /v1/switch-model

Switches to a different model.

**Request:**
```bash
curl -X POST http://localhost:8000/v1/switch-model \
  -H "Authorization: Bearer sk-your-api-key" \
  -F "model=base"
```

#### POST /v1/reload-model

Re-downloads the current model.

#### DELETE /v1/delete-model/{model_name}

Deletes a downloaded model.

#### GET /health

Health check with GPU and model status.

**Response:**
```json
{
  "status": "healthy",
  "model": "large-v3",
  "gpu": {
    "available": true,
    "name": "NVIDIA GeForce RTX 3090",
    "vram_used_gb": 2.1,
    "vram_total_gb": 24.0
  },
  "model_status": {
    "loaded": true,
    "is_downloading": false,
    "download_percentage": 100
  }
}
```

## Admin Interface

The web interface is available at `http://localhost:8000/admin`.

### Login

- **Username:** `admin` (configurable in `.env`)
- **Password:** `-whisper12510-` (default, configurable in `.env`)

### Features

- **Dashboard:** Usage overview, performance statistics and **model download status**
- **API Keys:** Create, deactivate and delete keys
- **Models:**
  - Manage all Whisper models (tiny, base, small, medium, large-v1, large-v2, large-v3)
  - Download, activate and delete models
  - **CPU/GPU mode toggle**
  - Reload model
- **Logs:** Detailed transcription logs with filtering

## Configuration

### .env.example

```bash
# Server
PORT=8000
HOST=0.0.0.0

# Whisper
WHISPER_MODEL=large-v3
# 'cuda' for GPU mode, 'cpu' for CPU mode
WHISPER_DEVICE=cuda
WHISPER_COMPUTE_TYPE=float16

# Authentication
# Multiple API keys separated by commas
API_KEYS=sk-your-first-key,sk-your-second-key
ADMIN_USER=admin
ADMIN_PASSWORD=-whisper12510-

# Data retention (days)
LOG_RETENTION_DAYS=30

# Optional: Sentry for error tracking
# SENTRY_DSN=https://...
```

### Docker-Compose Customization

```yaml
services:
  whisper-api:
    # ...
    environment:
      - PORT=8000               # Changeable
      - WHISPER_MODEL=large-v3
      - WHISPER_DEVICE=cuda     # or 'cpu' for CPU mode
    volumes:
      - whisper_models:/app/models    # Persists models (named volume)
      - whisper_data:/app/data        # SQLite database
      - whisper_uploads:/app/uploads  # Temporary uploads
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              count: all
              capabilities: [gpu]

volumes:
  whisper_models:
  whisper_data:
  whisper_uploads:
```

## Migration to Linux

The Docker configuration is platform-independent. For Linux:

1. **Install NVIDIA Docker:**
   ```bash
   # Ubuntu/Debian
   distribution=$(. /etc/os-release;echo $ID$VERSION_ID)
   curl -s -L https://nvidia.github.io/nvidia-docker/gpgkey | sudo apt-key add -
   curl -s -L https://nvidia.github.io/nvidia-docker/$distribution/nvidia-docker.list | sudo tee /etc/apt/sources.list.d/nvidia-docker.list
   sudo apt-get update
   sudo apt-get install -y nvidia-docker2
   sudo systemctl restart docker
   ```

2. **Clone and start project:**
   ```bash
   git clone https://gitea.ragtag.rocks/b0rborad/whisper-api.git
   cd whisper-api
   docker-compose up -d
   ```

3. **Verify GPU passthrough:**
   ```bash
   docker run --rm --gpus all nvidia/cuda:12.0.0-base-ubuntu22.04 nvidia-smi
   ```

## Available Models

| Model | Parameters | Description | Speed | Accuracy |
|-------|------------|-------------|-------|----------|
| **tiny** | 39 M | Fastest, lowest quality | Very fast | Low |
| **base** | 74 M | Good for testing | Fast | Medium |
| **small** | 244 M | Balance of speed and quality | Medium | Good |
| **medium** | 769 M | Good accuracy | Slow | Very good |
| **large-v2** | 1550 M (2.87 GB download) | Higher accuracy | Very slow | Excellent |
| **large-v3** | 1550 M (2.88 GB download) | Best accuracy (default) | Very slow | Excellent |

**Recommendations:**
- **Development/Testing:** `base` or `small`
- **Production:** `large-v3` (with RTX 3090)
- **CPU Mode:** `small` or `medium`

## Performance

With RTX 3090 and large-v3:
- **1 minute of audio:** ~3-5 seconds processing time
- **VRAM usage:** ~10 GB
- **Batch processing:** possible for parallel requests

With CPU and small:
- **1 minute of audio:** ~30-60 seconds processing time
- **RAM usage:** ~1 GB

## Integration with Clawdbot

For integration into a Clawdbot skill:

```python
import requests

API_URL = "http://localhost:8000/v1/audio/transcriptions"
API_KEY = "sk-your-api-key"


def transcribe_audio(audio_path: str) -> str:
    """Upload an audio file and return the transcribed text."""
    with open(audio_path, "rb") as f:
        response = requests.post(
            API_URL,
            headers={"Authorization": f"Bearer {API_KEY}"},
            files={"file": f},
            data={"language": "de"},
            timeout=300,  # long recordings can take a while
        )
    # Raise on 401 (bad key) or other HTTP errors instead of failing on a missing "text" key
    response.raise_for_status()
    return response.json()["text"]
```

## Troubleshooting

### GPU not recognized / Automatic CPU Fallback

If no GPU is detected, the API automatically switches to CPU mode:

```bash
# Check NVIDIA Container Toolkit
docker run --rm --gpus all nvidia/cuda:12.0.0-base-ubuntu22.04 nvidia-smi

# Check logs - should show "GPU not available, falling back to CPU mode"
docker-compose logs whisper-api
```

**Manual switch:** via the Admin Panel (`/admin/models`) or the API:

```bash
curl -X POST http://localhost:8000/v1/switch-device \
  -H "Authorization: Bearer sk-your-api-key" \
  -F "device=cpu"
```

### Model Download Status Display

- **Dashboard:** Shows download progress in real time
- **API:** `GET /v1/model-status` returns the current status
- **Logs:** `docker-compose logs -f` shows download progress

### Slow Model Download

Select a smaller model (e.g. `base` or `small`) in the Admin Panel under **Models**, or via the API:

```bash
curl -X POST http://localhost:8000/v1/switch-model \
  -H "Authorization: Bearer sk-your-api-key" \
  -F "model=base"
```

### Port already in use

```bash
# Change the port in .env, then recreate the container with: docker-compose up -d
PORT=8001
```

## Backup

Important data (Docker named volumes and configuration):
- `whisper_data` - SQLite database (API keys, logs)
- `whisper_models` - Downloaded Whisper models
- `./.env` - Configuration

```bash
# Volume names are prefixed with the Compose project name (here: whisper-api)

# Create a backup of both volumes in the current directory
docker run --rm \
  -v whisper-api_whisper_data:/data \
  -v whisper-api_whisper_models:/models \
  -v "$(pwd)":/backup \
  alpine sh -c "tar czf /backup/whisper-api-backup.tar.gz -C / data models"

# Or: complete backup (copy of .env plus a volume archive)
cp .env .env.backup
docker run --rm \
  -v whisper-api_whisper_data:/data \
  -v whisper-api_whisper_models:/models \
  -v "$(pwd)":/backup \
  alpine tar czf /backup/whisper-api-full-backup.tar.gz -C / data models
```

### Restore Backup

```bash
# Extract the backup archive back into the volumes
docker run --rm \
  -v whisper-api_whisper_data:/data \
  -v whisper-api_whisper_models:/models \
  -v "$(pwd)":/backup \
  alpine sh -c "cd / && tar xzf /backup/whisper-api-backup.tar.gz"
```

## License

MIT License - see LICENSE file

## Support

For issues:
1. Check logs: `docker-compose logs -f`
2. Health check: `curl http://localhost:8000/health`
3. Open an issue on Gitea

---

**Created for:** b0rborad @ ragtag.rocks

**Hardware:** Dual RTX 3090 Setup

**Purpose:** Clawdbot Skill Integration