Whisper API
A local Whisper API with GPU acceleration and a web admin interface for audio transcription. OpenAI-compatible API with multi-model support.
Features
- OpenAI-compatible API - Drop-in replacement for OpenAI Whisper API
- GPU Accelerated - Uses NVIDIA GPUs (CUDA) for fast transcription
- CPU Fallback - Automatic switch to CPU when no GPU is available
- Multi-Model Support - Supports all Whisper models (tiny to large-v3)
- Model Management - Download, switch and delete models via Admin Panel
- Default: large-v3 - Best quality, well suited to an RTX 3090
- Web Admin Interface - API key management, model management and statistics at /admin
- API Key Authentication - Secure access control (environment variables + database)
- Cross-Platform - Docker-based, runs on Windows and Linux
- Automatic Cleanup - Logs are deleted automatically after 30 days (LOG_RETENTION_DAYS)
- Persistent Storage - Models and data in Docker volumes
Architecture
┌─────────────────┐     ┌──────────────────┐     ┌─────────────────┐
│   Client/App    │────▶│   FastAPI App    │────▶│   Whisper GPU   │
│ (Clawdbot etc)  │     │   (Port 8000)    │     │   (large-v3)    │
└─────────────────┘     └──────────────────┘     └─────────────────┘
                                  │
                                  ▼
                        ┌──────────────────┐
                        │  /admin Panel    │
                        │  - Key Mgmt      │
                        │  - Models        │
                        │  - Dashboard     │
                        └──────────────────┘
Quick Start
Prerequisites
- Docker Desktop (Windows) or Docker + docker-compose (Linux)
- NVIDIA GPU with CUDA support (e.g. RTX 3090) - optional, CPU fallback available
- NVIDIA Container Toolkit installed (for GPU support)
Installation
- Clone repository:
git clone https://gitea.ragtag.rocks/b0rborad/whisper-api.git
cd whisper-api
- Configure environment variables:
cp .env.example .env
# Edit .env to your needs
- Start Docker container:
docker-compose up -d
- First start:
  - The large-v3 model (~3 GB) is downloaded automatically. This may take 5-10 minutes.
  - Check status:
    docker-compose logs -f
Verification
# Health check
curl http://localhost:8000/health
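# List available models (requires an API key, see Authentication below)
curl -H "Authorization: Bearer sk-your-api-key" http://localhost:8000/v1/models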
API Documentation
Authentication
All API endpoints (except /health and /admin) require an API key:
Authorization: Bearer sk-your-api-key-here
Endpoints
POST /v1/audio/transcriptions
Transcribes an audio file.
Request:
curl -X POST http://localhost:8000/v1/audio/transcriptions \
-H "Authorization: Bearer sk-your-api-key" \
-H "Content-Type: multipart/form-data" \
-F "file=@/path/to/audio.mp3" \
-F "model=large-v3" \
-F "language=de" \
-F "response_format=json"
Response:
{
"text": "Hello World, this is a test."
}
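For scripts that should not depend on curl or third-party packages, the request above can be reproduced with the Python standard library. This is a minimal sketch, not part of the project: `encode_multipart` and `transcribe` are hypothetical helper names, `API_URL`/`API_KEY` are placeholders for your deployment, and the whole file is read into memory before upload. It builds the same multipart/form-data body that curl -F sends.

```python
import json
import mimetypes
import os
import urllib.request
import uuid

API_URL = "http://localhost:8000/v1/audio/transcriptions"  # placeholder
API_KEY = "sk-your-api-key"                                 # placeholder


def encode_multipart(fields, file_field, filename, file_bytes):
    """Return (body, content_type) for a multipart/form-data upload, as curl -F builds it."""
    boundary = uuid.uuid4().hex
    file_type = mimetypes.guess_type(filename)[0] or "application/octet-stream"
    chunks = []
    # One part per plain form field (model, language, response_format, ...)
    for name, value in fields.items():
        chunks.append(
            f'--{boundary}\r\nContent-Disposition: form-data; name="{name}"\r\n\r\n'
            f"{value}\r\n".encode()
        )
    # The file part carries a filename and its own Content-Type
    chunks.append(
        f'--{boundary}\r\nContent-Disposition: form-data; name="{file_field}"; '
        f'filename="{filename}"\r\nContent-Type: {file_type}\r\n\r\n'.encode()
        + file_bytes
        + b"\r\n"
    )
    chunks.append(f"--{boundary}--\r\n".encode())  # closing boundary
    return b"".join(chunks), f"multipart/form-data; boundary={boundary}"


def transcribe(path, model="large-v3", language="de", response_format="json"):
    """POST an audio file to the transcription endpoint and return the decoded JSON."""
    with open(path, "rb") as f:
        body, content_type = encode_multipart(
            {"model": model, "language": language, "response_format": response_format},
            "file", os.path.basename(path), f.read(),
        )
    request = urllib.request.Request(
        API_URL, data=body, method="POST",
        headers={"Authorization": f"Bearer {API_KEY}", "Content-Type": content_type},
    )
    with urllib.request.urlopen(request, timeout=300) as response:
        return json.loads(response.read())
```

Usage: `transcribe("/path/to/audio.mp3")["text"]` returns the transcript once the server is running.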
POST /v1/audio/transcriptions (with Timestamps)
Request:
curl -X POST http://localhost:8000/v1/audio/transcriptions \
-H "Authorization: Bearer sk-your-api-key" \
-F "file=@audio.mp3" \
-F "timestamp_granularities[]=word" \
-F "response_format=verbose_json"
Response:
{
"text": "Hello World",
"segments": [
{
"id": 0,
"start": 0.0,
"end": 1.5,
"text": "Hello World",
"words": [
{"word": "Hello", "start": 0.0, "end": 0.5},
{"word": "World", "start": 0.6, "end": 1.2}
]
}
]
}
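A small helper for consuming the verbose_json response: a sketch that assumes the shape shown above, where word timings sit inside each segment's `words` list. `word_timings` is a hypothetical name, not part of the API.

```python
# Example payload copied from the verbose_json response above
response = {
    "text": "Hello World",
    "segments": [
        {
            "id": 0, "start": 0.0, "end": 1.5, "text": "Hello World",
            "words": [
                {"word": "Hello", "start": 0.0, "end": 0.5},
                {"word": "World", "start": 0.6, "end": 1.2},
            ],
        }
    ],
}


def word_timings(verbose):
    """Flatten segments[*].words[*] into (word, start, end) tuples, in order."""
    return [
        (word["word"], word["start"], word["end"])
        for segment in verbose.get("segments", [])
        for word in segment.get("words", [])  # tolerate segments without a words list
    ]


for word, start, end in word_timings(response):
    print(f"{start:6.2f}-{end:6.2f}  {word}")
```

For the example payload this yields `[("Hello", 0.0, 0.5), ("World", 0.6, 1.2)]`.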
GET /v1/models
List available models.
GET /v1/available-models
List all available Whisper models with download status.
Response:
{
"models": [
{
"name": "large-v3",
"size": "2.88 GB",
"description": "Best accuracy",
"is_downloaded": true,
"is_active": true
}
]
}
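To pick a model programmatically, the response can be filtered on its flags. A sketch assuming the field names shown above; the helper names are hypothetical and the second model entry in the sample payload is invented for illustration.

```python
# Example payload: the first entry is from the response above,
# the second is invented for illustration.
available = {
    "models": [
        {"name": "large-v3", "size": "2.88 GB", "description": "Best accuracy",
         "is_downloaded": True, "is_active": True},
        {"name": "base", "is_downloaded": False, "is_active": False},
    ]
}


def downloaded_models(payload):
    """Names of models whose is_downloaded flag is set."""
    return [m["name"] for m in payload["models"] if m.get("is_downloaded")]


def active_model(payload):
    """Name of the model flagged is_active, or None if no model is active."""
    return next((m["name"] for m in payload["models"] if m.get("is_active")), None)


print(downloaded_models(available))  # ['large-v3']
print(active_model(available))       # large-v3
```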
GET /v1/model-status
Current load and download status of the active model.
Response:
{
"name": "large-v3",
"loaded": true,
"is_downloading": false,
"download_percentage": 100,
"status_message": "Model loaded successfully"
}
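After a model switch or first start, a client can wait for the model before sending audio. A sketch that polls /v1/model-status until `loaded` is true; `fetch_model_status` and `wait_for_model` are hypothetical helpers, and the 5 s interval and 900 s timeout are arbitrary defaults, not values from the project.

```python
import json
import time
import urllib.request


def fetch_model_status(base_url="http://localhost:8000", api_key="sk-your-api-key"):
    """GET /v1/model-status with the Bearer key and return the parsed JSON."""
    request = urllib.request.Request(
        f"{base_url}/v1/model-status",
        headers={"Authorization": f"Bearer {api_key}"},
    )
    with urllib.request.urlopen(request, timeout=10) as response:
        return json.loads(response.read())


def wait_for_model(fetch_status, poll_interval=5.0, timeout=900.0,
                   sleep=time.sleep, clock=time.monotonic):
    """Poll fetch_status() until it reports loaded=True; return that status dict.

    Raises TimeoutError if the model is still not loaded after `timeout` seconds.
    sleep/clock are injectable so the loop can be tested without waiting.
    """
    deadline = clock() + timeout
    while True:
        status = fetch_status()
        if status.get("loaded"):
            return status
        if clock() >= deadline:
            raise TimeoutError(f"model not loaded: {status.get('status_message')}")
        print(f"{status.get('download_percentage', 0)}% {status.get('status_message', '')}")
        sleep(poll_interval)


# Usage (blocks until the model reports loaded):
# wait_for_model(fetch_model_status)
```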
POST /v1/switch-model
Switch to a different model.
Request:
curl -X POST http://localhost:8000/v1/switch-model \
-H "Authorization: Bearer sk-your-api-key" \
-F "model=base"
POST /v1/reload-model
Re-download the current model.
DELETE /v1/delete-model/{model_name}
Delete a downloaded model.
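The model-management endpoints can also be driven from Python. This sketch only builds the authenticated requests with urllib so they can be inspected before sending; `admin_request` is a hypothetical helper and the URL and key are placeholders. It assumes the server reads `model` as a regular form field that also accepts a URL-encoded body; if the server insists on multipart, send the field the way curl -F does.

```python
import urllib.parse
import urllib.request

BASE_URL = "http://localhost:8000"  # placeholder
API_KEY = "sk-your-api-key"         # placeholder


def admin_request(method, path, form=None, base_url=BASE_URL, api_key=API_KEY):
    """Build (but do not send) an authenticated request to a model-management endpoint."""
    data = urllib.parse.urlencode(form).encode() if form else None
    headers = {"Authorization": f"Bearer {api_key}"}
    if data is not None:
        headers["Content-Type"] = "application/x-www-form-urlencoded"
    return urllib.request.Request(base_url + path, data=data, headers=headers, method=method)


switch_req = admin_request("POST", "/v1/switch-model", {"model": "base"})
reload_req = admin_request("POST", "/v1/reload-model")
delete_req = admin_request("DELETE", "/v1/delete-model/" + urllib.parse.quote("medium"))

# Send any of them with: urllib.request.urlopen(switch_req, timeout=30)
```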
GET /health
Health check with GPU and model status.
Response:
{
"status": "healthy",
"model": "large-v3",
"gpu": {
"available": true,
"name": "NVIDIA GeForce RTX 3090",
"vram_used_gb": 2.1,
"vram_total_gb": 24.0
},
"model_status": {
"loaded": true,
"is_downloading": false,
"download_percentage": 100
}
}
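For monitoring, the /health payload can be reduced to one line. A sketch assuming the field names shown above; what the `gpu` object contains in CPU fallback mode is not documented here, so the code only relies on `available` being false. `fetch_health` and `summarize_health` are hypothetical helper names.

```python
import json
import urllib.request

# Example payload copied from the /health response above
sample = {
    "status": "healthy",
    "model": "large-v3",
    "gpu": {"available": True, "name": "NVIDIA GeForce RTX 3090",
            "vram_used_gb": 2.1, "vram_total_gb": 24.0},
    "model_status": {"loaded": True, "is_downloading": False, "download_percentage": 100},
}


def fetch_health(base_url="http://localhost:8000"):
    """GET /health (no API key needed) and return the parsed JSON."""
    with urllib.request.urlopen(f"{base_url}/health", timeout=10) as response:
        return json.loads(response.read())


def summarize_health(health):
    """One-line summary: status, model, device with VRAM usage, and load state."""
    gpu = health.get("gpu") or {}
    if gpu.get("available"):
        used, total = gpu["vram_used_gb"], gpu["vram_total_gb"]
        device = f"{gpu['name']} ({used:.1f}/{total:.1f} GB VRAM, {used / total:.0%})"
    else:
        device = "CPU"
    loaded = health.get("model_status", {}).get("loaded", False)
    return f"{health['status']}: {health['model']} on {device}, loaded={loaded}"


print(summarize_health(sample))
# healthy: large-v3 on NVIDIA GeForce RTX 3090 (2.1/24.0 GB VRAM, 9%), loaded=True
```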
Admin Interface
The web interface is accessible at: http://localhost:8000/admin
Login
- Username: admin (configurable in .env)
- Password: -whisper12510- (configurable in .env)
Features
- Dashboard: Overview of usage, performance statistics and model download status
- API Keys: Create, deactivate and delete keys
- Models:
  - Manage all Whisper models (tiny, base, small, medium, large-v1, large-v2, large-v3)
  - Download, activate and delete models
  - CPU/GPU mode toggle
  - Reload model
- Logs: Detailed transcription logs with filtering
Configuration
.env.example
# Server
PORT=8000
HOST=0.0.0.0
# Whisper
WHISPER_MODEL=large-v3
WHISPER_DEVICE=cuda # or 'cpu' for CPU mode
WHISPER_COMPUTE_TYPE=float16
# Authentication
# Multiple API keys separated by comma
API_KEYS=sk-your-first-key,sk-your-second-key
ADMIN_USER=admin
ADMIN_PASSWORD=-whisper12510-
# Data retention (days)
LOG_RETENTION_DAYS=30
# Optional: Sentry for error tracking
# SENTRY_DSN=https://...
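API_KEYS holds several comma-separated keys. The following hypothetical sketch shows one way such a value can be parsed and checked against a Bearer header; it is not the project's actual authentication code, and it ignores the database-stored keys managed in the Admin Panel.

```python
import hmac
import os


def parse_api_keys(raw):
    """Split a comma-separated API_KEYS value into a set, dropping blanks and spaces."""
    return {key.strip() for key in raw.split(",") if key.strip()}


def is_authorized(authorization_header, valid_keys):
    """True if the header is 'Bearer <key>' and <key> is one of valid_keys."""
    scheme, _, token = (authorization_header or "").partition(" ")
    token = token.strip()
    if scheme.lower() != "bearer" or not token:
        return False
    # compare_digest avoids early-exit timing differences when comparing secrets
    return any(hmac.compare_digest(token.encode(), key.encode()) for key in valid_keys)


# Falls back to the placeholder keys from .env.example when API_KEYS is unset
keys = parse_api_keys(os.environ.get("API_KEYS", "sk-your-first-key,sk-your-second-key"))
print(is_authorized("Bearer sk-your-first-key", keys))
```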
Docker-Compose Customization
services:
  whisper-api:
    # ...
    environment:
      - PORT=8000  # Changeable
      - WHISPER_MODEL=large-v3
      - WHISPER_DEVICE=cuda  # or 'cpu' for CPU mode
    volumes:
      - whisper_models:/app/models    # Persists models (named volume)
      - whisper_data:/app/data        # SQLite database
      - whisper_uploads:/app/uploads  # Temporary uploads
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              count: all
              capabilities: [gpu]

volumes:
  whisper_models:
  whisper_data:
  whisper_uploads:
Migration to Linux
The Docker configuration is platform-independent. For Linux:
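- Install the NVIDIA Container Toolkit:
# Ubuntu/Debian
curl -fsSL https://nvidia.github.io/libnvidia-container/gpgkey | sudo gpg --dearmor -o /usr/share/keyrings/nvidia-container-toolkit-keyring.gpg
curl -s -L https://nvidia.github.io/libnvidia-container/stable/deb/nvidia-container-toolkit.list | sed 's#deb https://#deb [signed-by=/usr/share/keyrings/nvidia-container-toolkit-keyring.gpg] https://#g' | sudo tee /etc/apt/sources.list.d/nvidia-container-toolkit.list
sudo apt-get update
sudo apt-get install -y nvidia-container-toolkit
sudo nvidia-ctk runtime configure --runtime=docker
sudo systemctl restart docker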
- Clone and start project:
git clone https://gitea.ragtag.rocks/b0rborad/whisper-api.git
cd whisper-api
docker-compose up -d
- Verify GPU passthrough:
docker run --rm --gpus all ubuntu nvidia-smi
Available Models
| Model | Parameters | Description | Speed | Accuracy |
|---|---|---|---|---|
| tiny | 39 M | Fastest, lowest quality | Very fast | Low |
| base | 74 M | Good for testing | Fast | Medium |
| small | 244 M | Balance of speed and quality | Medium | Good |
| medium | 769 M | Good accuracy | Slow | Very good |
| large-v2 | 1550 M | Higher accuracy (2.87 GB download) | Very slow | Excellent |
| large-v3 | 1550 M | Best accuracy, default (2.88 GB download) | Very slow | Excellent |
Recommendations:
- Development/Testing: base or small
- Production: large-v3 (with RTX 3090)
- CPU Mode: small or medium
Performance
With RTX 3090 and large-v3:
- 1 minute audio: ~3-5 seconds processing time
- VRAM usage: ~10 GB
- Batch processing: Possible for parallel requests
With CPU and small:
- 1 minute audio: ~30-60 seconds processing time
- RAM usage: ~1 GB
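The figures above can be turned into a rough estimate for longer recordings. This sketch assumes processing time scales linearly with audio length, which real workloads only approximate; the constants are simply the ranges quoted above, and the helper names are hypothetical.

```python
def realtime_factor(processing_s, audio_s):
    """Processing time divided by audio duration (lower is faster)."""
    return processing_s / audio_s


def estimate_processing(audio_minutes, seconds_per_audio_minute):
    """Scale a (low, high) 'seconds per minute of audio' range to a longer file."""
    low, high = seconds_per_audio_minute
    return audio_minutes * low, audio_minutes * high


GPU_LARGE_V3 = (3, 5)  # ~3-5 s per audio minute (RTX 3090, large-v3)
CPU_SMALL = (30, 60)   # ~30-60 s per audio minute (CPU, small)

print(estimate_processing(60, GPU_LARGE_V3))  # (180, 300): one hour of audio in ~3-5 min
print(estimate_processing(60, CPU_SMALL))     # (1800, 3600): ~30-60 min
print(realtime_factor(3, 60))                 # 0.05, i.e. about 20x faster than real time
```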
Integration with Clawdbot
For integration into a Clawdbot skill:
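import requests

API_URL = "http://localhost:8000/v1/audio/transcriptions"
API_KEY = "sk-your-api-key"


def transcribe_audio(audio_path, language="de"):
    """Upload an audio file to the Whisper API and return the transcript text."""
    with open(audio_path, "rb") as f:
        response = requests.post(
            API_URL,
            headers={"Authorization": f"Bearer {API_KEY}"},
            files={"file": f},
            data={"language": language},
            timeout=300,  # long recordings can take a while to transcribe
        )
    response.raise_for_status()  # fail loudly on 401/5xx instead of a KeyError below
    return response.json()["text"]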
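Since parallel requests are possible, a skill that needs several files transcribed can submit them concurrently. A minimal sketch using the standard library's ThreadPoolExecutor; `transcribe_many` is a hypothetical helper, `transcribe` can be any function mapping a path to text (for example `transcribe_audio` above), and `max_workers=2` is an arbitrary starting value, not a tested limit for this server.

```python
from concurrent.futures import ThreadPoolExecutor


def transcribe_many(paths, transcribe, max_workers=2):
    """Run transcribe(path) for several files concurrently; return {path: text}.

    pool.map yields results in input order; an exception in any call propagates.
    """
    paths = list(paths)  # iterated twice below (map and zip)
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return dict(zip(paths, pool.map(transcribe, paths)))


# Usage: transcribe_many(["a.mp3", "b.mp3"], transcribe_audio)
```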
Troubleshooting
GPU not recognized / Automatic CPU Fallback
If no GPU is detected, the API automatically switches to CPU mode:
# Check NVIDIA Container Toolkit
docker run --rm --gpus all ubuntu nvidia-smi
# Check logs - should show "GPU not available, falling back to CPU mode"
docker-compose logs whisper-api
Manual switch: Via Admin Panel (/admin/models) or API:
curl -X POST http://localhost:8000/v1/switch-device \
-H "Authorization: Bearer sk-your-api-key" \
-F "device=cpu"
Model Download Status Display
- Dashboard: Shows download progress in real-time
- API: GET /v1/model-status for the current status
- Logs: docker-compose logs -f shows download progress
Slow Model Download
# In Admin Panel under Models select a smaller model (e.g. base, small)
# Or via API:
curl -X POST http://localhost:8000/v1/switch-model \
-H "Authorization: Bearer sk-your-api-key" \
-F "model=base"
Port already in use
# Change port in .env
PORT=8001
Backup
Important data (Docker Named Volumes):
- whisper_data - SQLite database (API keys, logs)
- whisper_models - Downloaded Whisper models
- ./.env - Configuration
# Create backup
docker run --rm -v whisper-api_whisper_data:/data -v whisper-api_whisper_models:/models -v $(pwd):/backup alpine sh -c "tar czf /backup/whisper-api-backup.tar.gz -C / data models"
# Or a full backup: copy .env as well, then archive both volumes
cp .env .env.backup
docker run --rm -v whisper-api_whisper_data:/data -v whisper-api_whisper_models:/models -v $(pwd):/backup alpine tar czf /backup/whisper-api-full-backup.tar.gz -C / data models
Restore Backup
# Extract backup
docker run --rm -v whisper-api_whisper_data:/data -v whisper-api_whisper_models:/models -v $(pwd):/backup alpine sh -c "cd / && tar xzf /backup/whisper-api-backup.tar.gz"
License
MIT License - See LICENSE file
Support
For issues:
- Check logs: docker-compose logs -f
- Health check: curl http://localhost:8000/health
- Create an issue on Gitea
Created for: b0rborad @ ragtag.rocks
Hardware: Dual RTX 3090 Setup
Purpose: Clawdbot Skill Integration