Dominic Ballenthin 4cd777d07f Add English README and cross-link both documentation versions

2026-01-29 02:15:54 +01:00

Whisper API

A local Whisper API for audio transcription with GPU acceleration and a web admin interface. OpenAI-compatible, with multi-model support.

🇩🇪 Deutsche Version | 🇺🇸 English Version

Features

OpenAI-compatible API - Drop-in replacement for the OpenAI Whisper API
GPU Accelerated - Uses NVIDIA GPUs (CUDA) for fast transcription
CPU Fallback - Automatic switch to CPU when no GPU is available
Multi-Model Support - Supports all Whisper models (tiny to large-v3)
Model Management - Download, switch and delete models via the Admin Panel
Default: large-v3 - Best quality; runs comfortably on an RTX 3090
Web Admin Interface - API key management, model management and statistics at /admin
API Key Authentication - Access control with keys from environment variables (API_KEYS) and the database
Cross-Platform - Docker-based, runs on Windows and Linux
Automatic Cleanup - Logs automatically deleted after 30 days (LOG_RETENTION_DAYS)
Persistent Storage - Models and data in Docker volumes

Architecture

┌─────────────────┐     ┌──────────────────┐     ┌─────────────────┐
│   Client/App    │────▶│   FastAPI App    │────▶│  Whisper GPU    │
│  (Clawdbot etc) │     │   (Port 8000)    │     │  (large-v3)     │
└─────────────────┘     └──────────────────┘     └─────────────────┘
                               │
                               ▼
                         ┌──────────────────┐
                         │  /admin Panel    │
                         │  - Key Mgmt      │
                         │  - Models        │
                         │  - Dashboard     │
                         └──────────────────┘

Quick Start

Prerequisites

Docker Desktop (Windows) or Docker + docker-compose (Linux)
NVIDIA GPU with CUDA support (e.g. RTX 3090) - optional; CPU fallback available
NVIDIA Container Toolkit installed (for GPU support)

Installation

Clone repository:

git clone https://gitea.ragtag.rocks/b0rborad/whisper-api.git
cd whisper-api

Configure environment variables:

cp .env.example .env
# Edit .env as needed

Start Docker container:

docker-compose up -d

First start:
- The large-v3 model (~3GB) will be downloaded automatically
- This may take 5-10 minutes
- Check status: docker-compose logs -f

Verification

# Health check
curl http://localhost:8000/health

# List models (requires an API key, like all /v1 endpoints)
curl http://localhost:8000/v1/models \
  -H "Authorization: Bearer sk-your-api-key"

API Documentation

Authentication

All API endpoints (except /health and /admin) require an API key:

Authorization: Bearer sk-your-api-key-here
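
A minimal Python sketch of building this header without hard-coding the key. It assumes the key is stored in an environment variable named `WHISPER_API_KEY`; that name and the `auth_headers` helper are hypothetical and not defined by the project.

```python
import os


def auth_headers(env_var: str = "WHISPER_API_KEY") -> dict[str, str]:
    """Return the Authorization header, reading the key from an environment variable.

    WHISPER_API_KEY is a hypothetical variable name chosen for this example.
    """
    api_key = os.environ.get(env_var, "").strip()
    if not api_key:
        raise RuntimeError(f"{env_var} is not set; create a key in /admin or add one to API_KEYS")
    return {"Authorization": f"Bearer {api_key}"}
```

Usage: run with `WHISPER_API_KEY=sk-your-api-key-here` exported and pass `auth_headers()` as the request headers.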

Endpoints

POST /v1/audio/transcriptions

Transcribes an audio file.

Request:

curl -X POST http://localhost:8000/v1/audio/transcriptions \
  -H "Authorization: Bearer sk-your-api-key" \
  -H "Content-Type: multipart/form-data" \
  -F "file=@/path/to/audio.mp3" \
  -F "model=large-v3" \
  -F "language=de" \
  -F "response_format=json"

Response:

{
  "text": "Hello World, this is a test."
}
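
For clients that cannot install `requests`, the same request can be sent with the Python standard library. This is a sketch, not the project's client: `build_multipart` and `transcribe` are hypothetical helper names, the form fields mirror the curl example above, and the whole audio file is read into memory.

```python
import json
import mimetypes
import os
import urllib.request
import uuid

API_URL = "http://localhost:8000/v1/audio/transcriptions"


def build_multipart(fields: dict[str, str], file_field: str, filename: str,
                    payload: bytes) -> tuple[bytes, str]:
    """Encode plain form fields plus one file part as multipart/form-data."""
    boundary = uuid.uuid4().hex
    parts: list[bytes] = []
    for name, value in fields.items():
        parts += [
            f"--{boundary}".encode(),
            f'Content-Disposition: form-data; name="{name}"'.encode(),
            b"",
            value.encode(),
        ]
    mime = mimetypes.guess_type(filename)[0] or "application/octet-stream"
    parts += [
        f"--{boundary}".encode(),
        f'Content-Disposition: form-data; name="{file_field}"; filename="{filename}"'.encode(),
        f"Content-Type: {mime}".encode(),
        b"",
        payload,
        f"--{boundary}--".encode(),
        b"",  # trailing CRLF after the closing boundary
    ]
    return b"\r\n".join(parts), f"multipart/form-data; boundary={boundary}"


def transcribe(path: str, api_key: str, language: str = "de") -> str:
    """POST an audio file and return the "text" field of the JSON response."""
    with open(path, "rb") as f:  # the whole file is read into memory
        body, content_type = build_multipart(
            {"model": "large-v3", "language": language, "response_format": "json"},
            "file", os.path.basename(path), f.read())
    request = urllib.request.Request(API_URL, data=body, method="POST", headers={
        "Authorization": f"Bearer {api_key}", "Content-Type": content_type})
    # urlopen raises urllib.error.HTTPError for 4xx/5xx responses (e.g. 401 for a bad key)
    with urllib.request.urlopen(request, timeout=600) as response:
        return json.loads(response.read())["text"]
```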

POST /v1/audio/transcriptions (with Timestamps)

Request:

curl -X POST http://localhost:8000/v1/audio/transcriptions \
  -H "Authorization: Bearer sk-your-api-key" \
  -F "file=@audio.mp3" \
  -F "timestamp_granularities[]=word" \
  -F "response_format=verbose_json"

Response:

{
  "text": "Hello World",
  "segments": [
    {
      "id": 0,
      "start": 0.0,
      "end": 1.5,
      "text": "Hello World",
      "words": [
        {"word": "Hello", "start": 0.0, "end": 0.5},
        {"word": "World", "start": 0.6, "end": 1.2}
      ]
    }
  ]
}

GET /v1/models

List available models.

GET /v1/available-models

List all available Whisper models with download status.

Response:

{
  "models": [
    {
      "name": "large-v3",
      "size": "2.88 GB",
      "description": "Best accuracy",
      "is_downloaded": true,
      "is_active": true
    }
  ]
}
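
A sketch of summarizing this response, e.g. to see which models occupy disk space. It assumes the `size` strings always look like `"<number> <unit>"` with decimal units (1 GB = 10^9 bytes); `size_to_bytes` and `summarize_models` are hypothetical helpers.

```python
_UNITS = {"KB": 10**3, "MB": 10**6, "GB": 10**9}  # assumed decimal units


def size_to_bytes(size: str) -> int:
    """Convert a string such as "2.88 GB" to bytes."""
    value, unit = size.split()
    return round(float(value) * _UNITS[unit.upper()])


def summarize_models(payload: dict) -> dict:
    """List downloaded models, the active one, and their total size on disk."""
    models = payload.get("models", [])
    downloaded = [m["name"] for m in models if m.get("is_downloaded")]
    active = next((m["name"] for m in models if m.get("is_active")), None)
    disk = sum(size_to_bytes(m["size"]) for m in models if m.get("is_downloaded"))
    return {"downloaded": downloaded, "active": active, "disk_bytes": disk}
```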

GET /v1/model-status

Download and load status of the current model.

Response:

{
  "name": "large-v3",
  "loaded": true,
  "is_downloading": false,
  "download_percentage": 100,
  "status_message": "Model loaded successfully"
}
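
On first start or after switching models, a script can poll this endpoint until the model is ready. A minimal sketch, assuming the response fields shown above; `fetch_status` and `wait_until_loaded` are hypothetical names, and the status fetcher is passed in so the loop can be tested without a server.

```python
import json
import time
import urllib.request
from typing import Callable

STATUS_URL = "http://localhost:8000/v1/model-status"


def fetch_status(api_key: str) -> dict:
    """GET /v1/model-status and decode the JSON body."""
    request = urllib.request.Request(STATUS_URL, headers={"Authorization": f"Bearer {api_key}"})
    with urllib.request.urlopen(request, timeout=10) as response:
        return json.loads(response.read())


def wait_until_loaded(get_status: Callable[[], dict], timeout: float = 900.0,
                      interval: float = 5.0,
                      sleep: Callable[[float], None] = time.sleep) -> dict:
    """Poll until the model reports loaded, printing download progress meanwhile."""
    deadline = time.monotonic() + timeout
    while True:
        status = get_status()
        if status.get("loaded") and not status.get("is_downloading"):
            return status
        if time.monotonic() >= deadline:
            raise TimeoutError(f"Model not loaded after {timeout:.0f}s: {status.get('status_message')}")
        print(f"{status.get('name')}: {status.get('download_percentage', 0)}% - "
              f"{status.get('status_message', '')}")
        sleep(interval)
```

Usage: `wait_until_loaded(lambda: fetch_status("sk-your-api-key"))`.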

POST /v1/switch-model

Switch to a different model.

Request:

curl -X POST http://localhost:8000/v1/switch-model \
  -H "Authorization: Bearer sk-your-api-key" \
  -F "model=base"

POST /v1/reload-model

Re-download the current model.

DELETE /v1/delete-model/{model_name}

Delete a downloaded model.
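
A cautious client can check the `GET /v1/available-models` response before deleting, so it never targets the active model or one that is not on disk. This guard is a client-side precaution sketched under that assumption; whether the server itself rejects such requests is not documented here, and `delete_model_request` is a hypothetical helper.

```python
import urllib.parse
import urllib.request

BASE_URL = "http://localhost:8000"


def delete_model_request(models_payload: dict, model_name: str,
                         api_key: str) -> urllib.request.Request:
    """Build a DELETE request, refusing models that are active or not downloaded."""
    by_name = {m["name"]: m for m in models_payload.get("models", [])}
    model = by_name.get(model_name)
    if model is None or not model.get("is_downloaded"):
        raise ValueError(f"{model_name!r} is not downloaded")
    if model.get("is_active"):
        raise ValueError(f"{model_name!r} is the active model; switch to another model first")
    url = f"{BASE_URL}/v1/delete-model/{urllib.parse.quote(model_name, safe='')}"
    return urllib.request.Request(url, method="DELETE",
                                  headers={"Authorization": f"Bearer {api_key}"})
```

Usage: send the returned request with `urllib.request.urlopen(req)`.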

GET /health

Health check with GPU and model status.

Response:

{
  "status": "healthy",
  "model": "large-v3",
  "gpu": {
    "available": true,
    "name": "NVIDIA GeForce RTX 3090",
    "vram_used_gb": 2.1,
    "vram_total_gb": 24.0
  },
  "model_status": {
    "loaded": true,
    "is_downloading": false,
    "download_percentage": 100
  }
}
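
For monitoring scripts, the fields above can be condensed into one status line. A minimal sketch, assuming the response shape shown above; `health_summary` is a hypothetical helper.

```python
def health_summary(health: dict) -> str:
    """Condense a GET /health response into one human-readable line."""
    gpu = health.get("gpu") or {}
    model_status = health.get("model_status") or {}
    if gpu.get("available"):
        used = gpu.get("vram_used_gb", 0.0)
        total = gpu.get("vram_total_gb", 0.0)
        percent = 100 * used / total if total else 0.0
        device = f"GPU {gpu.get('name', 'unknown')} ({used:.1f}/{total:.1f} GB VRAM, {percent:.1f}%)"
    else:
        device = "CPU (no GPU available)"
    ready = health.get("status") == "healthy" and model_status.get("loaded", False)
    state = "ready" if ready else f"not ready ({model_status.get('download_percentage', 0)}% downloaded)"
    return f"{health.get('model', 'unknown')} on {device}: {state}"
```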

Admin Interface

The web interface is accessible at: http://localhost:8000/admin

Login

Username: admin (configurable in .env)
Password: -whisper12510- (default; change it via ADMIN_PASSWORD in .env)

Features

Dashboard: Overview of usage, performance statistics and model download status
API Keys: Create, deactivate and delete keys
Models:
- Manage all Whisper models (tiny, base, small, medium, large-v1, large-v2, large-v3)
- Download, activate and delete models
- CPU/GPU mode toggle
- Reload model
Logs: Detailed transcription logs with filters

Configuration

.env.example

# Server
PORT=8000
HOST=0.0.0.0

# Whisper
WHISPER_MODEL=large-v3
# Device: 'cuda' for GPU, or 'cpu' for CPU mode
WHISPER_DEVICE=cuda
WHISPER_COMPUTE_TYPE=float16

# Authentication
# Multiple API keys, separated by commas
API_KEYS=sk-your-first-key,sk-your-second-key
ADMIN_USER=admin
ADMIN_PASSWORD=-whisper12510-

# Data retention (days)
LOG_RETENTION_DAYS=30

# Optional: Sentry for error tracking
# SENTRY_DSN=https://...
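
To illustrate how these settings map to typed values (in particular, that API_KEYS is a comma-separated list), here is a minimal parser sketch. It is not the project's configuration loader; the `Settings` class and `parse_env` function are hypothetical, and the defaults simply mirror the example file.

```python
from dataclasses import dataclass, field


@dataclass
class Settings:
    port: int = 8000
    host: str = "0.0.0.0"
    whisper_model: str = "large-v3"
    whisper_device: str = "cuda"
    whisper_compute_type: str = "float16"
    api_keys: list[str] = field(default_factory=list)
    log_retention_days: int = 30


def parse_env(text: str) -> Settings:
    """Parse KEY=VALUE lines; '#' starts a comment at line start or after a space."""
    values: dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.split(" #", 1)[0].strip()  # drop inline comments
    return Settings(
        port=int(values.get("PORT", 8000)),
        host=values.get("HOST", "0.0.0.0"),
        whisper_model=values.get("WHISPER_MODEL", "large-v3"),
        whisper_device=values.get("WHISPER_DEVICE", "cuda"),
        whisper_compute_type=values.get("WHISPER_COMPUTE_TYPE", "float16"),
        # "sk-a, sk-b," -> ["sk-a", "sk-b"]: split on commas, skip empty entries
        api_keys=[k.strip() for k in values.get("API_KEYS", "").split(",") if k.strip()],
        log_retention_days=int(values.get("LOG_RETENTION_DAYS", 30)),
    )
```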

Docker-Compose Customization

services:
  whisper-api:
    # ...
    environment:
      - PORT=8000  # Changeable
      - WHISPER_MODEL=large-v3
      - WHISPER_DEVICE=cuda  # or 'cpu' for CPU mode
    volumes:
      - whisper_models:/app/models    # Persists models (Named Volume)
      - whisper_data:/app/data        # SQLite database
      - whisper_uploads:/app/uploads  # Temporary uploads
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              count: all
              capabilities: [gpu]

volumes:
  whisper_models:
  whisper_data:
  whisper_uploads:

Migration to Linux

The Docker configuration is platform-independent. For Linux:

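Install the NVIDIA Container Toolkit (the older nvidia-docker2 packages are deprecated):

# Ubuntu/Debian
curl -fsSL https://nvidia.github.io/libnvidia-container/gpgkey | sudo gpg --dearmor -o /usr/share/keyrings/nvidia-container-toolkit-keyring.gpg
curl -s -L https://nvidia.github.io/libnvidia-container/stable/deb/nvidia-container-toolkit.list | \
  sed 's#deb https://#deb [signed-by=/usr/share/keyrings/nvidia-container-toolkit-keyring.gpg] https://#g' | \
  sudo tee /etc/apt/sources.list.d/nvidia-container-toolkit.list

sudo apt-get update
sudo apt-get install -y nvidia-container-toolkit
sudo nvidia-ctk runtime configure --runtime=docker
sudo systemctl restart docker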

Clone and start project:

git clone https://gitea.ragtag.rocks/b0rborad/whisper-api.git
cd whisper-api
docker-compose up -d

Verify GPU passthrough:

docker run --rm --gpus all ubuntu nvidia-smi

Available Models

Model	Size	Description	Speed	Accuracy
tiny	39 MB	Fastest, lowest quality	Very fast	Low
base	74 MB	Good for testing	Fast	Medium
small	244 MB	Balance speed/quality	Medium	Good
medium	769 MB	Good accuracy	Slow	Very good
large-v2	2.87 GB	Higher accuracy	Very slow	Excellent
large-v3	2.88 GB	Best accuracy (Default)	Very slow	Excellent

Recommendations:

Development/Testing: base or small
Production: large-v3 (with RTX 3090)
CPU Mode: small or medium
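
These recommendations can be applied automatically, for example by a setup script that reads `GET /health`. A sketch under one interpretation of the list above (development/testing takes precedence over device; otherwise GPU means large-v3 and CPU means small or medium); `recommend_models` and its `purpose` labels are hypothetical.

```python
def recommend_models(health: dict, purpose: str = "production") -> list[str]:
    """Return candidate models (preferred first) following the recommendations above.

    `health` is the JSON from GET /health; `purpose` is a hypothetical label
    ("development", "testing" or "production") used only by this sketch.
    """
    if purpose in {"development", "testing"}:
        return ["base", "small"]        # Development/Testing: base or small
    gpu = health.get("gpu") or {}
    if gpu.get("available"):
        return ["large-v3"]             # Production on a GPU such as the RTX 3090
    return ["small", "medium"]          # CPU mode: small or medium
```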

Performance

With RTX 3090 and large-v3:

1 minute of audio: ~3-5 seconds processing time
VRAM usage: ~10 GB
Batch processing: Possible for parallel requests

With CPU and small:

1 minute of audio: ~30-60 seconds processing time
RAM usage: ~1 GB
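
In real-time terms, ~3-5 s per minute of audio is roughly 12-20x faster than real time (60/5 to 60/3), while the CPU figures give 1-2x. A back-of-the-envelope estimator, assuming processing time scales linearly with audio length (the measurements above do not state this); the table and function names are hypothetical.

```python
# Processing seconds per minute of audio, taken from the ranges quoted above.
SECONDS_PER_AUDIO_MINUTE = {
    ("gpu", "large-v3"): (3.0, 5.0),   # RTX 3090
    ("cpu", "small"): (30.0, 60.0),
}


def estimate_processing_seconds(audio_seconds: float, device: str,
                                model: str) -> tuple[float, float]:
    """Return a (low, high) processing-time estimate, assuming linear scaling."""
    low, high = SECONDS_PER_AUDIO_MINUTE[(device, model)]
    minutes = audio_seconds / 60.0
    return (minutes * low, minutes * high)


def realtime_factor(device: str, model: str) -> tuple[float, float]:
    """How many times faster than real time: 60 s of audio / processing seconds."""
    low, high = SECONDS_PER_AUDIO_MINUTE[(device, model)]
    return (60.0 / high, 60.0 / low)
```

For example, a 10-minute recording (600 s) on the GPU with large-v3 gives an estimate of 30-50 seconds.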

Integration with Clawdbot

For integration into a Clawdbot skill:

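import requests

API_URL = "http://localhost:8000/v1/audio/transcriptions"
API_KEY = "sk-your-api-key"


def transcribe_audio(audio_path, language="de"):
    """Send an audio file to the Whisper API and return the transcribed text."""
    with open(audio_path, "rb") as f:
        response = requests.post(
            API_URL,
            headers={"Authorization": f"Bearer {API_KEY}"},
            files={"file": f},
            data={"language": language},
            timeout=600,  # long files can take minutes, especially in CPU mode
        )
    # Fail clearly on 401 (bad key) or other HTTP errors instead of a KeyError on "text"
    response.raise_for_status()
    return response.json()["text"]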

Troubleshooting

GPU not recognized / Automatic CPU Fallback

If no GPU is detected, the API automatically switches to CPU mode. To diagnose:

# Check NVIDIA Container Toolkit
docker run --rm --gpus all ubuntu nvidia-smi

# Check logs - should show "GPU not available, falling back to CPU mode"
docker-compose logs whisper-api

Manual switch: Via Admin Panel (/admin/models) or API:

curl -X POST http://localhost:8000/v1/switch-device \
  -H "Authorization: Bearer sk-your-api-key" \
  -F "device=cpu"

Model Download Status Display

Dashboard: Shows download progress in real-time
API: GET /v1/model-status for current status
Logs: docker-compose logs -f shows download progress

Slow Model Download

# In Admin Panel under Models select a smaller model (e.g. base, small)
# Or via API:
curl -X POST http://localhost:8000/v1/switch-model \
  -H "Authorization: Bearer sk-your-api-key" \
  -F "model=base"

Port already in use

# Change port in .env
PORT=8001

Backup

Important data (Docker named volumes, plus the configuration file):

whisper_data - SQLite database (API keys, logs)
whisper_models - Downloaded Whisper models
./.env - Configuration

# Create backup
docker run --rm -v whisper-api_whisper_data:/data -v whisper-api_whisper_models:/models -v $(pwd):/backup alpine sh -c "tar czf /backup/whisper-api-backup.tar.gz -C / data models"

# Or a complete backup that also includes .env (stored as config/.env in the archive)
cp .env .env.backup
docker run --rm -v whisper-api_whisper_data:/data -v whisper-api_whisper_models:/models -v "$(pwd)/.env:/config/.env:ro" -v "$(pwd):/backup" alpine tar czf /backup/whisper-api-full-backup.tar.gz -C / data models config

Restore Backup

# Extract backup
docker run --rm -v whisper-api_whisper_data:/data -v whisper-api_whisper_models:/models -v $(pwd):/backup alpine sh -c "cd / && tar xzf /backup/whisper-api-backup.tar.gz"
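
Before restoring, it can help to confirm that an archive actually contains both volumes. A minimal Python check, assuming the archive layout produced by the backup commands above (top-level `data/` and `models/` directories); `check_backup` is a hypothetical helper, not part of the project.

```python
import sys
import tarfile

REQUIRED = {"data", "models"}


def check_backup(path: str) -> set[str]:
    """Return the archive's top-level entries; raise ValueError if data/ or models/ is missing."""
    with tarfile.open(path, "r:gz") as tar:
        # Normalize a leading "./" so both "data/x" and "./data/x" count as "data"
        names = [n[2:] if n.startswith("./") else n for n in tar.getnames()]
    top = {n.split("/", 1)[0] for n in names if n}
    missing = REQUIRED - top
    if missing:
        raise ValueError(f"{path} lacks: {', '.join(sorted(missing))}")
    return top


if __name__ == "__main__":
    archive = sys.argv[1] if len(sys.argv) > 1 else "whisper-api-backup.tar.gz"
    print("OK:", sorted(check_backup(archive)))
```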

License

MIT License - See LICENSE file

Support

For issues:

Check logs: docker-compose logs -f
Health check: curl http://localhost:8000/health
Create issue on Gitea

Created for: b0rborad @ ragtag.rocks
Hardware: Dual RTX 3090 Setup
Purpose: Clawdbot Skill Integration
