# Whisper API

A local Whisper API with GPU acceleration and a web admin interface for transcribing audio files.

## Features

- **OpenAI-compatible API** - drop-in replacement for the OpenAI Whisper API
- **GPU-accelerated** - uses NVIDIA GPUs (CUDA) for fast transcription
- **Default: large-v3** - best quality on an RTX 3090
- **Web admin interface** - API key management and statistics at `/admin`
- **API key authentication** - secure access control
- **Cross-platform** - Docker-based, runs on Windows and Linux
- **Automatic cleanup** - logs are deleted automatically after 30 days

## Architecture

```
┌─────────────────┐     ┌──────────────────┐     ┌─────────────────┐
│   Client/App    │────▶│   FastAPI App    │────▶│  Whisper GPU    │
│  (Clawdbot etc) │     │   (Port 8000)    │     │  (large-v3)     │
└─────────────────┘     └──────────────────┘     └─────────────────┘
                               │
                               ▼
                        ┌──────────────────┐
                        │  /admin Panel    │
                        │  - Key Mgmt      │
                        │  - Dashboard     │
                        │  - Logs          │
                        └──────────────────┘
```

## Quick Start

### Prerequisites

- Docker Desktop (Windows) or Docker + docker-compose (Linux)
- NVIDIA GPU with CUDA support (RTX 3090)
- NVIDIA Container Toolkit installed

### Installation

1. **Clone the repository:**
```bash
git clone https://gitea.ragtag.rocks/b0rborad/whisper-api.git
cd whisper-api
```

2. **Configure the environment variables:**
```bash
cp .env.example .env
# Edit .env to suit your setup
```

3. **Start the Docker container:**
```bash
docker-compose up -d
```

4. **First start:**
   - The `large-v3` model (~3 GB) is downloaded automatically
   - This can take 5-10 minutes
   - Check the status with `docker-compose logs -f`

### Verification

```bash
# Health check
curl http://localhost:8000/health

# List available models
curl http://localhost:8000/v1/models
```

## API Documentation

### Authentication

All API endpoints (except `/health` and `/admin`) require an API key, sent as a Bearer token in the `Authorization` header:

```text
Authorization: Bearer sk-your-api-key-here
```
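The header check described above can be sketched in a few lines of Python. This is an illustration only, not the project's actual implementation: the helper names `extract_bearer_token` and `is_authorized` are hypothetical, and the sketch assumes the valid keys are already available as a list of strings.

```python
import hmac
from typing import Iterable, Optional


def extract_bearer_token(header_value: Optional[str]) -> Optional[str]:
    """Return the token from 'Bearer <token>', or None if the header is missing or malformed."""
    if not header_value:
        return None
    scheme, _, token = header_value.partition(" ")
    if scheme.lower() != "bearer" or not token.strip():
        return None
    return token.strip()


def is_authorized(header_value: Optional[str], valid_keys: Iterable[str]) -> bool:
    """Check an Authorization header value against a collection of valid API keys (hypothetical helper)."""
    token = extract_bearer_token(header_value)
    if token is None:
        return False
    # hmac.compare_digest compares each candidate key in constant time
    return any(
        hmac.compare_digest(token.encode("utf-8"), key.encode("utf-8"))
        for key in valid_keys
    )
```

A request carrying `Authorization: Bearer sk-a` would pass with `is_authorized(header, ["sk-a", "sk-b"])`, while a missing header or a different scheme such as `Basic` would be rejected.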

### Endpoints

#### POST /v1/audio/transcriptions

Transcribes an audio file.

**Request:**
```bash
# -F makes curl send multipart/form-data and set the Content-Type (with boundary) itself
curl -X POST http://localhost:8000/v1/audio/transcriptions \
  -H "Authorization: Bearer sk-your-api-key" \
  -F "file=@/path/to/audio.mp3" \
  -F "model=large-v3" \
  -F "language=de" \
  -F "response_format=json"
```

**Response:**
```json
{
  "text": "Hallo Welt, das ist ein Test."
}
```

#### POST /v1/audio/transcriptions (with timestamps)

**Request:**
```bash
curl -X POST http://localhost:8000/v1/audio/transcriptions \
  -H "Authorization: Bearer sk-dein-api-key" \
  -F "file=@audio.mp3" \
  -F "timestamp_granularities[]=word" \
  -F "response_format=verbose_json"
```

**Response:**
```json
{
  "text": "Hallo Welt",
  "segments": [
    {
      "id": 0,
      "start": 0.0,
      "end": 1.5,
      "text": "Hallo Welt",
      "words": [
        {"word": "Hallo", "start": 0.0, "end": 0.5},
        {"word": "Welt", "start": 0.6, "end": 1.2}
      ]
    }
  ]
}
```
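A `verbose_json` response like the one above can be post-processed on the client. The following sketch assumes exactly the response shape shown here (`segments` with nested `words`); the helper names `format_timestamp`, `segments_to_srt`, and `flatten_words` are hypothetical and not part of the API.

```python
from typing import Any, Dict, List, Tuple


def format_timestamp(seconds: float) -> str:
    """Format seconds as an SRT timestamp (HH:MM:SS,mmm)."""
    total_ms = int(round(seconds * 1000))
    hours, rest = divmod(total_ms, 3_600_000)
    minutes, rest = divmod(rest, 60_000)
    secs, millis = divmod(rest, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{millis:03d}"


def segments_to_srt(result: Dict[str, Any]) -> str:
    """Turn each segment of a verbose_json result into one SRT cue."""
    cues = []
    for index, seg in enumerate(result.get("segments", []), start=1):
        start = format_timestamp(seg["start"])
        end = format_timestamp(seg["end"])
        cues.append(f"{index}\n{start} --> {end}\n{seg['text'].strip()}\n")
    return "\n".join(cues)


def flatten_words(result: Dict[str, Any]) -> List[Tuple[str, float, float]]:
    """Collect (word, start, end) tuples across all segments."""
    return [
        (w["word"], w["start"], w["end"])
        for seg in result.get("segments", [])
        for w in seg.get("words", [])
    ]
```

Passing the parsed JSON response to `segments_to_srt` yields subtitle text that can be written to an `.srt` file, and `flatten_words` gives a flat list for word-level highlighting.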

#### GET /v1/models

Lists the available models.

#### GET /health

Health check with GPU status.

**Response:**
```json
{
  "status": "healthy",
  "gpu": {
    "available": true,
    "name": "NVIDIA GeForce RTX 3090",
    "vram_used": "2.1 GB",
    "vram_total": "24.0 GB"
  },
  "model": "large-v3",
  "version": "1.0.0"
}
```
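Because the first start can take several minutes, a client may want to poll `/health` before sending audio. The sketch below uses only the standard library and assumes the response fields shown above (`status`, `gpu.available`); the function names and the retry parameters are illustrative choices, and how the endpoint behaves while the model is still loading is not documented here.

```python
import json
import time
import urllib.request
from typing import Any, Callable, Dict

HEALTH_URL = "http://localhost:8000/health"


def fetch_health(url: str = HEALTH_URL, timeout: float = 5.0) -> Dict[str, Any]:
    """Fetch and decode the JSON body of the health endpoint."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))


def is_ready(payload: Dict[str, Any]) -> bool:
    """True when the service reports 'healthy' and an available GPU."""
    gpu = payload.get("gpu") or {}
    return payload.get("status") == "healthy" and bool(gpu.get("available"))


def wait_until_ready(
    fetch: Callable[[], Dict[str, Any]] = fetch_health,
    attempts: int = 60,
    delay: float = 10.0,
) -> bool:
    """Poll the health endpoint until it is ready or the attempts run out."""
    for _ in range(attempts):
        try:
            if is_ready(fetch()):
                return True
        except (OSError, ValueError):
            # Server not reachable yet, or the body was not valid JSON
            pass
        time.sleep(delay)
    return False
```

With the defaults, `wait_until_ready()` gives the container up to ten minutes, which matches the download time mentioned for the first start.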

## Admin Interface

The web interface is available at `http://localhost:8000/admin`.

### Login

- **Username:** `admin` (configurable in `.env`)
- **Password:** `-whisper12510-` (configurable in `.env`)

### Features

- **Dashboard:** usage overview and performance statistics
- **API keys:** create, deactivate, and delete keys
- **Logs:** detailed transcription logs with filtering

## Configuration

### .env.example

```bash
# Server
PORT=8000
HOST=0.0.0.0

# Whisper
WHISPER_MODEL=large-v3
WHISPER_DEVICE=cuda
WHISPER_COMPUTE_TYPE=float16

# Authentication
# Separate multiple API keys with commas
API_KEYS=sk-your-first-key,sk-your-second-key
ADMIN_USER=admin
ADMIN_PASSWORD=-whisper12510-

# Data retention (days)
LOG_RETENTION_DAYS=30

# Optional: Sentry for error tracking
# SENTRY_DSN=https://...
```
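To make the meaning of these variables concrete, here is a minimal sketch of how an application could read them, with the values from `.env.example` as fallbacks. The `Settings` dataclass and `load_settings` function are hypothetical; the project's real configuration code may look different.

```python
import os
from dataclasses import dataclass
from typing import List, Mapping


@dataclass(frozen=True)
class Settings:
    host: str
    port: int
    whisper_model: str
    whisper_device: str
    whisper_compute_type: str
    api_keys: List[str]
    log_retention_days: int


def load_settings(env: Mapping[str, str] = os.environ) -> Settings:
    """Build Settings from environment variables, using the .env.example values as defaults."""
    raw_keys = env.get("API_KEYS", "")
    return Settings(
        host=env.get("HOST", "0.0.0.0"),
        port=int(env.get("PORT", "8000")),
        whisper_model=env.get("WHISPER_MODEL", "large-v3"),
        whisper_device=env.get("WHISPER_DEVICE", "cuda"),
        whisper_compute_type=env.get("WHISPER_COMPUTE_TYPE", "float16"),
        # API_KEYS is a comma-separated list; blanks and stray spaces are dropped
        api_keys=[key.strip() for key in raw_keys.split(",") if key.strip()],
        log_retention_days=int(env.get("LOG_RETENTION_DAYS", "30")),
    )
```

Calling `load_settings({"PORT": "8001"})` would override only the port and keep every other default.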

### Docker Compose adjustments

```yaml
services:
  whisper-api:
    # ...
    environment:
      - PORT=8000  # can be changed
      - WHISPER_MODEL=large-v3
    volumes:
      - ./models:/app/models    # persists downloaded models
      - ./data:/app/data        # SQLite database
      - ./uploads:/app/uploads  # temporary uploads
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              count: 1
              capabilities: [gpu]
```

## Migrating to Linux

The Docker configuration is platform-independent. On Linux:

1. **Install the NVIDIA Container Toolkit** (the successor to the deprecated `nvidia-docker2` package):
```bash
# Ubuntu/Debian
curl -fsSL https://nvidia.github.io/libnvidia-container/gpgkey \
  | sudo gpg --dearmor -o /usr/share/keyrings/nvidia-container-toolkit-keyring.gpg
curl -s -L https://nvidia.github.io/libnvidia-container/stable/deb/nvidia-container-toolkit.list \
  | sed 's#deb https://#deb [signed-by=/usr/share/keyrings/nvidia-container-toolkit-keyring.gpg] https://#g' \
  | sudo tee /etc/apt/sources.list.d/nvidia-container-toolkit.list

sudo apt-get update
sudo apt-get install -y nvidia-container-toolkit
sudo nvidia-ctk runtime configure --runtime=docker
sudo systemctl restart docker
```

2. **Clone and start the project:**
```bash
git clone https://gitea.ragtag.rocks/b0rborad/whisper-api.git
cd whisper-api
docker-compose up -d
```

3. **Verify GPU passthrough:**
```bash
docker run --rm --gpus all nvidia/cuda:12.0.0-base-ubuntu22.04 nvidia-smi
```

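## Integration with Clawdbot

To integrate the API into a Clawdbot skill:

```python
import requests

API_URL = "http://localhost:8000/v1/audio/transcriptions"
API_KEY = "sk-your-api-key"

def transcribe_audio(audio_path):
    """Send an audio file to the Whisper API and return the transcribed text."""
    with open(audio_path, "rb") as f:
        response = requests.post(
            API_URL,
            headers={"Authorization": f"Bearer {API_KEY}"},
            files={"file": f},
            data={"language": "de"},
            timeout=300,  # long recordings can take a while to transcribe
        )
    # Raise on 4xx/5xx (e.g. an invalid API key) instead of failing on a missing "text" field
    response.raise_for_status()
    return response.json()["text"]
```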

## Performance

With an RTX 3090 and large-v3:
- **1 minute of audio:** ~3-5 seconds of processing time (roughly 12-20x real time)
- **VRAM usage:** ~10 GB
- **Batch processing:** possible via parallel requests
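
Sending parallel requests from a client can be sketched with a thread pool. The `transcribe_many` helper below is hypothetical; it accepts any function that maps a file path to text (for example a `transcribe_audio` function that posts to `/v1/audio/transcriptions`), and the default of two workers is an arbitrary assumption, not a measured optimum.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, Iterable


def transcribe_many(
    paths: Iterable[str],
    transcribe: Callable[[str], str],
    max_workers: int = 2,
) -> Dict[str, str]:
    """Transcribe several files concurrently and return {path: text}."""
    path_list = list(paths)
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map keeps results in the same order as the input paths;
        # an exception raised by any call is re-raised here
        texts = list(pool.map(transcribe, path_list))
    return dict(zip(path_list, texts))
```

Threads fit here because each call spends its time waiting on the HTTP response. How many concurrent requests the GPU handles well depends on available VRAM and is best determined by experiment.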

## Troubleshooting

### GPU not detected

```bash
# Check the NVIDIA Container Toolkit
docker run --rm --gpus all nvidia/cuda:12.0.0-base-ubuntu22.04 nvidia-smi

# Check the logs
docker-compose logs whisper-api
```

### Model download is slow

```bash
# Manual download is possible
mkdir -p models
# Models are downloaded from Hugging Face
```

### Port already in use

```bash
# Change the port in .env
PORT=8001
```
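
Before changing the port, it can help to confirm that something is actually listening on it. A small standard-library sketch follows; the helper name `is_port_in_use` is hypothetical, and it only checks the local loopback address by default.

```python
import socket


def is_port_in_use(port: int, host: str = "127.0.0.1") -> bool:
    """Return True if a TCP connection to host:port is accepted."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        sock.settimeout(1.0)
        # connect_ex returns 0 on success and an error code otherwise
        return sock.connect_ex((host, port)) == 0
```

If `is_port_in_use(8000)` returns `True` while the container is stopped, another process holds the port and a different `PORT` value is needed.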

## Backup

Important data:
- `./data/` - SQLite database (API keys, logs)
- `./models/` - downloaded Whisper models
- `./.env` - configuration

```bash
# Create a backup
tar -czvf whisper-api-backup.tar.gz data/ models/ .env
```

## License

MIT License - see the LICENSE file

## Support

If you run into problems:
1. Check the logs: `docker-compose logs -f`
2. Run the health check: `curl http://localhost:8000/health`
3. Open an issue on Gitea

---

- **Created for:** b0rborad @ ragtag.rocks
- **Hardware:** Dual RTX 3090 setup
- **Purpose:** Clawdbot skill integration