Initial commit: Whisper API with FastAPI, GPU support and Admin Dashboard
# Whisper API

A local Whisper API with GPU acceleration and a web admin interface for transcribing audio files.

## Features

- **OpenAI-compatible API** - Drop-in replacement for the OpenAI Whisper API
- **GPU-accelerated** - Uses NVIDIA GPUs (CUDA) for fast transcription
- **Default: large-v3** - Best quality with your RTX 3090
- **Web admin interface** - API key management and statistics at `/admin`
- **API key authentication** - Secure access control
- **Cross-platform** - Docker-based, runs on Windows and Linux
- **Automatic cleanup** - Logs are deleted automatically after 30 days

## Architecture

```
┌─────────────────┐     ┌──────────────────┐     ┌─────────────────┐
│   Client/App    │────▶│   FastAPI App    │────▶│  Whisper GPU    │
│  (Clawdbot etc) │     │   (Port 8000)    │     │   (large-v3)    │
└─────────────────┘     └──────────────────┘     └─────────────────┘
                                 │
                                 ▼
                        ┌──────────────────┐
                        │   /admin Panel   │
                        │   - Key Mgmt     │
                        │   - Dashboard    │
                        │   - Logs         │
                        └──────────────────┘
```

## Quick Start

### Prerequisites

- Docker Desktop (Windows) or Docker + docker-compose (Linux)
- An NVIDIA GPU with CUDA support (RTX 3090)
- NVIDIA Container Toolkit installed

### Installation

1. **Clone the repository:**
   ```bash
   git clone https://gitea.ragtag.rocks/b0rborad/whisper-api.git
   cd whisper-api
   ```

2. **Configure environment variables:**
   ```bash
   cp .env.example .env
   # Edit .env to your liking
   ```

3. **Start the Docker container:**
   ```bash
   docker-compose up -d
   ```

4. **First start:**
   - The `large-v3` model (~3 GB) is downloaded automatically
   - This can take 5-10 minutes
   - Check the status with `docker-compose logs -f`

### Verification

```bash
# Health check
curl http://localhost:8000/health

# API info
curl http://localhost:8000/v1/models
```

## API Documentation

### Authentication

All API endpoints (except `/health` and `/admin`) require an API key:

```bash
Authorization: Bearer sk-your-api-key-here
```

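In Python, the same header can be built like this (a minimal sketch; the key value is a placeholder, not a real key):

```python
# Build the Authorization header expected by all protected endpoints.
# "sk-your-api-key-here" is a placeholder API key.

def auth_headers(api_key: str) -> dict:
    """Return the Bearer-auth header for the Whisper API."""
    return {"Authorization": f"Bearer {api_key}"}

headers = auth_headers("sk-your-api-key-here")
```
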
### Endpoints

#### POST /v1/audio/transcriptions

Transcribes an audio file.

**Request:**
```bash
curl -X POST http://localhost:8000/v1/audio/transcriptions \
  -H "Authorization: Bearer sk-your-api-key" \
  -F "file=@/path/to/audio.mp3" \
  -F "model=large-v3" \
  -F "language=de" \
  -F "response_format=json"
```

Note: `-F` already sets the correct `multipart/form-data` Content-Type including the boundary; do not set that header manually.

**Response:**
```json
{
  "text": "Hallo Welt, das ist ein Test."
}
```

#### POST /v1/audio/transcriptions (with timestamps)

**Request:**
```bash
curl -X POST http://localhost:8000/v1/audio/transcriptions \
  -H "Authorization: Bearer sk-your-api-key" \
  -F "file=@audio.mp3" \
  -F "timestamp_granularities[]=word" \
  -F "response_format=verbose_json"
```

**Response:**
```json
{
  "text": "Hallo Welt",
  "segments": [
    {
      "id": 0,
      "start": 0.0,
      "end": 1.5,
      "text": "Hallo Welt",
      "words": [
        {"word": "Hallo", "start": 0.0, "end": 0.5},
        {"word": "Welt", "start": 0.6, "end": 1.2}
      ]
    }
  ]
}
```

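A client can flatten the `verbose_json` payload into per-word timings; a small sketch using the sample response (field names as shown in the docs):

```python
import json

# Sample verbose_json response, matching the documented shape.
payload = json.loads("""
{
  "text": "Hallo Welt",
  "segments": [
    {"id": 0, "start": 0.0, "end": 1.5, "text": "Hallo Welt",
     "words": [
       {"word": "Hallo", "start": 0.0, "end": 0.5},
       {"word": "Welt", "start": 0.6, "end": 1.2}
     ]}
  ]
}
""")

def word_timings(resp: dict) -> list:
    """Flatten all (word, start, end) triples across segments."""
    return [
        (w["word"], w["start"], w["end"])
        for seg in resp.get("segments", [])
        for w in seg.get("words", [])
    ]

timings = word_timings(payload)
```
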
#### GET /v1/models

Lists the available models.

#### GET /health

Health check with GPU status.

**Response:**
```json
{
  "status": "healthy",
  "gpu": {
    "available": true,
    "name": "NVIDIA GeForce RTX 3090",
    "vram_used": "2.1 GB",
    "vram_total": "24.0 GB"
  },
  "model": "large-v3",
  "version": "1.0.0"
}
```

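Callers can gate work on this payload; a sketch (sample values taken from the response above):

```python
def gpu_ready(health: dict) -> bool:
    """True when the service reports healthy and a GPU is available."""
    return (health.get("status") == "healthy"
            and health.get("gpu", {}).get("available", False))

# Sample payload from the /health documentation above.
health = {
    "status": "healthy",
    "gpu": {"available": True, "name": "NVIDIA GeForce RTX 3090",
            "vram_used": "2.1 GB", "vram_total": "24.0 GB"},
    "model": "large-v3",
    "version": "1.0.0",
}
ready = gpu_ready(health)
```
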
## Admin Interface

The web interface is available at `http://localhost:8000/admin`.

### Login

- **Username:** `admin` (configurable in `.env`)
- **Password:** `-whisper12510-` (configurable in `.env`)

### Features

- **Dashboard:** Usage overview and performance statistics
- **API keys:** Create, deactivate, and delete keys
- **Logs:** Detailed transcription logs with filtering

## Configuration

### .env.example

```bash
# Server
PORT=8000
HOST=0.0.0.0

# Whisper
WHISPER_MODEL=large-v3
WHISPER_DEVICE=cuda
WHISPER_COMPUTE_TYPE=float16

# Authentication
# Separate multiple API keys with commas
API_KEYS=sk-your-first-key,sk-your-second-key
ADMIN_USER=admin
ADMIN_PASSWORD=-whisper12510-

# Data retention (days)
LOG_RETENTION_DAYS=30

# Optional: Sentry for error tracking
# SENTRY_DSN=https://...
```

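The comma-separated `API_KEYS` value can be parsed as follows (a sketch of the format; the server's actual parsing may differ, and the key values are placeholders):

```python
import os

# Hypothetical example value, matching the .env format above.
os.environ["API_KEYS"] = "sk-your-first-key,sk-your-second-key"

def load_api_keys() -> set:
    """Split the comma-separated API_KEYS variable into a set of keys."""
    raw = os.environ.get("API_KEYS", "")
    return {k.strip() for k in raw.split(",") if k.strip()}

def is_valid(bearer_value: str, keys: set) -> bool:
    """Check a presented 'Bearer sk-...' header value against the key set."""
    return bearer_value.removeprefix("Bearer ").strip() in keys

keys = load_api_keys()
```
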
### Docker Compose Adjustments

```yaml
services:
  whisper-api:
    # ...
    environment:
      - PORT=8000              # configurable
      - WHISPER_MODEL=large-v3
    volumes:
      - ./models:/app/models   # persists downloaded models
      - ./data:/app/data       # SQLite database
      - ./uploads:/app/uploads # temporary uploads
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              count: 1
              capabilities: [gpu]
```

## Migration to Linux

The Docker configuration is platform-independent. On Linux:

1. **Install NVIDIA Docker:**
   ```bash
   # Ubuntu/Debian
   distribution=$(. /etc/os-release; echo $ID$VERSION_ID)
   curl -s -L https://nvidia.github.io/nvidia-docker/gpgkey | sudo apt-key add -
   curl -s -L https://nvidia.github.io/nvidia-docker/$distribution/nvidia-docker.list | sudo tee /etc/apt/sources.list.d/nvidia-docker.list

   sudo apt-get update
   sudo apt-get install -y nvidia-docker2
   sudo systemctl restart docker
   ```

2. **Clone and start the project:**
   ```bash
   git clone https://gitea.ragtag.rocks/b0rborad/whisper-api.git
   cd whisper-api
   docker-compose up -d
   ```

3. **Verify GPU passthrough:**
   ```bash
   docker run --rm --gpus all nvidia/cuda:12.0-base nvidia-smi
   ```

## Integration with Clawdbot

To integrate with a Clawdbot skill:

```python
import requests

API_URL = "http://localhost:8000/v1/audio/transcriptions"
API_KEY = "sk-your-api-key"

def transcribe_audio(audio_path):
    with open(audio_path, "rb") as f:
        response = requests.post(
            API_URL,
            headers={"Authorization": f"Bearer {API_KEY}"},
            files={"file": f},
            data={"language": "de"},
        )
    response.raise_for_status()  # fail loudly on auth or server errors
    return response.json()["text"]
```

## Performance

With an RTX 3090 and large-v3:

- **1 minute of audio:** ~3-5 seconds of processing time
- **VRAM usage:** ~10 GB
- **Batch processing:** possible for parallel requests

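The figures above imply a real-time factor of roughly 12-20x; a quick back-of-the-envelope check:

```python
def realtime_factor(audio_seconds: float, processing_seconds: float) -> float:
    """Seconds of audio transcribed per second of compute."""
    return audio_seconds / processing_seconds

# 1 minute of audio processed in ~3-5 s (figures from the list above)
fast = realtime_factor(60, 3)  # best case
slow = realtime_factor(60, 5)  # worst case
```
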
## Troubleshooting

### GPU not detected

```bash
# Check the NVIDIA Container Toolkit
docker run --rm --gpus all nvidia/cuda:12.0-base nvidia-smi

# Check the logs
docker-compose logs whisper-api
```

### Slow model download

```bash
# A manual download is possible
mkdir -p models
# Models are downloaded from Hugging Face
```

### Port already in use

```bash
# Change the port in .env
PORT=8001
```

## Backup

Important data:

- `./data/` - SQLite database (API keys, logs)
- `./models/` - downloaded Whisper models
- `./.env` - configuration

```bash
# Create a backup
tar -czvf whisper-api-backup.tar.gz data/ models/ .env
```

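The same backup can also be scripted, e.g. for a cron job; a sketch using Python's `tarfile` with the paths listed above (the archive name mirrors the tar command):

```python
import tarfile
from pathlib import Path

def backup(paths, archive="whisper-api-backup.tar.gz"):
    """Pack the given paths into a gzipped tar archive.

    Missing paths are skipped; returns the list of paths actually stored.
    """
    stored = []
    with tarfile.open(archive, "w:gz") as tar:
        for p in paths:
            if Path(p).exists():
                tar.add(p)
                stored.append(p)
    return stored
```

Run it from the project root so the relative paths `data/`, `models/`, and `.env` resolve.
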
## License

MIT License - see the LICENSE file.

## Support

If you run into problems:

1. Check the logs: `docker-compose logs -f`
2. Run a health check: `curl http://localhost:8000/health`
3. Open an issue on Gitea

---

**Created for:** b0rborad @ ragtag.rocks
**Hardware:** dual RTX 3090 setup
**Purpose:** Clawdbot skill integration