Add English README and cross-link both documentation versions
459
README_EN.md
Normal file
@@ -0,0 +1,459 @@
# Whisper API

[License: MIT](https://opensource.org/licenses/MIT)

A local Whisper API with GPU acceleration and web admin interface for audio transcription. OpenAI-compatible API with multi-model support.

[🇩🇪 Deutsche Version](README.md) | **🇺🇸 English Version**

## Features

- **OpenAI-compatible API** - Drop-in replacement for OpenAI Whisper API
- **GPU Accelerated** - Uses NVIDIA GPUs (CUDA) for fast transcription
- **CPU Fallback** - Automatic switch to CPU when no GPU is available
- **Multi-Model Support** - Supports all Whisper models (tiny to large-v3)
- **Model Management** - Download, switch and delete models via Admin Panel
- **Default: large-v3** - Best quality with your RTX 3090
- **Web Admin Interface** - API key management, model management and statistics at `/admin`
- **API Key Authentication** - Secure access control (Environment + Database)
- **Cross-Platform** - Docker-based, runs on Windows and Linux
- **Automatic Cleanup** - Logs automatically deleted after 30 days
- **Persistent Storage** - Models and data in Docker volumes

## Architecture

```
┌─────────────────┐     ┌──────────────────┐     ┌─────────────────┐
│   Client/App    │────▶│   FastAPI App    │────▶│   Whisper GPU   │
│  (Clawdbot etc) │     │   (Port 8000)    │     │   (large-v3)    │
└─────────────────┘     └──────────────────┘     └─────────────────┘
                                 │
                                 ▼
                        ┌──────────────────┐
                        │  /admin Panel    │
                        │  - Key Mgmt      │
                        │  - Models        │
                        │  - Dashboard     │
                        └──────────────────┘
```

## Quick Start

### Prerequisites

- Docker Desktop (Windows) or Docker + docker-compose (Linux)
- NVIDIA GPU with CUDA support (RTX 3090) - optional, CPU fallback available
- NVIDIA Container Toolkit installed (for GPU support)

### Installation

1. **Clone repository:**
   ```bash
   git clone https://gitea.ragtag.rocks/b0rborad/whisper-api.git
   cd whisper-api
   ```

2. **Configure environment variables:**
   ```bash
   cp .env.example .env
   # Edit .env to your needs
   ```

3. **Start Docker container:**
   ```bash
   docker-compose up -d
   ```

4. **First start:**
   - The `large-v3` model (~3 GB) will be downloaded automatically
   - This may take 5-10 minutes
   - Check status: `docker-compose logs -f`

### Verification

```bash
# Health check (no API key required)
curl http://localhost:8000/health

# API info (requires an API key, like all /v1 endpoints)
curl http://localhost:8000/v1/models \
  -H "Authorization: Bearer sk-your-api-key"
```

## API Documentation

### Authentication

All API endpoints (except `/health` and `/admin`) require an API key:

```bash
Authorization: Bearer sk-your-api-key-here
```

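In client code, the header above can be built once and reused for every request. The following is a minimal sketch, assuming the key is kept in an environment variable named `WHISPER_API_KEY`; that variable name and the `auth_headers` helper are hypothetical and not part of this project.

```python
import os


def auth_headers(api_key=None):
    """Return the Authorization header in the Bearer format shown above."""
    # WHISPER_API_KEY is a hypothetical variable name used only for illustration.
    key = api_key or os.environ.get("WHISPER_API_KEY", "")
    if not key:
        raise ValueError("no API key provided")
    return {"Authorization": f"Bearer {key}"}


print(auth_headers("sk-your-api-key-here"))
```

The returned dictionary can be passed as the `headers` argument of an HTTP client such as `requests`.
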
### Endpoints

#### POST /v1/audio/transcriptions

Transcribes an audio file.

**Request:**
```bash
curl -X POST http://localhost:8000/v1/audio/transcriptions \
  -H "Authorization: Bearer sk-your-api-key" \
  -H "Content-Type: multipart/form-data" \
  -F "file=@/path/to/audio.mp3" \
  -F "model=large-v3" \
  -F "language=de" \
  -F "response_format=json"
```

**Response:**
```json
{
  "text": "Hello World, this is a test."
}
```

#### POST /v1/audio/transcriptions (with Timestamps)

**Request:**
```bash
curl -X POST http://localhost:8000/v1/audio/transcriptions \
  -H "Authorization: Bearer sk-your-api-key" \
  -F "file=@audio.mp3" \
  -F "timestamp_granularities[]=word" \
  -F "response_format=verbose_json"
```

**Response:**
```json
{
  "text": "Hello World",
  "segments": [
    {
      "id": 0,
      "start": 0.0,
      "end": 1.5,
      "text": "Hello World",
      "words": [
        {"word": "Hello", "start": 0.0, "end": 0.5},
        {"word": "World", "start": 0.6, "end": 1.2}
      ]
    }
  ]
}
```

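To work with the word-level timestamps, a client has to walk the nested `segments` and `words` arrays. The sketch below assumes the response has exactly the shape of the example above (words nested inside each segment); `iter_words` is a hypothetical helper name, and the sample data is copied from the example response.

```python
def iter_words(verbose_response):
    """Yield (word, start, end) tuples from a verbose_json response dict."""
    for segment in verbose_response.get("segments", []):
        # "words" may be missing if word-level timestamps were not requested.
        for entry in segment.get("words", []):
            yield entry["word"], entry["start"], entry["end"]


# Sample data taken from the example response above.
sample = {
    "text": "Hello World",
    "segments": [
        {
            "id": 0,
            "start": 0.0,
            "end": 1.5,
            "text": "Hello World",
            "words": [
                {"word": "Hello", "start": 0.0, "end": 0.5},
                {"word": "World", "start": 0.6, "end": 1.2},
            ],
        }
    ],
}

for word, start, end in iter_words(sample):
    print(f"{start:.2f}-{end:.2f} {word}")
```
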
#### GET /v1/models

List available models.

#### GET /v1/available-models

List all available Whisper models with download status.

**Response:**
```json
{
  "models": [
    {
      "name": "large-v3",
      "size": "2.88 GB",
      "description": "Best accuracy",
      "is_downloaded": true,
      "is_active": true
    }
  ]
}
```

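A client can use the `is_downloaded` and `is_active` flags to decide which models are usable without triggering a download. This is a small sketch assuming the response shape shown above; the helper names and the second entry in the sample payload are made up for illustration.

```python
def downloaded_models(payload):
    """Return the names of all models flagged as downloaded."""
    return [m["name"] for m in payload.get("models", []) if m.get("is_downloaded")]


def active_model(payload):
    """Return the name of the active model, or None if none is flagged."""
    for m in payload.get("models", []):
        if m.get("is_active"):
            return m["name"]
    return None


# First entry matches the example above; the second is a hypothetical addition.
payload = {
    "models": [
        {"name": "large-v3", "is_downloaded": True, "is_active": True},
        {"name": "base", "is_downloaded": False, "is_active": False},
    ]
}

print(downloaded_models(payload), active_model(payload))
```
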
#### GET /v1/model-status

Current download status of the model.

**Response:**
```json
{
  "name": "large-v3",
  "loaded": true,
  "is_downloading": false,
  "download_percentage": 100,
  "status_message": "Model loaded successfully"
}
```

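Because the first model download can take several minutes, a client may want to poll this endpoint until `loaded` becomes true. The sketch below shows one way to do that under the response shape above. `wait_until_loaded` is a hypothetical helper, and `fetch_status` stands for any callable that returns the parsed JSON (for example a small wrapper around an HTTP GET to `/v1/model-status`); the timeout and interval defaults are arbitrary assumptions.

```python
import time


def wait_until_loaded(fetch_status, timeout=600.0, interval=5.0,
                      sleep=time.sleep, clock=time.monotonic):
    """Poll fetch_status() until the model reports loaded, or raise TimeoutError."""
    deadline = clock() + timeout
    while True:
        status = fetch_status()
        if status.get("loaded"):
            return status
        if clock() >= deadline:
            raise TimeoutError(
                f"model not loaded after {timeout}s "
                f"(last progress: {status.get('download_percentage')}%)"
            )
        sleep(interval)


# Simulated responses instead of real HTTP calls, so the sketch runs offline.
fake_responses = iter([
    {"name": "large-v3", "loaded": False, "is_downloading": True, "download_percentage": 40},
    {"name": "large-v3", "loaded": True, "is_downloading": False, "download_percentage": 100},
])
final = wait_until_loaded(lambda: next(fake_responses), sleep=lambda seconds: None)
print(final["download_percentage"])
```

Passing `sleep` and `clock` as parameters keeps the loop easy to test without waiting in real time.
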
#### POST /v1/switch-model

Switch to a different model.

**Request:**
```bash
curl -X POST http://localhost:8000/v1/switch-model \
  -H "Authorization: Bearer sk-your-api-key" \
  -F "model=base"
```

#### POST /v1/reload-model

Re-download current model.

#### DELETE /v1/delete-model/{model_name}

Delete a downloaded model.

#### GET /health

Health check with GPU and model status.

**Response:**
```json
{
  "status": "healthy",
  "model": "large-v3",
  "gpu": {
    "available": true,
    "name": "NVIDIA GeForce RTX 3090",
    "vram_used_gb": 2.1,
    "vram_total_gb": 24.0
  },
  "model_status": {
    "loaded": true,
    "is_downloading": false,
    "download_percentage": 100
  }
}
```

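A monitoring script can condense this payload into a few values. The following sketch assumes the response shape shown above; `summarize_health` is a hypothetical helper, and treating a missing or unavailable `gpu` block as CPU mode is an assumption rather than documented behaviour.

```python
def summarize_health(health):
    """Reduce a /health payload to readiness, device name and VRAM usage."""
    gpu = health.get("gpu") or {}
    model_status = health.get("model_status") or {}
    total = gpu.get("vram_total_gb") or 0
    used = gpu.get("vram_used_gb") or 0
    return {
        # Ready only if the service is healthy and the model is loaded.
        "ready": health.get("status") == "healthy" and bool(model_status.get("loaded")),
        # Assumption: no available GPU means the API runs in CPU mode.
        "device": gpu.get("name", "gpu") if gpu.get("available") else "cpu",
        "vram_used_fraction": (used / total) if total else None,
    }


# Sample data taken from the example response above.
sample_health = {
    "status": "healthy",
    "model": "large-v3",
    "gpu": {
        "available": True,
        "name": "NVIDIA GeForce RTX 3090",
        "vram_used_gb": 2.1,
        "vram_total_gb": 24.0,
    },
    "model_status": {"loaded": True, "is_downloading": False, "download_percentage": 100},
}

print(summarize_health(sample_health))
```
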
## Admin Interface

The web interface is accessible at: `http://localhost:8000/admin`

### Login

- **Username:** `admin` (configurable in `.env`)
- **Password:** `-whisper12510-` (configurable in `.env`)

### Features

- **Dashboard:** Overview of usage, performance statistics, **Model Download Status**
- **API Keys:** Manage (create, deactivate, delete)
- **Models:**
  - Manage all Whisper models (tiny, base, small, medium, large-v1, large-v2, large-v3)
  - Download, activate and delete models
  - **CPU/GPU Mode Toggle**
  - Reload model
- **Logs:** Detailed transcription logs with filter

## Configuration

### .env.example

```bash
# Server
PORT=8000
HOST=0.0.0.0

# Whisper
WHISPER_MODEL=large-v3
# Device: 'cuda' for GPU mode or 'cpu' for CPU mode
WHISPER_DEVICE=cuda
WHISPER_COMPUTE_TYPE=float16

# Authentication
# Multiple API keys separated by comma
API_KEYS=sk-your-first-key,sk-your-second-key
ADMIN_USER=admin
ADMIN_PASSWORD=-whisper12510-

# Data retention (days)
LOG_RETENTION_DAYS=30

# Optional: Sentry for error tracking
# SENTRY_DSN=https://...
```

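The `API_KEYS` value holds several keys in one comma-separated string. As an illustration of how such a value can be split into individual keys, here is a minimal sketch; `parse_api_keys` is a hypothetical helper, and stripping whitespace and skipping empty entries are assumptions, not confirmed behaviour of this project.

```python
def parse_api_keys(raw):
    """Split a comma-separated API_KEYS string into a list of keys."""
    # Assumption: surrounding whitespace and empty entries are ignored.
    return [key.strip() for key in raw.split(",") if key.strip()]


print(parse_api_keys("sk-your-first-key,sk-your-second-key"))
```
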
### Docker-Compose Customization

```yaml
services:
  whisper-api:
    # ...
    environment:
      - PORT=8000  # Changeable
      - WHISPER_MODEL=large-v3
      - WHISPER_DEVICE=cuda  # or 'cpu' for CPU mode
    volumes:
      - whisper_models:/app/models    # Persists models (Named Volume)
      - whisper_data:/app/data        # SQLite database
      - whisper_uploads:/app/uploads  # Temporary uploads
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              count: all
              capabilities: [gpu]

volumes:
  whisper_models:
  whisper_data:
  whisper_uploads:
```

## Migration to Linux

The Docker configuration is platform-independent. For Linux:

1. **Install NVIDIA Docker:**
   ```bash
   # Ubuntu/Debian
   distribution=$(. /etc/os-release;echo $ID$VERSION_ID)
   curl -s -L https://nvidia.github.io/nvidia-docker/gpgkey | sudo apt-key add -
   curl -s -L https://nvidia.github.io/nvidia-docker/$distribution/nvidia-docker.list | sudo tee /etc/apt/sources.list.d/nvidia-docker.list

   sudo apt-get update
   sudo apt-get install -y nvidia-docker2
   sudo systemctl restart docker
   ```

2. **Clone and start project:**
   ```bash
   git clone https://gitea.ragtag.rocks/b0rborad/whisper-api.git
   cd whisper-api
   docker-compose up -d
   ```

3. **Verify GPU passthrough:**
   ```bash
   docker run --rm --gpus all nvidia/cuda:12.0.0-base-ubuntu22.04 nvidia-smi
   ```

## Available Models

| Model | Size | Description | Speed | Accuracy |
|-------|------|-------------|-------|----------|
| **tiny** | 39 MB | Fastest, lowest quality | Very fast | Low |
| **base** | 74 MB | Good for testing | Fast | Medium |
| **small** | 244 MB | Balance speed/quality | Medium | Good |
| **medium** | 769 MB | Good accuracy | Slow | Very good |
| **large-v2** | 2.87 GB | Higher accuracy | Very slow | Excellent |
| **large-v3** | 2.88 GB | Best accuracy (Default) | Very slow | Excellent |

**Recommendations:**
- **Development/Testing:** `base` or `small`
- **Production:** `large-v3` (with RTX 3090)
- **CPU Mode:** `small` or `medium`

## Performance

With RTX 3090 and large-v3:
- **1 minute audio:** ~3-5 seconds processing time
- **VRAM usage:** ~10 GB
- **Batch processing:** Possible for parallel requests

With CPU and small:
- **1 minute audio:** ~30-60 seconds processing time
- **RAM usage:** ~1 GB

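The per-minute figures above can be turned into a rough estimate for longer recordings. The sketch below assumes processing time scales linearly with audio length, which is a simplification and not a measured property of this setup; the function and constant names are hypothetical.

```python
# Seconds of processing per minute of audio, taken from the figures above.
GPU_LARGE_V3 = (3.0, 5.0)
CPU_SMALL = (30.0, 60.0)


def estimate_processing_seconds(audio_seconds, per_minute_range):
    """Return a (low, high) estimate in seconds, assuming linear scaling."""
    low, high = per_minute_range
    minutes = audio_seconds / 60.0
    return minutes * low, minutes * high


# A 30-minute recording on each configuration.
print(estimate_processing_seconds(30 * 60, GPU_LARGE_V3))
print(estimate_processing_seconds(30 * 60, CPU_SMALL))
```

Under this assumption, a 30-minute recording takes roughly 90 to 150 seconds on the GPU configuration and 15 to 30 minutes on the CPU configuration.
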
## Integration with Clawdbot

For integration into a Clawdbot skill:

```python
import requests

API_URL = "http://localhost:8000/v1/audio/transcriptions"
API_KEY = "sk-your-api-key"


def transcribe_audio(audio_path):
    """Send an audio file to the Whisper API and return the transcribed text."""
    with open(audio_path, "rb") as f:
        response = requests.post(
            API_URL,
            headers={"Authorization": f"Bearer {API_KEY}"},
            files={"file": f},
            data={"language": "de"},
        )
    # Fail loudly on HTTP errors (e.g. an invalid API key) instead of
    # raising a confusing KeyError on the missing "text" field.
    response.raise_for_status()
    return response.json()["text"]
```

## Troubleshooting

### GPU not recognized / Automatic CPU Fallback

If no GPU is detected, the API automatically switches to CPU mode:

```bash
# Check NVIDIA Container Toolkit
docker run --rm --gpus all nvidia/cuda:12.0.0-base-ubuntu22.04 nvidia-smi

# Check logs - should show "GPU not available, falling back to CPU mode"
docker-compose logs whisper-api
```

**Manual switch:** Via Admin Panel (`/admin/models`) or API:
```bash
curl -X POST http://localhost:8000/v1/switch-device \
  -H "Authorization: Bearer sk-your-api-key" \
  -F "device=cpu"
```

### Model Download Status Display

- **Dashboard:** Shows download progress in real-time
- **API:** `GET /v1/model-status` for current status
- **Logs:** `docker-compose logs -f` shows download progress

### Slow Model Download

```bash
# In Admin Panel under Models select a smaller model (e.g. base, small)
# Or via API:
curl -X POST http://localhost:8000/v1/switch-model \
  -H "Authorization: Bearer sk-your-api-key" \
  -F "model=base"
```

### Port already in use

```bash
# Change port in .env
PORT=8001
```

## Backup

Important data (Docker named volumes and configuration):
- `whisper_data` - SQLite database (API keys, logs)
- `whisper_models` - Downloaded Whisper models
- `./.env` - Configuration

```bash
# Create backup of both volumes
docker run --rm -v whisper-api_whisper_data:/data -v whisper-api_whisper_models:/models -v "$(pwd)":/backup alpine sh -c "tar czf /backup/whisper-api-backup.tar.gz -C / data models"

# Or complete backup: keep a copy of .env next to the volume archive
cp .env .env.backup
docker run --rm -v whisper-api_whisper_data:/data -v whisper-api_whisper_models:/models -v "$(pwd)":/backup alpine tar czf /backup/whisper-api-full-backup.tar.gz -C / data models
```

### Restore Backup

```bash
# Extract backup
docker run --rm -v whisper-api_whisper_data:/data -v whisper-api_whisper_models:/models -v "$(pwd)":/backup alpine sh -c "cd / && tar xzf /backup/whisper-api-backup.tar.gz"
```

## License

MIT License - See LICENSE file

## Support

For issues:
1. Check logs: `docker-compose logs -f`
2. Health check: `curl http://localhost:8000/health`
3. Create issue on Gitea

---

**Created for:** b0rborad @ ragtag.rocks
**Hardware:** Dual RTX 3090 Setup
**Purpose:** Clawdbot Skill Integration