# TDD: Research Bridge - SearXNG + Kimi for Coding Integration
## AI Council Review Document
**Project:** research-bridge
**Purpose:** Self-hosted research pipeline combining SearXNG meta-search with Kimi for Coding
**Cost Target:** **$0** per query (SearXNG: $0 self-hosted + Kimi for Coding: covered by the existing subscription)
**Architecture:** Modular, testable, async-first
---
## 1. Executive Summary
### Problem
Perplexity API calls cost $0.015-0.03 per query. For frequent research tasks, this adds up quickly.
### Solution
Replace Perplexity with a two-tier architecture:
1. **SearXNG** (self-hosted, **FREE**): Aggregates search results from 70+ sources
2. **Kimi for Coding** (via the **existing subscription**, **$0**): Summarizes and reasons over results
### Expected Outcome
- **Cost:** **$0 per query** (vs $0.015-0.03 with Perplexity)
- **Latency:** 2-5s per query
- **Quality:** Comparable to Perplexity Sonar
---
## 2. Architecture Overview
```
┌─────────────────┐     ┌──────────────────┐     ┌─────────────────┐
│   User Query    │────▶│   Query Router   │────▶│     SearXNG     │
│                 │     │    (FastAPI)     │     │  (Self-Hosted)  │
└─────────────────┘     └──────────────────┘     └────────┬────────┘
                                                          │
                                                          ▼
                                                 ┌─────────────────┐
                                                 │ Search Results  │
                                                 │   (JSON/Raw)    │
                                                 └────────┬────────┘
                                                          │
┌─────────────────┐     ┌──────────────────┐              │
│    Response     │◀────│ Kimi for Coding  │◀─────────────┘
│   (Markdown)    │     │  (Synthesizer)   │
└─────────────────┘     └──────────────────┘
```
### Core Components
| Component | Responsibility | Tech Stack |
|-----------|---------------|------------|
| `query-router` | HTTP API, validation, routing | FastAPI, Pydantic |
| `searxng-client` | Interface to SearXNG instance | aiohttp, caching |
| `synthesizer` | LLM prompts, response formatting | Kimi for Coding API |
| `cache-layer` | Result deduplication | Redis (optional) |
| `rate-limiter` | Prevent abuse | slowapi |
---
## 3. Component Specifications
### 3.1 Query Router (`src/api/router.py`)
**Purpose:** FastAPI application handling HTTP requests
**Endpoints:**
```
POST /research
  Request:  {"query": "string", "depth": "shallow|deep", "sources": ["web", "news", "academic"]}
  Response: {"query": "string", "results": [...], "synthesis": "string", "sources": [...], "latency_ms": int}

GET /health
  Response: {"status": "healthy", "searxng_connected": bool, "kimi_coding_available": bool}

POST /search (passthrough)
  Request:  {"q": "string", "engines": ["google", "bing"], "page": 1}
  Response: Raw SearXNG JSON
```
**Validation Rules:**
- Query: min 3, max 500 characters
- Depth: default "shallow" (1 search) vs "deep" (3 searches + synthesis)
- Rate limit: 30 req/min per IP
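
The validation rules above can be sketched without any dependencies. This is a minimal illustration using a standard-library dataclass; the project itself lists Pydantic for validation, and the names `ResearchRequest`, `ALLOWED_DEPTHS`, `ALLOWED_SOURCES`, and the default `sources` value are assumptions, not part of the existing code.

```python
from dataclasses import dataclass

ALLOWED_DEPTHS = {"shallow": 1, "deep": 3}  # depth -> number of searches
ALLOWED_SOURCES = {"web", "news", "academic"}


@dataclass(frozen=True)
class ResearchRequest:
    """Hypothetical request model mirroring the validation rules above."""

    query: str
    depth: str = "shallow"
    sources: tuple[str, ...] = ("web",)  # default is an assumption

    def __post_init__(self) -> None:
        if not 3 <= len(self.query) <= 500:
            raise ValueError("query must be 3-500 characters")
        if self.depth not in ALLOWED_DEPTHS:
            raise ValueError(f"depth must be one of {sorted(ALLOWED_DEPTHS)}")
        unknown = set(self.sources) - ALLOWED_SOURCES
        if unknown:
            raise ValueError(f"unknown sources: {sorted(unknown)}")

    @property
    def search_count(self) -> int:
        """Number of SearXNG searches implied by the depth setting."""
        return ALLOWED_DEPTHS[self.depth]
```

Rejecting bad input before any search is issued keeps invalid requests from consuming SearXNG or LLM capacity.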
---
### 3.2 SearXNG Client (`src/search/searxng.py`)
**Purpose:** Async client for SearXNG instance
**Configuration:**
```yaml
searxng:
  base_url: "http://localhost:8080"  # or external instance
  timeout: 10
  max_results: 10
  engines:
    default: ["google", "bing", "duckduckgo"]
    news: ["google_news", "bing_news"]
    academic: ["google_scholar", "arxiv"]
```
**Interface:**
```python
class SearXNGClient:
    async def search(self, query: str, engines: list[str], page: int = 1) -> SearchResult: ...
    async def search_multi(self, queries: list[str]) -> list[SearchResult]: ...  # for deep mode
```
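
One plausible way to implement `search_multi` for deep mode is to run the individual searches concurrently with `asyncio.gather`. The sketch below assumes this approach; `fake_search` is a hypothetical stand-in for the real `SearXNGClient.search` so the example runs without a SearXNG instance.

```python
import asyncio


async def fake_search(query: str) -> list[str]:
    """Stand-in for SearXNGClient.search; returns placeholder titles."""
    await asyncio.sleep(0)  # yield control as a real HTTP call would
    return [f"result for {query!r}"]


async def search_multi(queries: list[str]) -> list[list[str]]:
    """Run all searches concurrently and keep results in input order."""
    return await asyncio.gather(*(fake_search(q) for q in queries))


if __name__ == "__main__":
    print(asyncio.run(search_multi(["fusion energy", "fusion energy news"])))
```

Because `asyncio.gather` preserves argument order, the results line up with the input queries, and total latency is close to that of the slowest single search rather than the sum of all of them.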
**Caching:**
- Cache key: SHA256(query + engines.join(","))
- TTL: 1 hour for identical queries
- Storage: In-memory LRU (1000 entries) or Redis
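
The in-memory variant of this caching scheme could look roughly like the following. It is a sketch under stated assumptions: `cache_key` and `TTLCache` are illustrative names, and the injectable `clock` parameter exists only to make expiry testable.

```python
import hashlib
import time
from collections import OrderedDict


def cache_key(query: str, engines: list[str]) -> str:
    """SHA-256 over the query plus the comma-joined engine list."""
    return hashlib.sha256((query + ",".join(engines)).encode("utf-8")).hexdigest()


class TTLCache:
    """In-memory LRU cache whose entries expire after `ttl` seconds."""

    def __init__(self, max_entries: int = 1000, ttl: float = 3600.0, clock=time.monotonic):
        self._data = OrderedDict()  # key -> (stored_at, value)
        self._max, self._ttl, self._clock = max_entries, ttl, clock

    def get(self, key: str):
        item = self._data.get(key)
        if item is None:
            return None
        stored_at, value = item
        if self._clock() - stored_at > self._ttl:
            del self._data[key]  # expired
            return None
        self._data.move_to_end(key)  # mark as most recently used
        return value

    def set(self, key: str, value) -> None:
        self._data[key] = (self._clock(), value)
        self._data.move_to_end(key)
        if len(self._data) > self._max:
            self._data.popitem(last=False)  # evict least recently used
```

Note that the key depends on engine order as written; sorting `engines` before joining would make `["google", "bing"]` and `["bing", "google"]` share a cache entry, if that is desired.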
---
### 3.3 Synthesizer (`src/llm/synthesizer.py`)
**Purpose:** Transform search results into coherent answers using Kimi for Coding
**⚠️ CRITICAL:** The Kimi for Coding API requires the header `User-Agent: KimiCLI/0.77`; requests without it are rejected with 403.
**API Configuration:**
```python
{
    "base_url": "https://api.kimi.com/coding/v1",
    "api_key": "sk-kimi-...",  # Kimi for Coding API Key
    "headers": {
        "User-Agent": "KimiCLI/0.77"  # REQUIRED - 403 without this!
    }
}
```
**Prompt Strategy:**
```
You are a research assistant. Synthesize the following search results into a
clear, accurate answer. Include citations [1], [2], etc.
User Query: {query}
Search Results:
{formatted_results}
Instructions:
1. Answer directly and concisely
2. Cite sources using [1], [2] format
3. If results conflict, note the discrepancy
4. If insufficient data, say so clearly
Answer in {language}.
```
**Implementation:**
```python
from openai import AsyncOpenAI


class Synthesizer:
    def __init__(self, api_key: str, model: str = "kimi-for-coding"):
        self.model = model  # stored because synthesize() reads self.model
        self.client = AsyncOpenAI(
            base_url="https://api.kimi.com/coding/v1",
            api_key=api_key,
            default_headers={"User-Agent": "KimiCLI/0.77"},  # CRITICAL!
        )

    async def synthesize(
        self,
        query: str,
        results: list[SearchResult],
        max_tokens: int = 2048,
    ) -> SynthesisResult:
        # SYSTEM_PROMPT, _format_prompt and _extract_citations are not shown here.
        response = await self.client.chat.completions.create(
            model=self.model,
            messages=[
                {"role": "system", "content": SYSTEM_PROMPT},
                {"role": "user", "content": self._format_prompt(query, results)},
            ],
            max_tokens=max_tokens,
        )
        return SynthesisResult(
            content=response.choices[0].message.content,
            sources=self._extract_citations(results),
        )
```
**Performance Notes:**
- Kimi for Coding is optimized for code and reasoning tasks
- Truncate search results to ~4000 tokens to stay within context
- Cache syntheses for identical result sets
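
Truncating to a token budget while keeping the `[1]`, `[2]` numbering might be handled when the results are formatted for the prompt. The sketch below is one possible shape for that step; `estimate_tokens` uses a rough four-characters-per-token heuristic (an assumption, not the model's real tokenizer), and both function names are hypothetical.

```python
def estimate_tokens(text: str) -> int:
    """Rough heuristic (assumption): about four characters per token."""
    return max(1, len(text) // 4)


def format_results(results: list[dict], token_budget: int = 4000) -> str:
    """Number results as [1], [2], ... and stop once the budget is used up."""
    lines, used = [], 0
    for index, item in enumerate(results, start=1):
        entry = f"[{index}] {item['title']} ({item['url']})\n{item.get('content') or ''}"
        cost = estimate_tokens(entry)
        if used + cost > token_budget:
            break  # drop remaining results rather than cut one mid-snippet
        lines.append(entry)
        used += cost
    return "\n\n".join(lines)
```

Dropping whole results instead of cutting snippets mid-sentence keeps every citation index in the prompt tied to a complete source entry.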
---
### 3.4 Rate Limiter (`src/middleware/ratelimit.py`)
**Purpose:** Protect against abuse and control costs
**Strategy:**
- IP-based: 30 requests/minute
- Global: 1000 requests/hour (configurable)
- Burst: Allow 5 requests immediately, then token bucket
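
The burst-then-token-bucket behavior can be illustrated with a small standard-library class. This is a sketch only: the component table names slowapi for the real middleware, and `TokenBucket` with its `clock` parameter is a hypothetical helper used here to show the arithmetic.

```python
import time


class TokenBucket:
    """Token bucket: `capacity` is the burst size, `rate` the refill per second."""

    def __init__(self, rate: float = 30 / 60, capacity: int = 5, clock=time.monotonic):
        self.rate, self.capacity, self._clock = rate, capacity, clock
        self._tokens = float(capacity)  # start full so a burst is allowed
        self._last = clock()

    def allow(self) -> bool:
        now = self._clock()
        # Refill in proportion to elapsed time, capped at the burst size.
        self._tokens = min(self.capacity, self._tokens + (now - self._last) * self.rate)
        self._last = now
        if self._tokens >= 1:
            self._tokens -= 1
            return True
        return False
```

In practice one bucket would be kept per client IP. With these defaults a client starting from a full bucket can make up to 35 requests in its first minute (5 burst plus 30 refilled), so the refill rate may need lowering if 30 per minute is meant as a hard cap.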
---
## 4. Data Models (`src/models/`)
### SearchResult
```python
from datetime import datetime

from pydantic import BaseModel


class SearchResult(BaseModel):
    title: str
    url: str
    content: str | None  # Snippet or full text
    source: str  # Engine name
    score: float | None
    published: datetime | None
```
### ResearchResponse
```python
class ResearchResponse(BaseModel):
    query: str
    depth: str
    synthesis: str
    sources: list[dict]  # {title, url, index}
    raw_results: list[SearchResult] | None  # null if omit_raw=true
    metadata: dict  # {latency_ms, cache_hit, tokens_used}
```
### Config
```python
from typing import Literal

from pydantic import BaseModel


class Config(BaseModel):
    searxng_url: str
    kimi_api_key: str  # Kimi for Coding API Key
    cache_backend: Literal["memory", "redis"] = "memory"
    rate_limit: dict  # requests, window
```
---
## 5. Testing Strategy
### Test Categories
| Category | Location | Responsibility |
|----------|----------|----------------|
| Unit | `tests/unit/` | Individual functions, pure logic |
| Integration | `tests/integration/` | Component interactions |
| E2E | `tests/e2e/` | Full request flow |
| Performance | `tests/perf/` | Load testing |
### Test Isolation Principle
**CRITICAL:** Each test category runs independently. No test should require another test to run first.
### 5.1 Unit Tests (`tests/unit/`)
**test_synthesizer.py:**
- Mock Kimi for Coding API responses
- Test prompt formatting
- Test User-Agent header injection
- Test token counting/truncation
- Test error handling (API down, auth errors)
**test_searxng_client.py:**
- Mock HTTP responses
- Test result parsing
- Test caching logic
- Test timeout handling
**test_models.py:**
- Pydantic validation
- Serialization/deserialization
### 5.2 Integration Tests (`tests/integration/`)
**Requires:** Running SearXNG instance (Docker)
**test_search_flow.py:**
- Real SearXNG queries
- Cache interaction
- Error propagation
**test_api.py:**
- FastAPI test client
- Request/response validation
- Rate limiting behavior
### 5.3 E2E Tests (`tests/e2e/`)
**test_research_endpoint.py:**
- Full flow: query → search → synthesize → response
- Verify citation format
- Verify source attribution
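
The citation checks could be automated with a small helper that compares the `[n]` markers in the synthesis against the `index` values in `sources`. The function name `check_citations` is an assumption; the sketch only verifies that every cited index exists, not that the cited source supports the claim.

```python
import re

CITATION = re.compile(r"\[(\d+)\]")


def check_citations(synthesis: str, sources: list[dict]) -> list[int]:
    """Return cited indices that have no matching entry in `sources`."""
    known = {source["index"] for source in sources}
    cited = {int(match) for match in CITATION.findall(synthesis)}
    return sorted(cited - known)
```

An E2E test can then assert that `check_citations(response["synthesis"], response["sources"])` is an empty list.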
---
## 6. Implementation Phases
### Phase 1: Foundation (No LLM yet) ✅ COMPLETE
**Goal:** Working search API
**Deliverables:**
- [x] Project structure with pyproject.toml
- [x] SearXNG client with async HTTP
- [x] FastAPI router with `/search` endpoint
- [x] Basic tests (mocked) - 28 tests, 92% coverage
- [x] Docker Compose for SearXNG
**Acceptance Criteria:**
```bash
curl -X POST http://localhost:8000/search \
  -H "Content-Type: application/json" \
  -d '{"q": "python asyncio", "engines": ["google"]}'
# Returns valid SearXNG results
```
**Status:** ✅ All tests passing, 92% coverage
### Phase 2: Synthesis Layer ✅ COMPLETE
**Goal:** Add Kimi for Coding integration
**Deliverables:**
- [x] Synthesizer class with Kimi for Coding API
- [x] `/research` endpoint combining search + synthesis
- [x] Prompt templates
- [x] Response formatting with citations
- [x] User-Agent header handling
**Acceptance Criteria:**
```bash
curl -X POST http://localhost:8000/research \
  -H "Content-Type: application/json" \
  -d '{"query": "What is Python asyncio?"}'
# Returns synthesized answer with citations
```
**Status:** ✅ Implemented, tested (40 tests, 90% coverage)
### Phase 3: Polish
**Goal:** Production readiness
**Deliverables:**
- [ ] Rate limiting
- [ ] Caching (Redis optional)
- [ ] Structured logging
- [ ] Health checks
- [ ] Metrics (Prometheus)
- [ ] Documentation
---
## 7. Configuration
### Environment Variables
```bash
RESEARCH_BRIDGE_SEARXNG_URL=http://localhost:8080
RESEARCH_BRIDGE_KIMI_API_KEY=sk-kimi-... # Kimi for Coding Key
RESEARCH_BRIDGE_LOG_LEVEL=INFO
RESEARCH_BRIDGE_REDIS_URL=redis://localhost:6379 # optional
```
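
Reading these variables at startup might look like the following. This is a minimal standard-library sketch; `Settings` and `load_settings` are hypothetical names, and treating the SearXNG URL and API key as required while the other two are optional is an assumption based on the comments above.

```python
import os
from dataclasses import dataclass
from typing import Optional

PREFIX = "RESEARCH_BRIDGE_"


@dataclass(frozen=True)
class Settings:
    searxng_url: str
    kimi_api_key: str
    log_level: str = "INFO"
    redis_url: Optional[str] = None  # None means the in-memory cache is used


def load_settings(env=None) -> Settings:
    """Read settings from a mapping (defaults to os.environ)."""
    env = os.environ if env is None else env
    try:
        return Settings(
            searxng_url=env[PREFIX + "SEARXNG_URL"],
            kimi_api_key=env[PREFIX + "KIMI_API_KEY"],
            log_level=env.get(PREFIX + "LOG_LEVEL", "INFO"),
            redis_url=env.get(PREFIX + "REDIS_URL"),
        )
    except KeyError as missing:
        raise RuntimeError(f"missing required variable {missing}") from None
```

Failing fast on a missing key at startup avoids a confusing authentication error on the first `/research` call.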
### Important: Kimi for Coding API Requirements
```python
# The API requires a special User-Agent header!
headers = {
    "Authorization": f"Bearer {api_key}",
    "Content-Type": "application/json",
    "User-Agent": "KimiCLI/0.77",  # ← REQUIRED! 403 without this
}
```
### Docker Compose (SearXNG)
```yaml
# config/searxng-docker-compose.yml
version: '3'
services:
  searxng:
    image: searxng/searxng:latest
    ports:
      - "8080:8080"
    volumes:
      - ./searxng-settings.yml:/etc/searxng/settings.yml
```
---
## 8. API Contract
### POST /research
**Request:**
```json
{
  "query": "latest developments in fusion energy",
  "depth": "deep",
  "sources": ["web", "news"],
  "language": "en",
  "omit_raw": false
}
```
**Response:**
```json
{
  "query": "latest developments in fusion energy",
  "depth": "deep",
  "synthesis": "Recent breakthroughs in fusion energy include... [1] Commonwealth Fusion Systems achieved... [2]",
  "sources": [
    {"index": 1, "title": "Fusion breakthrough", "url": "https://..."},
    {"index": 2, "title": "CFS milestone", "url": "https://..."}
  ],
  "raw_results": [...],
  "metadata": {
    "latency_ms": 3200,
    "cache_hit": false,
    "tokens_used": 1247,
    "cost_usd": 0.0
  }
}
```
---
## 9. Cost Analysis
### Per-Query Costs
| Component | Cost | Notes |
|-----------|------|-------|
| **SearXNG** | **$0.00** | Self-hosted, open source, no API costs |
| **Kimi for Coding** | **$0.00** | Covered by the existing subscription (no additional cost) |
| **Total per query** | **$0.00** | |

**Comparison:**

| Solution | Cost per query | Factor |
|----------|----------------|--------|
| Perplexity Sonar Pro | ~$0.015-0.03 | ∞ (more expensive) |
| Perplexity API (direct) | ~$0.005 | ∞ (more expensive) |
| **Research Bridge** | **$0.00** | **Baseline** |

**Savings: 100%** of running costs!
### Why is this completely free?
- **SearXNG:** Free (open source, self-hosted)
- **Kimi for Coding:** Already covered by the existing subscription
- No API costs, no rate limits, no hidden fees
### Break-Even Analysis
- Setup effort: ~10 hours
- At any usage level: **$0 running costs** vs. $X with Perplexity
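
To turn the ~10 hours of setup into a query count, the setup effort needs a monetary value. The sketch below shows the arithmetic only; the hourly rate is a purely hypothetical input (the document does not state one), and `break_even_queries` is an illustrative name.

```python
import math


def break_even_queries(setup_hours: float, hourly_rate_usd: float, cost_per_query_usd: float) -> int:
    """Queries after which avoided per-query fees cover the setup effort."""
    if cost_per_query_usd <= 0:
        raise ValueError("cost_per_query_usd must be positive")
    return math.ceil(setup_hours * hourly_rate_usd / cost_per_query_usd)


if __name__ == "__main__":
    # Hypothetical rate of $50/hour; per-query prices taken from the table above.
    for price in (0.015, 0.03):
        print(price, break_even_queries(10, 50.0, price))
```

The lower the per-query price being replaced, the more queries it takes to recover the setup time, so the result is sensitive to which Perplexity tier serves as the baseline.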
---
## 10. Success Criteria
### Functional
- [ ] `/research` returns synthesized answers in <5s
- [ ] Citations link to original sources
- [ ] Rate limiting prevents abuse
- [ ] Health endpoint confirms all dependencies
### Quality
- [ ] Answer quality matches Perplexity in blind test (n=20)
- [ ] Citation accuracy >95%
- [ ] Handles ambiguous queries gracefully
### Operational
- [ ] 99% uptime (excluding planned maintenance)
- [ ] <1% error rate
- [ ] Logs structured for observability
---
## 11. Risks & Mitigations
| Risk | Likelihood | Impact | Mitigation |
|------|------------|--------|------------|
| SearXNG instance down | Medium | High | Deploy redundant instance, fallback engines |
| Kimi for Coding API changes | Low | Medium | Abstract API client, monitor for breaking changes |
| User-Agent requirement breaks | Low | High | Hardcoded header, monitor API docs for updates |
| Answer quality poor | Medium | High | A/B test prompts, fallback to deeper search |
---
## 12. Future Enhancements
- **Follow-up questions:** Context-aware multi-turn research
- **Source extraction:** Fetch full article text via crawling
- **PDF support:** Search and synthesize academic papers
- **Custom prompts:** User-defined synthesis instructions
- **Webhook notifications:** Async research with callback
---
## 13. Appendix: Implementation Notes
### Kimi for Coding API Specifics
**Required Headers:**
```python
headers = {
    "Authorization": f"Bearer {api_key}",
    "Content-Type": "application/json",
    "User-Agent": "KimiCLI/0.77",  # ← CRITICAL! 403 without this
}
```
**OpenAI-Compatible Client Setup:**
```python
from openai import AsyncOpenAI
client = AsyncOpenAI(
    base_url="https://api.kimi.com/coding/v1",
    api_key=api_key,
    default_headers={"User-Agent": "KimiCLI/0.77"},
)
```
**Model Name:** `kimi-for-coding`
**Prompting Best Practices:**
- Works best with clear, structured prompts
- Handles long contexts well
- Use explicit formatting instructions
- Add "Think step by step" for complex synthesis
### SearXNG Tuning
- Enable `json` format for structured results
- Use `safesearch=0` for unfiltered results
- Request `time_range: month` for recent content
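
These tuning options map onto query parameters of the SearXNG `/search` endpoint. The helper below sketches how a request URL could be assembled; `build_search_url` is a hypothetical name, and the parameter names (`q`, `format`, `safesearch`, `time_range`, `engines`, `pageno`) follow the SearXNG search API as commonly documented, so they should be checked against the deployed version.

```python
from urllib.parse import urlencode


def build_search_url(base_url: str, query: str, engines: list[str], page: int = 1) -> str:
    """Assemble a SearXNG search URL with the tuning options applied."""
    params = {
        "q": query,
        "format": "json",        # structured results
        "safesearch": 0,         # unfiltered
        "time_range": "month",   # recent content only
        "engines": ",".join(engines),
        "pageno": page,
    }
    return f"{base_url.rstrip('/')}/search?{urlencode(params)}"
```

The `json` output format typically has to be enabled in the instance's `settings.yml` before SearXNG will serve it.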
---
**Document Version:** 1.0
**Last Updated:** 2026-03-14
**Next Review:** Post-Phase-1 implementation