# TDD: Research Bridge - SearXNG + Kimi for Coding Integration

## AI Council Review Document
**Project:** research-bridge
**Purpose:** Self-hosted research pipeline combining SearXNG meta-search with Kimi for Coding
**Cost Target:** **$0** per query (SearXNG: $0 self-hosted + Kimi for Coding: covered by an existing subscription)
**Architecture:** Modular, testable, async-first

---

## 1. Executive Summary

### Problem
Perplexity API calls cost $0.015-0.03 per query. For frequent research tasks, this adds up quickly.

### Solution
Replace Perplexity with a two-tier architecture:
1. **SearXNG** (self-hosted, **FREE**): Aggregates search results from 70+ sources
2. **Kimi for Coding** (via **existing subscription**, **$0**): Summarizes and reasons over results

### Expected Outcome
- **Cost:** **$0 per query** (vs $0.015-0.03 with Perplexity)
- **Latency:** 2-5s per query
- **Quality:** Comparable to Perplexity Sonar

---

## 2. Architecture Overview

```
┌─────────────────┐     ┌──────────────────┐     ┌─────────────────┐
│   User Query    │────▶│  Query Router    │────▶│   SearXNG       │
│                 │     │  (FastAPI)       │     │   (Self-Hosted) │
└─────────────────┘     └──────────────────┘     └─────────────────┘
                                                        │
                                                        ▼
                                               ┌─────────────────┐
                                               │  Search Results │
                                               │  (JSON/Raw)     │
                                               └─────────────────┘
                                                        │
┌─────────────────┐     ┌──────────────────┐           │
│   Response      │◀────│  Kimi for Coding │◀──────────┘
│   (Markdown)    │     │  (Synthesizer)   │
└─────────────────┘     └──────────────────┘
```

### Core Components

| Component | Responsibility | Tech Stack |
|-----------|---------------|------------|
| `query-router` | HTTP API, validation, routing | FastAPI, Pydantic |
| `searxng-client` | Interface to SearXNG instance | aiohttp, caching |
| `synthesizer` | LLM prompts, response formatting | Kimi for Coding API |
| `cache-layer` | Result deduplication | Redis (optional) |
| `rate-limiter` | Prevent abuse | slowapi |

---

## 3. Component Specifications

### 3.1 Query Router (`src/api/router.py`)

**Purpose:** FastAPI application handling HTTP requests

**Endpoints:**
```
POST /research
Request:  {"query": "string", "depth": "shallow|deep", "sources": ["web", "news", "academic"]}
Response: {"query": "string", "results": [...], "synthesis": "string", "sources": [...], "latency_ms": int}

GET /health
Response: {"status": "healthy", "searxng_connected": bool, "kimi_coding_available": bool}

POST /search (passthrough)
Request:  {"q": "string", "engines": ["google", "bing"], "page": 1}
Response: Raw SearXNG JSON
```

**Validation Rules:**
- Query: min 3, max 500 characters
- Depth: `"shallow"` (default; 1 search) or `"deep"` (3 searches + synthesis)
- Rate limit: 30 req/min per IP
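
The validation rules above can be sketched in plain Python. This is a minimal, framework-free illustration; the router itself presumably expresses the same constraints through Pydantic. The names `ResearchRequest`, `ALLOWED_DEPTHS`, and `searches_for_depth`, as well as the default source list, are assumptions made for this example.

```python
from dataclasses import dataclass, field

# depth -> number of SearXNG searches, taken from the rules above
ALLOWED_DEPTHS = {"shallow": 1, "deep": 3}


@dataclass
class ResearchRequest:
    """Hypothetical request model mirroring the validation rules."""

    query: str
    depth: str = "shallow"
    # Default source list is an assumption, not specified in this document.
    sources: list[str] = field(default_factory=lambda: ["web"])

    def __post_init__(self) -> None:
        if not 3 <= len(self.query) <= 500:
            raise ValueError("query must be between 3 and 500 characters")
        if self.depth not in ALLOWED_DEPTHS:
            raise ValueError(f"depth must be one of {sorted(ALLOWED_DEPTHS)}")


def searches_for_depth(depth: str) -> int:
    """Return how many searches a given depth triggers."""
    return ALLOWED_DEPTHS[depth]
```

Rejecting invalid input at construction time keeps the search and synthesis layers free of defensive checks.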

---

### 3.2 SearXNG Client (`src/search/searxng.py`)

**Purpose:** Async client for SearXNG instance

**Configuration:**
```yaml
searxng:
  base_url: "http://localhost:8080"  # or external instance
  timeout: 10
  max_results: 10
  engines:
    default: ["google", "bing", "duckduckgo"]
    news: ["google_news", "bing_news"]
    academic: ["google_scholar", "arxiv"]
```

**Interface:**
```python
class SearXNGClient:
    # Single query against the configured engines
    async def search(self, query: str, engines: list[str], page: int = 1) -> SearchResult: ...

    # Several queries at once; used by deep mode
    async def search_multi(self, queries: list[str]) -> list[SearchResult]: ...
```
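
One plausible way to implement `search_multi` for deep mode is to fan the queries out concurrently with `asyncio.gather`. The sketch below uses a hypothetical `fake_search` coroutine in place of a real HTTP call, so it only illustrates the concurrency pattern, not the actual client.

```python
import asyncio


async def fake_search(query: str) -> dict:
    """Stand-in for SearXNGClient.search; returns a canned result."""
    await asyncio.sleep(0)  # yield control, as a real HTTP call would
    return {"query": query, "results": [f"result for {query}"]}


async def search_multi(queries: list[str]) -> list[dict]:
    """Run all queries concurrently; results keep the input order."""
    return await asyncio.gather(*(fake_search(q) for q in queries))


results = asyncio.run(
    search_multi(["asyncio", "asyncio tutorial", "asyncio vs threads"])
)
```

Because `asyncio.gather` preserves argument order, each result can be matched back to its query by position.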

**Caching:**
- Cache key: SHA256(query + engines.join(","))
- TTL: 1 hour for identical queries
- Storage: In-memory LRU (1000 entries) or Redis

---

### 3.3 Synthesizer (`src/llm/synthesizer.py`)

**Purpose:** Transform search results into coherent answers using Kimi for Coding

**⚠️ CRITICAL:** Kimi for Coding API requires special `User-Agent: KimiCLI/0.77` header!

**API Configuration:**
```python
{
    "base_url": "https://api.kimi.com/coding/v1",
    "api_key": "sk-kimi-...",  # Kimi for Coding API Key
    "headers": {
        "User-Agent": "KimiCLI/0.77"  # REQUIRED - 403 without this!
    }
}
```

**Prompt Strategy:**
```
You are a research assistant. Synthesize the following search results into a
clear, accurate answer. Include citations [1], [2], etc.

User Query: {query}

Search Results:
{formatted_results}

Instructions:
1. Answer directly and concisely
2. Cite sources using [1], [2] format
3. If results conflict, note the discrepancy
4. If insufficient data, say so clearly

Answer in {language}.
```
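
To show how `{formatted_results}` might be filled in, here is a small sketch that numbers results from 1 so the model can cite them as `[1]`, `[2]`, and so on. The helper names `format_results` and `build_prompt`, the shortened template, and the dict-shaped results are assumptions for illustration only.

```python
# Shortened version of the prompt above; only the placeholders matter here.
PROMPT_TEMPLATE = """User Query: {query}

Search Results:
{formatted_results}

Answer in {language}."""


def format_results(results: list[dict]) -> str:
    """Number each result from 1 so the model can cite it as [n]."""
    blocks = []
    for index, item in enumerate(results, start=1):
        snippet = item.get("content") or "(no snippet)"
        blocks.append(f"[{index}] {item['title']} ({item['url']})\n{snippet}")
    return "\n\n".join(blocks)


def build_prompt(query: str, results: list[dict], language: str = "English") -> str:
    """Fill the template with the query, numbered results, and language."""
    return PROMPT_TEMPLATE.format(
        query=query,
        formatted_results=format_results(results),
        language=language,
    )
```

Keeping the numbering in one place means the indices in the prompt and in the response's source list cannot drift apart.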

**Implementation:**
```python
from openai import AsyncOpenAI

class Synthesizer:
    def __init__(self, api_key: str, model: str = "kimi-for-coding"):
        self.model = model  # read by synthesize() below
        self.client = AsyncOpenAI(
            base_url="https://api.kimi.com/coding/v1",
            api_key=api_key,
            default_headers={"User-Agent": "KimiCLI/0.77"}  # CRITICAL!
        )

    async def synthesize(
        self,
        query: str,
        results: list[SearchResult],
        max_tokens: int = 2048
    ) -> SynthesisResult:
        response = await self.client.chat.completions.create(
            model=self.model,
            messages=[
                {"role": "system", "content": SYSTEM_PROMPT},
                {"role": "user", "content": self._format_prompt(query, results)}
            ],
            max_tokens=max_tokens
        )
        return SynthesisResult(
            content=response.choices[0].message.content,
            sources=self._extract_citations(results)
        )
```

**Performance Notes:**
- Kimi for Coding is optimized for code and reasoning tasks
- Truncate search results to ~4000 tokens to stay within context
- Cache syntheses for identical result sets
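
The truncation step could be approximated as follows. This sketch assumes roughly four characters per token, which is a coarse heuristic and not the model's real tokenizer; `estimate_tokens` and `truncate_results` are hypothetical helpers.

```python
CHARS_PER_TOKEN = 4  # rough heuristic, NOT the model's real tokenizer


def estimate_tokens(text: str) -> int:
    """Estimate the token count by ceiling division on character length."""
    return -(-len(text) // CHARS_PER_TOKEN)


def truncate_results(snippets: list[str], max_tokens: int = 4000) -> list[str]:
    """Keep snippets in rank order until the token budget is spent."""
    kept: list[str] = []
    remaining = max_tokens
    for snippet in snippets:
        cost = estimate_tokens(snippet)
        if cost <= remaining:
            kept.append(snippet)
            remaining -= cost
        else:
            # Cut the first snippet that does not fit, then stop.
            if remaining > 0:
                kept.append(snippet[: remaining * CHARS_PER_TOKEN])
            break
    return kept
```

Dropping from the tail assumes the results arrive ranked by relevance, so the least relevant text is lost first.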

---

### 3.4 Rate Limiter (`src/middleware/ratelimit.py`)

**Purpose:** Protect against abuse and control costs

**Strategy:**
- IP-based: 30 requests/minute
- Global: 1000 requests/hour (configurable)
- Burst: Allow 5 requests immediately, then token bucket

---

## 4. Data Models (`src/models/`)

### SearchResult
```python
from datetime import datetime

from pydantic import BaseModel


class SearchResult(BaseModel):
    title: str
    url: str
    content: str | None  # Snippet or full text
    source: str  # Engine name
    score: float | None
    published: datetime | None
```

### ResearchResponse
```python
class ResearchResponse(BaseModel):
    query: str
    depth: str
    synthesis: str
    sources: list[dict]  # {title, url, index}
    raw_results: list[SearchResult] | None  # null if omit_raw=true
    metadata: dict  # {latency_ms, cache_hit, tokens_used}
```

### Config
```python
from typing import Literal

from pydantic import BaseModel


class Config(BaseModel):
    searxng_url: str
    kimi_api_key: str  # Kimi for Coding API Key
    cache_backend: Literal["memory", "redis"] = "memory"
    rate_limit: dict  # requests, window
```

---

## 5. Testing Strategy

### Test Categories

| Category | Location | Responsibility |
|----------|----------|----------------|
| Unit | `tests/unit/` | Individual functions, pure logic |
| Integration | `tests/integration/` | Component interactions |
| E2E | `tests/e2e/` | Full request flow |
| Performance | `tests/perf/` | Load testing |

### Test Isolation Principle
**CRITICAL:** Each test category runs independently. No test should require another test to run first.

### 5.1 Unit Tests (`tests/unit/`)

**test_synthesizer.py:**
- Mock Kimi for Coding API responses
- Test prompt formatting
- Test User-Agent header injection
- Test token counting/truncation
- Test error handling (API down, auth errors)

**test_searxng_client.py:**
- Mock HTTP responses
- Test result parsing
- Test caching logic
- Test timeout handling

**test_models.py:**
- Pydantic validation
- Serialization/deserialization

### 5.2 Integration Tests (`tests/integration/`)

**Requires:** Running SearXNG instance (Docker)

**test_search_flow.py:**
- Real SearXNG queries
- Cache interaction
- Error propagation

**test_api.py:**
- FastAPI test client
- Request/response validation
- Rate limiting behavior

### 5.3 E2E Tests (`tests/e2e/`)

**test_research_endpoint.py:**
- Full flow: query → search → synthesize → response
- Verify citation format
- Verify source attribution
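
A citation check for the E2E tests might look like the sketch below. It assumes citations appear as `[n]` in the synthesis and that each source carries an integer `index`, as in the API contract; `check_citations` is a hypothetical helper.

```python
import re

CITATION_RE = re.compile(r"\[(\d+)\]")


def check_citations(synthesis: str, sources: list[dict]) -> list[int]:
    """Return cited indices that have no matching entry in `sources`."""
    cited = {int(n) for n in CITATION_RE.findall(synthesis)}
    known = {source["index"] for source in sources}
    return sorted(cited - known)
```

An empty return value means every citation in the answer points at a listed source.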

---

## 6. Implementation Phases

### Phase 1: Foundation (No LLM yet) ✅ COMPLETE
**Goal:** Working search API
**Deliverables:**
- [x] Project structure with pyproject.toml
- [x] SearXNG client with async HTTP
- [x] FastAPI router with `/search` endpoint
- [x] Basic tests (mocked) - 28 tests, 92% coverage
- [x] Docker Compose for SearXNG

**Acceptance Criteria:**
```bash
curl -X POST http://localhost:8000/search \
  -H "Content-Type: application/json" \
  -d '{"q": "python asyncio", "engines": ["google"]}'
# Returns valid SearXNG results
```

**Status:** ✅ All tests passing, 92% coverage

### Phase 2: Synthesis Layer ✅ COMPLETE
**Goal:** Add Kimi for Coding integration
**Deliverables:**
- [x] Synthesizer class with Kimi for Coding API
- [x] `/research` endpoint combining search + synthesis
- [x] Prompt templates
- [x] Response formatting with citations
- [x] User-Agent header handling

**Acceptance Criteria:**
```bash
curl -X POST http://localhost:8000/research \
  -H "Content-Type: application/json" \
  -d '{"query": "What is Python asyncio?"}'
# Returns synthesized answer with citations
```

**Status:** ✅ Implemented, tested (40 tests, 90% coverage)

### Phase 3: Polish
**Goal:** Production readiness
**Deliverables:**
- [ ] Rate limiting
- [ ] Caching (Redis optional)
- [ ] Structured logging
- [ ] Health checks
- [ ] Metrics (Prometheus)
- [ ] Documentation

---

## 7. Configuration

### Environment Variables
```bash
RESEARCH_BRIDGE_SEARXNG_URL=http://localhost:8080
RESEARCH_BRIDGE_KIMI_API_KEY=sk-kimi-...  # Kimi for Coding Key
RESEARCH_BRIDGE_LOG_LEVEL=INFO
RESEARCH_BRIDGE_REDIS_URL=redis://localhost:6379  # optional
```

### Important: Kimi for Coding API Requirements
```python
# The API requires a special User-Agent header!
headers = {
    "Authorization": f"Bearer {api_key}",
    "Content-Type": "application/json",
    "User-Agent": "KimiCLI/0.77"  # ← REQUIRED! 403 without this
}
```

### Docker Compose (SearXNG)
```yaml
# config/searxng-docker-compose.yml
version: '3'
services:
  searxng:
    image: searxng/searxng:latest
    ports:
      - "8080:8080"
    volumes:
      - ./searxng-settings.yml:/etc/searxng/settings.yml
```

---

## 8. API Contract

### POST /research

**Request:**
```json
{
  "query": "latest developments in fusion energy",
  "depth": "deep",
  "sources": ["web", "news"],
  "language": "en",
  "omit_raw": false
}
```

**Response:**
```json
{
  "query": "latest developments in fusion energy",
  "depth": "deep",
  "synthesis": "Recent breakthroughs in fusion energy include... [1] Commonwealth Fusion Systems achieved... [2]",
  "sources": [
    {"index": 1, "title": "Fusion breakthrough", "url": "https://..."},
    {"index": 2, "title": "CFS milestone", "url": "https://..."}
  ],
  "raw_results": [...],
  "metadata": {
    "latency_ms": 3200,
    "cache_hit": false,
    "tokens_used": 1247,
    "cost_usd": 0.0
  }
}
```

---

## 9. Cost Analysis

### Per-Query Costs

| Component | Cost | Notes |
|-----------|------|-------|
| **SearXNG** | **$0.00** | Self-hosted, open source, no API costs |
| **Kimi for Coding** | **$0.00** | Covered by the existing subscription (no additional cost) |
| **Total per query** | **$0.00** | |

**Comparison:**

| Solution | Cost per Query | Factor |
|----------|----------------|--------|
| Perplexity Sonar Pro | ~$0.015-0.03 | ∞ (more expensive) |
| Perplexity API (direct) | ~$0.005 | ∞ (more expensive) |
| **Research Bridge** | **$0.00** | **Baseline** |

**Savings: 100%** of running costs.

### Why Is This Completely Free?
- **SearXNG:** Free (open source, self-hosted)
- **Kimi for Coding:** Already covered by the existing subscription
- No API costs, no rate limits, no hidden fees

### Break-Even Analysis
- Setup effort: ~10 hours
- At any usage level: **$0 running costs** vs. $X with Perplexity

---

## 10. Success Criteria

### Functional
- [ ] `/research` returns synthesized answers in <5s
- [ ] Citations link to original sources
- [ ] Rate limiting prevents abuse
- [ ] Health endpoint confirms all dependencies

### Quality
- [ ] Answer quality matches Perplexity in blind test (n=20)
- [ ] Citation accuracy >95%
- [ ] Handles ambiguous queries gracefully

### Operational
- [ ] 99% uptime (excluding planned maintenance)
- [ ] <1% error rate
- [ ] Logs structured for observability

---

## 11. Risks & Mitigations

| Risk | Likelihood | Impact | Mitigation |
|------|------------|--------|------------|
| SearXNG instance down | Medium | High | Deploy redundant instance, fallback engines |
| Kimi for Coding API changes | Low | Medium | Abstract API client, monitor for breaking changes |
| User-Agent requirement breaks | Low | High | Hardcoded header, monitor API docs for updates |
| Answer quality poor | Medium | High | A/B test prompts, fallback to deeper search |

---

## 12. Future Enhancements

- **Follow-up questions:** Context-aware multi-turn research
- **Source extraction:** Fetch full article text via crawling
- **PDF support:** Search and synthesize academic papers
- **Custom prompts:** User-defined synthesis instructions
- **Webhook notifications:** Async research with callback

---

## 13. Appendix: Implementation Notes

### Kimi for Coding API Specifics

**Required Headers:**
```python
headers = {
    "Authorization": f"Bearer {api_key}",
    "Content-Type": "application/json",
    "User-Agent": "KimiCLI/0.77"  # ← CRITICAL! 403 without this
}
```

**OpenAI-Compatible Client Setup:**
```python
from openai import AsyncOpenAI

client = AsyncOpenAI(
    base_url="https://api.kimi.com/coding/v1",
    api_key=api_key,
    default_headers={"User-Agent": "KimiCLI/0.77"}
)
```

**Model Name:** `kimi-for-coding`

**Prompting Best Practices:**
- Works best with clear, structured prompts
- Handles long contexts well
- Use explicit formatting instructions
- Add "Think step by step" for complex synthesis

### SearXNG Tuning
- Enable `json` format for structured results
- Use `safesearch=0` for unfiltered results
- Request `time_range: month` for recent content
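
Put together, the tuning options above translate into query parameters on the SearXNG `/search` endpoint. The sketch below only builds the URL; `build_search_url` is a hypothetical helper, and the instance must have the `json` format enabled in its settings for such a request to succeed.

```python
from urllib.parse import urlencode


def build_search_url(base_url: str, query: str, engines: list[str], page: int = 1) -> str:
    """Build a SearXNG search URL with the tuning options applied."""
    params = {
        "q": query,
        "format": "json",  # structured results
        "safesearch": 0,  # unfiltered results
        "time_range": "month",  # recent content only
        "engines": ",".join(engines),
        "pageno": page,
    }
    return f"{base_url.rstrip('/')}/search?{urlencode(params)}"
```

For example, `build_search_url("http://localhost:8080", "python asyncio", ["google", "bing"])` produces a URL that can be fetched with any HTTP client.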

---

**Document Version:** 1.0
**Last Updated:** 2026-03-14
**Next Review:** Post-Phase-1 implementation