# TDD: Research Bridge - SearXNG + Kimi for Coding Integration

## AI Council Review Document

**Project:** research-bridge
**Purpose:** Self-hosted research pipeline combining SearXNG meta-search with Kimi for Coding
**Cost Target:** **$0** per query (SearXNG: $0 self-hosted + Kimi for Coding: covered by an existing subscription)
**Architecture:** Modular, testable, async-first

---

## 1. Executive Summary

### Problem
Perplexity API calls cost $0.015-0.03 per query. For frequent research tasks, this adds up quickly.

### Solution
Replace Perplexity with a two-tier architecture:
1. **SearXNG** (self-hosted, **free**): Aggregates search results from 70+ sources
2. **Kimi for Coding** (via **existing subscription**, **$0**): Summarizes and reasons over results

### Expected Outcome
- **Cost:** **$0 per query** (vs $0.015-0.03 with Perplexity)
- **Latency:** 2-5s per query
- **Quality:** Comparable to Perplexity Sonar

---

## 2. Architecture Overview

```
┌─────────────────┐     ┌──────────────────┐     ┌─────────────────┐
│   User Query    │────▶│   Query Router   │────▶│     SearXNG     │
│                 │     │    (FastAPI)     │     │  (Self-Hosted)  │
└─────────────────┘     └──────────────────┘     └─────────────────┘
                                                          │
                                                          ▼
                                                 ┌─────────────────┐
                                                 │ Search Results  │
                                                 │   (JSON/Raw)    │
                                                 └─────────────────┘
                                                          │
┌─────────────────┐     ┌──────────────────┐              │
│    Response     │◀────│ Kimi for Coding  │◀─────────────┘
│   (Markdown)    │     │  (Synthesizer)   │
└─────────────────┘     └──────────────────┘
```

### Core Components

| Component | Responsibility | Tech Stack |
|-----------|---------------|------------|
| `query-router` | HTTP API, validation, routing | FastAPI, Pydantic |
| `searxng-client` | Interface to SearXNG instance | aiohttp, caching |
| `synthesizer` | LLM prompts, response formatting | Kimi for Coding API |
| `cache-layer` | Result deduplication | Redis (optional) |
| `rate-limiter` | Prevent abuse | slowapi |

---

## 3. Component Specifications

### 3.1 Query Router (`src/api/router.py`)

**Purpose:** FastAPI application handling HTTP requests

**Endpoints:**
```text
POST /research
  Request:  {"query": "string", "depth": "shallow|deep", "sources": ["web", "news", "academic"]}
  Response: {"query": "string", "depth": "string", "synthesis": "string", "sources": [...],
             "raw_results": [...], "metadata": {"latency_ms": int, ...}}

GET /health
  Response: {"status": "healthy", "searxng_connected": bool, "kimi_coding_available": bool}

POST /search (passthrough)
  Request:  {"q": "string", "engines": ["google", "bing"], "page": 1}
  Response: Raw SearXNG JSON
```

The `/research` response follows the `ResearchResponse` model in Section 4; the full request and response contract is given in Section 8.

**Validation Rules:**
- Query: min 3, max 500 characters
- Depth: `"shallow"` (default, 1 search) or `"deep"` (3 searches + synthesis)
- Rate limit: 30 req/min per IP

---

### 3.2 SearXNG Client (`src/search/searxng.py`)

**Purpose:** Async client for SearXNG instance

**Configuration:**
```yaml
searxng:
  base_url: "http://localhost:8080"  # or external instance
  timeout: 10
  max_results: 10
  # Engine names must match those configured in the SearXNG instance.
  engines:
    default: ["google", "bing", "duckduckgo"]
    news: ["google_news", "bing_news"]
    academic: ["google_scholar", "arxiv"]
```

**Interface:**
```python
class SearXNGClient:
    async def search(
        self, query: str, engines: list[str], page: int = 1
    ) -> list[SearchResult]: ...

    # For deep mode: one result list per query.
    async def search_multi(self, queries: list[str]) -> list[list[SearchResult]]: ...
```

**Caching:**
- Cache key: `SHA256(query + ",".join(engines))`
- TTL: 1 hour for identical queries
- Storage: In-memory LRU (1000 entries) or Redis

---

### 3.3 Synthesizer (`src/llm/synthesizer.py`)

**Purpose:** Transform search results into coherent answers using Kimi for Coding

**⚠️ CRITICAL:** The Kimi for Coding API requires a special `User-Agent: KimiCLI/0.77` header!

**API Configuration:**
```python
{
    "base_url": "https://api.kimi.com/coding/v1",
    "api_key": "sk-kimi-...",  # Kimi for Coding API Key
    "headers": {
        "User-Agent": "KimiCLI/0.77"  # REQUIRED - 403 without this!
    }
}
```

**Prompt Strategy:**
```
You are a research assistant. Synthesize the following search results into a clear, accurate answer.
Include citations [1], [2], etc.

User Query: {query}

Search Results:
{formatted_results}

Instructions:
1. Answer directly and concisely
2. Cite sources using [1], [2] format
3. If results conflict, note the discrepancy
4. If insufficient data, say so clearly

Answer in {language}.
```

**Implementation:**
```python
from openai import AsyncOpenAI

# SearchResult, SynthesisResult, SYSTEM_PROMPT and the two private helpers
# (_format_prompt, _extract_citations) are defined elsewhere in the project.


class Synthesizer:
    def __init__(self, api_key: str, model: str = "kimi-for-coding"):
        self.model = model  # stored so synthesize() can pass it to the API
        self.client = AsyncOpenAI(
            base_url="https://api.kimi.com/coding/v1",
            api_key=api_key,
            default_headers={"User-Agent": "KimiCLI/0.77"},  # CRITICAL!
        )

    async def synthesize(
        self,
        query: str,
        results: list[SearchResult],
        max_tokens: int = 2048,
    ) -> SynthesisResult:
        response = await self.client.chat.completions.create(
            model=self.model,
            messages=[
                {"role": "system", "content": SYSTEM_PROMPT},
                {"role": "user", "content": self._format_prompt(query, results)},
            ],
            max_tokens=max_tokens,
        )
        return SynthesisResult(
            content=response.choices[0].message.content,
            sources=self._extract_citations(results),
        )
```

**Performance Notes:**
- Kimi for Coding is optimized for code and reasoning tasks
- Truncate search results to ~4000 tokens to stay within context
- Cache syntheses for identical result sets

---

### 3.4 Rate Limiter (`src/middleware/ratelimit.py`)

**Purpose:** Protect against abuse and keep load on the shared subscription under control

**Strategy:**
- IP-based: 30 requests/minute
- Global: 1000 requests/hour (configurable)
- Burst: Allow 5 requests immediately, then token bucket

---

## 4. Data Models (`src/models/`)

### SearchResult
```python
from datetime import datetime

from pydantic import BaseModel


class SearchResult(BaseModel):
    title: str
    url: str
    content: str | None  # Snippet or full text
    source: str  # Engine name
    score: float | None
    published: datetime | None
```

### ResearchResponse
```python
class ResearchResponse(BaseModel):
    query: str
    depth: str
    synthesis: str
    sources: list[dict]  # {title, url, index}
    raw_results: list[SearchResult] | None  # null if omit_raw=true
    metadata: dict  # {latency_ms, cache_hit, tokens_used}
```

### Config
```python
from typing import Literal


class Config(BaseModel):
    searxng_url: str
    kimi_api_key: str  # Kimi for Coding API Key
    cache_backend: Literal["memory", "redis"] = "memory"
    rate_limit: dict  # requests, window
```

---

## 5. Testing Strategy

### Test Categories

| Category | Location | Responsibility |
|----------|----------|----------------|
| Unit | `tests/unit/` | Individual functions, pure logic |
| Integration | `tests/integration/` | Component interactions |
| E2E | `tests/e2e/` | Full request flow |
| Performance | `tests/perf/` | Load testing |

### Test Isolation Principle

**CRITICAL:** Each test category runs independently. No test should require another test to run first.
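
As a minimal sketch of the isolation principle, the two tests below each build their own cache through a factory, so they pass in any order and never share state. The `cache_key` function follows the SHA256 scheme from Section 3.2; `LRUCache` and `make_cache` are hypothetical helpers written only for this illustration (TTL handling is left out), not the project's actual cache layer.

```python
import hashlib
from collections import OrderedDict


def cache_key(query: str, engines: list[str]) -> str:
    """SHA-256 hex digest of the query plus the comma-joined engine list."""
    raw = query + ",".join(engines)
    return hashlib.sha256(raw.encode("utf-8")).hexdigest()


class LRUCache:
    """Tiny in-memory LRU cache (hypothetical, for illustration only)."""

    def __init__(self, max_entries: int = 1000) -> None:
        self.max_entries = max_entries
        self._data = OrderedDict()

    def get(self, key: str):
        if key not in self._data:
            return None
        self._data.move_to_end(key)  # mark as most recently used
        return self._data[key]

    def put(self, key: str, value) -> None:
        self._data[key] = value
        self._data.move_to_end(key)
        if len(self._data) > self.max_entries:
            self._data.popitem(last=False)  # evict least recently used


def make_cache() -> LRUCache:
    """Factory called by every test so no state leaks between tests."""
    return LRUCache(max_entries=2)


def test_cache_miss_on_empty_cache():
    cache = make_cache()
    assert cache.get(cache_key("python asyncio", ["google"])) is None


def test_cache_evicts_least_recently_used():
    cache = make_cache()
    cache.put("a", 1)
    cache.put("b", 2)
    cache.get("a")     # "a" becomes most recently used
    cache.put("c", 3)  # capacity exceeded, "b" is evicted
    assert cache.get("b") is None
    assert cache.get("a") == 1
```

With pytest, the factory would typically become a fixture; the point is only that each test constructs everything it depends on.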
### 5.1 Unit Tests (`tests/unit/`)

**test_synthesizer.py:**
- Mock Kimi for Coding API responses
- Test prompt formatting
- Test User-Agent header injection
- Test token counting/truncation
- Test error handling (API down, auth errors)

**test_searxng_client.py:**
- Mock HTTP responses
- Test result parsing
- Test caching logic
- Test timeout handling

**test_models.py:**
- Pydantic validation
- Serialization/deserialization

### 5.2 Integration Tests (`tests/integration/`)

**Requires:** Running SearXNG instance (Docker)

**test_search_flow.py:**
- Real SearXNG queries
- Cache interaction
- Error propagation

**test_api.py:**
- FastAPI test client
- Request/response validation
- Rate limiting behavior

### 5.3 E2E Tests (`tests/e2e/`)

**test_research_endpoint.py:**
- Full flow: query → search → synthesize → response
- Verify citation format
- Verify source attribution

---

## 6. Implementation Phases

### Phase 1: Foundation (No LLM yet) ✅ COMPLETE

**Goal:** Working search API

**Deliverables:**
- [x] Project structure with pyproject.toml
- [x] SearXNG client with async HTTP
- [x] FastAPI router with `/search` endpoint
- [x] Basic tests (mocked) - 28 tests, 92% coverage
- [x] Docker Compose for SearXNG

**Acceptance Criteria:**
```bash
curl -X POST http://localhost:8000/search \
  -H "Content-Type: application/json" \
  -d '{"q": "python asyncio", "engines": ["google"]}'
# Returns valid SearXNG results
```

**Status:** ✅ All tests passing, 92% coverage

### Phase 2: Synthesis Layer ✅ COMPLETE

**Goal:** Add Kimi for Coding integration

**Deliverables:**
- [x] Synthesizer class with Kimi for Coding API
- [x] `/research` endpoint combining search + synthesis
- [x] Prompt templates
- [x] Response formatting with citations
- [x] User-Agent header handling

**Acceptance Criteria:**
```bash
curl -X POST http://localhost:8000/research \
  -H "Content-Type: application/json" \
  -d '{"query": "What is Python asyncio?"}'
# Returns synthesized answer with citations
```

**Status:** ✅ Implemented, tested (40 tests, 90% coverage)

### Phase 3: Polish

**Goal:** Production readiness

**Deliverables:**
- [ ] Rate limiting
- [ ] Caching (Redis optional)
- [ ] Structured logging
- [ ] Health checks
- [ ] Metrics (Prometheus)
- [ ] Documentation

---

## 7. Configuration

### Environment Variables
```bash
RESEARCH_BRIDGE_SEARXNG_URL=http://localhost:8080
RESEARCH_BRIDGE_KIMI_API_KEY=sk-kimi-...  # Kimi for Coding Key
RESEARCH_BRIDGE_LOG_LEVEL=INFO
RESEARCH_BRIDGE_REDIS_URL=redis://localhost:6379  # optional
```

### Important: Kimi for Coding API Requirements
```python
import os

api_key = os.environ["RESEARCH_BRIDGE_KIMI_API_KEY"]

# The API requires a special User-Agent header!
headers = {
    "Authorization": f"Bearer {api_key}",
    "Content-Type": "application/json",
    "User-Agent": "KimiCLI/0.77",  # ← REQUIRED! 403 without this
}
```

### Docker Compose (SearXNG)
```yaml
# config/searxng-docker-compose.yml
version: '3'
services:
  searxng:
    image: searxng/searxng:latest
    ports:
      - "8080:8080"
    volumes:
      - ./searxng-settings.yml:/etc/searxng/settings.yml
```

---

## 8. API Contract

### POST /research

**Request:**
```json
{
  "query": "latest developments in fusion energy",
  "depth": "deep",
  "sources": ["web", "news"],
  "language": "en",
  "omit_raw": false
}
```

**Response:**
```json
{
  "query": "latest developments in fusion energy",
  "depth": "deep",
  "synthesis": "Recent breakthroughs in fusion energy include... [1] Commonwealth Fusion Systems achieved... [2]",
  "sources": [
    {"index": 1, "title": "Fusion breakthrough", "url": "https://..."},
    {"index": 2, "title": "CFS milestone", "url": "https://..."}
  ],
  "raw_results": [...],
  "metadata": {
    "latency_ms": 3200,
    "cache_hit": false,
    "tokens_used": 1247,
    "cost_usd": 0.0
  }
}
```

---

## 9. Cost Analysis

### Per-Query Costs

| Component | Cost | Notes |
|-----------|------|-------|
| **SearXNG** | **$0.00** | Self-hosted, open source, no API costs |
| **Kimi for Coding** | **$0.00** | Via existing subscription (no additional cost) |
| **Total per query** | **$0.00** | |

**Comparison:**

| Solution | Cost per Query | Factor |
|----------|----------------|--------|
| Perplexity Sonar Pro | ~$0.015-0.03 | ∞ (more expensive) |
| Perplexity API (direct) | ~$0.005 | ∞ (more expensive) |
| **Research Bridge** | **$0.00** | **Baseline** |

**Savings: 100%** of running costs!

### Why Is This Completely Free?
- **SearXNG:** Free (open source, self-hosted)
- **Kimi for Coding:** Already covered by the existing subscription
- No API costs, no rate limits, no hidden fees

### Break-Even Analysis
- Setup effort: ~10 hours
- At any usage level: **$0 running costs** vs. $X with Perplexity

---

## 10. Success Criteria

### Functional
- [ ] `/research` returns synthesized answers in <5s
- [ ] Citations link to original sources
- [ ] Rate limiting prevents abuse
- [ ] Health endpoint confirms all dependencies

### Quality
- [ ] Answer quality matches Perplexity in blind test (n=20)
- [ ] Citation accuracy >95%
- [ ] Handles ambiguous queries gracefully

### Operational
- [ ] 99% uptime (excluding planned maintenance)
- [ ] <1% error rate
- [ ] Logs structured for observability

---

## 11. Risks & Mitigations

| Risk | Likelihood | Impact | Mitigation |
|------|------------|--------|------------|
| SearXNG instance down | Medium | High | Deploy redundant instance, fallback engines |
| Kimi for Coding API changes | Low | Medium | Abstract API client, monitor for breaking changes |
| User-Agent requirement breaks | Low | High | Hardcoded header, monitor API docs for updates |
| Answer quality poor | Medium | High | A/B test prompts, fallback to deeper search |

---

## 12. Future Enhancements

- **Follow-up questions:** Context-aware multi-turn research
- **Source extraction:** Fetch full article text via crawling
- **PDF support:** Search and synthesize academic papers
- **Custom prompts:** User-defined synthesis instructions
- **Webhook notifications:** Async research with callback

---

## 13. Appendix: Implementation Notes

### Kimi for Coding API Specifics

**Required Headers:**
```python
headers = {
    "Authorization": f"Bearer {api_key}",
    "Content-Type": "application/json",
    "User-Agent": "KimiCLI/0.77",  # ← CRITICAL! 403 without this
}
```

**OpenAI-Compatible Client Setup:**
```python
import os

from openai import AsyncOpenAI

api_key = os.environ["RESEARCH_BRIDGE_KIMI_API_KEY"]

client = AsyncOpenAI(
    base_url="https://api.kimi.com/coding/v1",
    api_key=api_key,
    default_headers={"User-Agent": "KimiCLI/0.77"},
)
```

**Model Name:** `kimi-for-coding`

**Prompting Best Practices:**
- Works best with clear, structured prompts
- Handles long contexts well
- Use explicit formatting instructions
- Add "Think step by step" for complex synthesis

### SearXNG Tuning
- Enable `json` format for structured results
- Use `safesearch=0` for unfiltered results
- Request `time_range=month` for recent content

---

**Document Version:** 1.0
**Last Updated:** 2026-03-14
**Next Review:** Post-Phase-1 implementation
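
The SearXNG tuning notes in the appendix map onto query parameters of SearXNG's `/search` endpoint. The sketch below shows one way to assemble such a request URL; `build_search_url` is a hypothetical helper, not part of the project, and it assumes the instance has the `json` output format enabled in its settings (otherwise SearXNG rejects `format=json`).

```python
from urllib.parse import urlencode


def build_search_url(
    base_url: str,
    query: str,
    engines: list[str],
    page: int = 1,
    time_range: str = "month",
) -> str:
    """Build a SearXNG /search URL using the tuning options from the appendix.

    Hypothetical helper for illustration; parameter defaults are assumptions.
    """
    params = {
        "q": query,
        "format": "json",          # structured results instead of HTML
        "safesearch": 0,           # unfiltered results
        "time_range": time_range,  # restrict to recent content
        "engines": ",".join(engines),
        "pageno": page,
    }
    return f"{base_url.rstrip('/')}/search?{urlencode(params)}"


if __name__ == "__main__":
    print(build_search_url("http://localhost:8080", "python asyncio", ["google", "bing"]))
```

Keeping URL construction in a pure function like this makes it easy to unit-test without a running SearXNG instance.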