TDD: Research Bridge - SearXNG + Kimi for Coding Integration

AI Council Review Document

Project: research-bridge
Purpose: Self-hosted research pipeline combining SearXNG meta-search with Kimi for Coding
Cost Target: $0 per query (SearXNG: $0 self-hosted + Kimi for Coding: covered by an existing subscription)
Architecture: Modular, testable, async-first


1. Executive Summary

Problem

Perplexity API calls cost $0.015-0.03 per query. For frequent research tasks, this adds up quickly.

Solution

Replace Perplexity with a two-tier architecture:

  1. SearXNG (self-hosted, FREE): Aggregates search results from 70+ sources
  2. Kimi for Coding (via existing subscription, $0): Summarizes and reasons over results

Expected Outcome

  • Cost: $0 per query (vs $0.015-0.03 with Perplexity)
  • Latency: 2-5s per query
  • Quality: Comparable to Perplexity Sonar

2. Architecture Overview

┌─────────────────┐     ┌──────────────────┐     ┌─────────────────┐
│   User Query    │────▶│  Query Router    │────▶│   SearXNG       │
│                 │     │  (FastAPI)       │     │   (Self-Hosted) │
└─────────────────┘     └──────────────────┘     └─────────────────┘
                                                        │
                                                        ▼
                                               ┌─────────────────┐
                                               │  Search Results │
                                               │  (JSON/Raw)     │
                                               └─────────────────┘
                                                        │
┌─────────────────┐     ┌──────────────────┐           │
│   Response      │◀────│  Kimi for Coding │◀──────────┘
│   (Markdown)    │     │  (Synthesizer)   │
└─────────────────┘     └──────────────────┘

Core Components

| Component      | Responsibility                   | Tech Stack          |
|----------------|----------------------------------|---------------------|
| query-router   | HTTP API, validation, routing    | FastAPI, Pydantic   |
| searxng-client | Interface to SearXNG instance    | aiohttp, caching    |
| synthesizer    | LLM prompts, response formatting | Kimi for Coding API |
| cache-layer    | Result deduplication             | Redis (optional)    |
| rate-limiter   | Prevent abuse                    | slowapi             |

3. Component Specifications

3.1 Query Router (src/api/router.py)

Purpose: FastAPI application handling HTTP requests

Endpoints:

POST /research
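Request:  {"query": "string", "depth": "shallow|deep", "sources": ["web", "news", "academic"], "language": "string", "omit_raw": bool}
Response: {"query": "string", "depth": "string", "synthesis": "string", "sources": [...], "raw_results": [...], "metadata": {"latency_ms": int, "cache_hit": bool, "tokens_used": int}}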

GET /health
Response: {"status": "healthy", "searxng_connected": bool, "kimi_coding_available": bool}

POST /search (passthrough)
Request:  {"q": "string", "engines": ["google", "bing"], "page": 1}
Response: Raw SearXNG JSON

Validation Rules:

  • Query: min 3, max 500 characters
  • Depth: "shallow" (default, 1 search) or "deep" (3 searches + synthesis)
  • Rate limit: 30 req/min per IP
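
The query and depth rules above could be enforced roughly as follows. This is a plain-Python sketch using only the standard library; the router itself would more likely express the same constraints as Pydantic field constraints. The names `ResearchRequest` and `validate_research_request`, the error messages, and the default of `["web"]` for `sources` are assumptions made for illustration.

```python
from dataclasses import dataclass

ALLOWED_DEPTHS = {"shallow", "deep"}
ALLOWED_SOURCES = {"web", "news", "academic"}


@dataclass(frozen=True)
class ResearchRequest:
    """Validated /research request (hypothetical helper type)."""
    query: str
    depth: str = "shallow"
    sources: tuple = ("web",)  # assumed default, not stated in the spec


def validate_research_request(payload: dict) -> ResearchRequest:
    """Apply the documented rules; raise ValueError on the first violation."""
    query = str(payload.get("query", "")).strip()
    if not 3 <= len(query) <= 500:
        raise ValueError("query must be 3-500 characters")

    depth = payload.get("depth", "shallow")
    if depth not in ALLOWED_DEPTHS:
        raise ValueError(f"depth must be one of {sorted(ALLOWED_DEPTHS)}")

    sources = tuple(payload.get("sources", ["web"]))
    unknown = set(sources) - ALLOWED_SOURCES
    if unknown:
        raise ValueError(f"unknown sources: {sorted(unknown)}")

    return ResearchRequest(query=query, depth=depth, sources=sources)
```

Returning a frozen dataclass keeps the validated request immutable once it leaves the router layer.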

3.2 SearXNG Client (src/search/searxng.py)

Purpose: Async client for SearXNG instance

Configuration:

searxng:
  base_url: "http://localhost:8080"  # or external instance
  timeout: 10
  max_results: 10
  engines:
    default: ["google", "bing", "duckduckgo"]
    news: ["google_news", "bing_news"]
    academic: ["google_scholar", "arxiv"]

Interface:

class SearXNGClient:
    async def search(self, query: str, engines: list[str], page: int = 1) -> SearchResult
    async def search_multi(self, queries: list[str]) -> list[SearchResult]  # for deep mode
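
One way `search_multi` might fan out the queries of a deep-mode request is sketched below. It is written as a free function so that it runs without a SearXNG instance; `fake_search` is a stand-in for `SearXNGClient.search`, and the `max_concurrency` limit of 3 is an assumption, not part of the interface above.

```python
import asyncio
from typing import Awaitable, Callable, List

SearchFn = Callable[[str], Awaitable[List[dict]]]


async def search_multi(
    search: SearchFn, queries: List[str], max_concurrency: int = 3
) -> List[List[dict]]:
    """Run one search per query concurrently; results keep the input order."""
    semaphore = asyncio.Semaphore(max_concurrency)

    async def bounded(query: str) -> List[dict]:
        # The semaphore caps how many searches are in flight at once.
        async with semaphore:
            return await search(query)

    return await asyncio.gather(*(bounded(q) for q in queries))


async def fake_search(query: str) -> List[dict]:
    """Stand-in for the real HTTP call; performs no network access."""
    await asyncio.sleep(0)
    return [{"title": f"result for {query}", "url": "https://example.org"}]
```

`asyncio.gather` returns results in the order the awaitables were passed, so the caller can pair each result list with its query by position.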

Caching:

  • Cache key: SHA256(query + engines.join(","))
  • TTL: 1 hour for identical queries
  • Storage: In-memory LRU (1000 entries) or Redis
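
A minimal sketch of the in-memory variant, assuming a single process and no locking. `cache_key` follows the key definition above literally; `TTLCache` and its injectable `clock` parameter are hypothetical names added so that expiry can be tested without waiting.

```python
import hashlib
import time
from collections import OrderedDict


def cache_key(query: str, engines: list) -> str:
    """SHA-256 over the query followed by the comma-joined engine list."""
    raw = query + ",".join(engines)
    return hashlib.sha256(raw.encode("utf-8")).hexdigest()


class TTLCache:
    """In-memory LRU cache whose entries expire after ttl_seconds."""

    def __init__(self, max_entries=1000, ttl_seconds=3600.0, clock=time.monotonic):
        self._data = OrderedDict()  # key -> (stored_at, value)
        self._max_entries = max_entries
        self._ttl = ttl_seconds
        self._clock = clock

    def get(self, key):
        entry = self._data.get(key)
        if entry is None:
            return None
        stored_at, value = entry
        if self._clock() - stored_at >= self._ttl:
            del self._data[key]  # entry is older than the TTL
            return None
        self._data.move_to_end(key)  # mark as most recently used
        return value

    def set(self, key, value):
        self._data[key] = (self._clock(), value)
        self._data.move_to_end(key)
        while len(self._data) > self._max_entries:
            self._data.popitem(last=False)  # evict least recently used
```

Because the key depends on engine order, `["google", "bing"]` and `["bing", "google"]` produce different entries; sorting the list before joining would avoid that, at the cost of deviating from the key definition above.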

3.3 Synthesizer (src/llm/synthesizer.py)

Purpose: Transform search results into coherent answers using Kimi for Coding

⚠️ CRITICAL: The Kimi for Coding API requires a special header, User-Agent: KimiCLI/0.77. Requests without it are rejected with 403.

API Configuration:

{
    "base_url": "https://api.kimi.com/coding/v1",
    "api_key": "sk-kimi-...",  # Kimi for Coding API Key
    "headers": {
        "User-Agent": "KimiCLI/0.77"  # REQUIRED - 403 without this!
    }
}

Prompt Strategy:

You are a research assistant. Synthesize the following search results into a
clear, accurate answer. Include citations [1], [2], etc.

User Query: {query}

Search Results:
{formatted_results}

Instructions:
1. Answer directly and concisely
2. Cite sources using [1], [2] format
3. If results conflict, note the discrepancy
4. If insufficient data, say so clearly

Answer in {language}.
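
The `{formatted_results}` placeholder needs each result numbered so that the model can cite it as [1], [2], and so on. The sketch below shows one plausible formatting; the exact layout per result, the `(no snippet)` fallback, and the function names are assumptions, and the template is shortened to the placeholders relevant here.

```python
PROMPT_TEMPLATE = """User Query: {query}

Search Results:
{formatted_results}

Answer in {language}."""


def format_results(results: list) -> str:
    """Number each result, starting at 1, so citations map back to sources."""
    blocks = []
    for index, result in enumerate(results, start=1):
        snippet = result.get("content") or "(no snippet)"
        blocks.append(
            f"[{index}] {result['title']}\nURL: {result['url']}\n{snippet}"
        )
    return "\n\n".join(blocks)


def build_prompt(query: str, results: list, language: str = "English") -> str:
    """Fill the template; only the template itself is parsed for placeholders."""
    return PROMPT_TEMPLATE.format(
        query=query,
        formatted_results=format_results(results),
        language=language,
    )
```

Since `str.format` only interprets braces in the template, a user query that itself contains `{` or `}` is inserted verbatim.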

Implementation:

from openai import AsyncOpenAI

class Synthesizer:
    def __init__(self, api_key: str, model: str = "kimi-for-coding"):
        self.model = model  # model name sent with every completion request
        self.client = AsyncOpenAI(
            base_url="https://api.kimi.com/coding/v1",
            api_key=api_key,
            default_headers={"User-Agent": "KimiCLI/0.77"}  # CRITICAL!
        )
    
    async def synthesize(
        self, 
        query: str, 
        results: list[SearchResult],
        max_tokens: int = 2048
    ) -> SynthesisResult:
        response = await self.client.chat.completions.create(
            model=self.model,
            messages=[
                {"role": "system", "content": SYSTEM_PROMPT},
                {"role": "user", "content": self._format_prompt(query, results)}
            ],
            max_tokens=max_tokens
        )
        return SynthesisResult(
            content=response.choices[0].message.content,
            sources=self._extract_citations(results)
        )

Performance Notes:

  • Kimi for Coding is optimized for code + reasoning tasks
  • Truncate search results to ~4000 tokens to stay within context
  • Cache syntheses for identical result sets
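
The ~4000-token truncation could look like the sketch below. It assumes a rough heuristic of about 4 characters per token, which is an approximation and not the model's real tokenizer; `estimate_tokens` and `truncate_to_budget` are hypothetical names.

```python
CHARS_PER_TOKEN = 4  # rough heuristic (assumption), not a real tokenizer


def estimate_tokens(text: str) -> int:
    """Approximate the token count from the character count."""
    return max(1, len(text) // CHARS_PER_TOKEN)


def truncate_to_budget(snippets: list, max_tokens: int = 4000) -> list:
    """Keep snippets in rank order until the estimated budget is used up."""
    kept, used = [], 0
    for snippet in snippets:
        cost = estimate_tokens(snippet)
        if used + cost > max_tokens:
            remaining_chars = (max_tokens - used) * CHARS_PER_TOKEN
            if remaining_chars > 0:
                kept.append(snippet[:remaining_chars])  # partial last snippet
            break
        kept.append(snippet)
        used += cost
    return kept
```

Results are consumed in their original order, so higher-ranked results survive and lower-ranked ones are cut first.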

3.4 Rate Limiter (src/middleware/ratelimit.py)

Purpose: Protect against abuse and control costs

Strategy:

  • IP-based: 30 requests/minute
  • Global: 1000 requests/hour (configurable)
  • Burst: Allow 5 requests immediately, then token bucket
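
The per-IP part of this strategy maps onto a token bucket with capacity 5 and a refill rate of 30 tokens per minute. The sketch below is independent of slowapi and only illustrates the mechanism; `TokenBucket`, `PerClientLimiter`, and the injectable `clock` are hypothetical names, and the global 1000 requests/hour limit is not covered.

```python
import time


class TokenBucket:
    """Allows `burst` requests at once, then refills at rate_per_minute."""

    def __init__(self, rate_per_minute=30, burst=5, clock=time.monotonic):
        self._rate = rate_per_minute / 60.0  # tokens added per second
        self._capacity = float(burst)
        self._tokens = float(burst)  # start full so the burst is available
        self._clock = clock
        self._updated = clock()

    def allow(self) -> bool:
        now = self._clock()
        elapsed = now - self._updated
        self._tokens = min(self._capacity, self._tokens + elapsed * self._rate)
        self._updated = now
        if self._tokens >= 1.0:
            self._tokens -= 1.0
            return True
        return False


class PerClientLimiter:
    """Keeps one bucket per client key, for example the caller's IP address."""

    def __init__(self, **bucket_kwargs):
        self._buckets = {}
        self._bucket_kwargs = bucket_kwargs

    def allow(self, client: str) -> bool:
        if client not in self._buckets:
            self._buckets[client] = TokenBucket(**self._bucket_kwargs)
        return self._buckets[client].allow()
```

In a long-running service the bucket dictionary would also need eviction of idle clients, which this sketch omits.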

4. Data Models (src/models/)

SearchResult

class SearchResult(BaseModel):
    title: str
    url: str
    content: str | None  # Snippet or full text
    source: str  # Engine name
    score: float | None
    published: datetime | None

ResearchResponse

class ResearchResponse(BaseModel):
    query: str
    depth: str
    synthesis: str
    sources: list[dict]  # {title, url, index}
    raw_results: list[SearchResult] | None  # null if omit_raw=true
    metadata: dict  # {latency_ms, cache_hit, tokens_used}
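
The `sources` entries ({title, url, index}) might be derived from the raw results as sketched here. Dropping repeated URLs before numbering is an assumption, as is the trailing-slash normalization; `build_sources` is a hypothetical helper name.

```python
def build_sources(results: list) -> list:
    """Drop repeated URLs, then number the remaining results from 1."""
    seen = set()
    sources = []
    for result in results:
        normalized = result["url"].rstrip("/")  # treat /page and /page/ alike
        if normalized in seen:
            continue
        seen.add(normalized)
        sources.append(
            {
                "index": len(sources) + 1,
                "title": result["title"],
                "url": result["url"],
            }
        )
    return sources
```

The numbering only matches the citations in the synthesis if the prompt is built from the same deduplicated list, so both steps should share one pass over the results.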

Config

class Config(BaseModel):
    searxng_url: str
    kimi_api_key: str  # Kimi for Coding API Key
    cache_backend: Literal["memory", "redis"] = "memory"
    rate_limit: dict  # requests, window

5. Testing Strategy

Test Categories

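| Category    | Location           | Responsibility                   |
|-------------|--------------------|----------------------------------|
| Unit        | tests/unit/        | Individual functions, pure logic |
| Integration | tests/integration/ | Component interactions           |
| E2E         | tests/e2e/         | Full request flow                |
| Performance | tests/perf/        | Load testing                     |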

Test Isolation Principle

CRITICAL: Each test category runs independently. No test should require another test to run first.

5.1 Unit Tests (tests/unit/)

test_synthesizer.py:

  • Mock Kimi for Coding API responses
  • Test prompt formatting
  • Test User-Agent header injection
  • Test token counting/truncation
  • Test error handling (API down, auth errors)

test_searxng_client.py:

  • Mock HTTP responses
  • Test result parsing
  • Test caching logic
  • Test timeout handling

test_models.py:

  • Pydantic validation
  • Serialization/deserialization

5.2 Integration Tests (tests/integration/)

Requires: Running SearXNG instance (Docker)

test_search_flow.py:

  • Real SearXNG queries
  • Cache interaction
  • Error propagation

test_api.py:

  • FastAPI test client
  • Request/response validation
  • Rate limiting behavior

5.3 E2E Tests (tests/e2e/)

test_research_endpoint.py:

  • Full flow: query → search → synthesize → response
  • Verify citation format
  • Verify source attribution
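
The citation checks could rest on a small helper like the one below, which finds every [n] marker in a synthesis and reports those without a matching source. The helper names are hypothetical, and the pattern assumes citations always use the bare `[1]` form requested in the prompt.

```python
import re

CITATION_PATTERN = re.compile(r"\[(\d+)\]")


def cited_indices(synthesis: str) -> set:
    """Collect every index that appears as [n] in the synthesized text."""
    return {int(match) for match in CITATION_PATTERN.findall(synthesis)}


def invalid_citations(synthesis: str, sources: list) -> set:
    """Return cited indices that have no matching entry in sources."""
    known = {source["index"] for source in sources}
    return cited_indices(synthesis) - known
```

An E2E test might then assert that `invalid_citations(response["synthesis"], response["sources"])` is empty.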

6. Implementation Phases

Phase 1: Foundation (No LLM yet) COMPLETE

Goal: Working search API

Deliverables:

  • Project structure with pyproject.toml
  • SearXNG client with async HTTP
  • FastAPI router with /search endpoint
  • Basic tests (mocked) - 28 tests, 92% coverage
  • Docker Compose for SearXNG

Acceptance Criteria:

curl -X POST http://localhost:8000/search \
  -H "Content-Type: application/json" \
  -d '{"q": "python asyncio", "engines": ["google"]}'
# Returns valid SearXNG results

Status: All tests passing, 92% coverage

Phase 2: Synthesis Layer COMPLETE

Goal: Add Kimi for Coding integration

Deliverables:

  • Synthesizer class with Kimi for Coding API
  • /research endpoint combining search + synthesis
  • Prompt templates
  • Response formatting with citations
  • User-Agent header handling

Acceptance Criteria:

curl -X POST http://localhost:8000/research \
  -H "Content-Type: application/json" \
  -d '{"query": "What is Python asyncio?"}'
# Returns synthesized answer with citations

Status: Implemented, tested (40 tests, 90% coverage)

Phase 3: Polish

Goal: Production readiness

Deliverables:

  • Rate limiting
  • Caching (Redis optional)
  • Structured logging
  • Health checks
  • Metrics (Prometheus)
  • Documentation

7. Configuration

Environment Variables

RESEARCH_BRIDGE_SEARXNG_URL=http://localhost:8080
RESEARCH_BRIDGE_KIMI_API_KEY=sk-kimi-...  # Kimi for Coding Key
RESEARCH_BRIDGE_LOG_LEVEL=INFO
RESEARCH_BRIDGE_REDIS_URL=redis://localhost:6379  # optional
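
A minimal sketch of reading these variables at startup, using only the standard library. The `Settings` dataclass, `load_settings`, and the choice to fail fast on missing required values are assumptions; the project may instead rely on Pydantic for this.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional

PREFIX = "RESEARCH_BRIDGE_"


@dataclass(frozen=True)
class Settings:
    searxng_url: str
    kimi_api_key: str
    log_level: str = "INFO"
    redis_url: Optional[str] = None  # optional; None means in-memory cache


def load_settings(env: Optional[Mapping[str, str]] = None) -> Settings:
    """Read RESEARCH_BRIDGE_* variables; raise if a required one is missing."""
    source = os.environ if env is None else env

    def required(name: str) -> str:
        value = source.get(PREFIX + name)
        if not value:
            raise RuntimeError(f"missing environment variable {PREFIX}{name}")
        return value

    return Settings(
        searxng_url=required("SEARXNG_URL").rstrip("/"),
        kimi_api_key=required("KIMI_API_KEY"),
        log_level=source.get(PREFIX + "LOG_LEVEL", "INFO").upper(),
        redis_url=source.get(PREFIX + "REDIS_URL") or None,
    )
```

Passing a plain dictionary as `env` keeps unit tests independent of the real process environment.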

Important: Kimi for Coding API Requirements

# The API requires a special User-Agent header!
headers = {
    "Authorization": f"Bearer {api_key}",
    "Content-Type": "application/json",
    "User-Agent": "KimiCLI/0.77"  # ← REQUIRED! 403 without this
}

Docker Compose (SearXNG)

# config/searxng-docker-compose.yml
version: '3'
services:
  searxng:
    image: searxng/searxng:latest
    ports:
      - "8080:8080"
    volumes:
      - ./searxng-settings.yml:/etc/searxng/settings.yml

8. API Contract

POST /research

Request:

{
  "query": "latest developments in fusion energy",
  "depth": "deep",
  "sources": ["web", "news"],
  "language": "en",
  "omit_raw": false
}

Response:

{
  "query": "latest developments in fusion energy",
  "depth": "deep",
  "synthesis": "Recent breakthroughs in fusion energy include... [1] Commonwealth Fusion Systems achieved... [2]",
  "sources": [
    {"index": 1, "title": "Fusion breakthrough", "url": "https://..."},
    {"index": 2, "title": "CFS milestone", "url": "https://..."}
  ],
  "raw_results": [...],
  "metadata": {
    "latency_ms": 3200,
    "cache_hit": false,
    "tokens_used": 1247,
    "cost_usd": 0.0
  }
}

9. Cost Analysis

Per-Query Costs

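| Component       | Cost  | Notes                                           |
|-----------------|-------|-------------------------------------------------|
| SearXNG         | $0.00 | Self-hosted, open source, no API costs          |
| Kimi for Coding | $0.00 | Via existing subscription (no additional costs) |
| Total per query | $0.00 |                                                 |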

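Comparison:

| Solution              | Cost per query | Factor             |
|-----------------------|----------------|--------------------|
| Perplexity Sonar Pro  | ~$0.015-0.03   | ∞ (more expensive) |
| Perplexity API direct | ~$0.005        | ∞ (more expensive) |
| Research Bridge       | $0.00          | Baseline           |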

Savings: 100% of running costs!

Why is this completely free?

  • SearXNG: Free (open source, self-hosted)
  • Kimi for Coding: Already covered by the existing subscription
  • No API costs, no rate limits, no hidden fees

Break-Even Analysis

  • Setup effort: ~10 hours
  • At any usage level: $0 running costs vs. $X with Perplexity

10. Success Criteria

Functional

  • /research returns synthesized answers in <5s
  • Citations link to original sources
  • Rate limiting prevents abuse
  • Health endpoint confirms all dependencies

Quality

  • Answer quality matches Perplexity in blind test (n=20)
  • Citation accuracy >95%
  • Handles ambiguous queries gracefully

Operational

  • 99% uptime (excluding planned maintenance)
  • <1% error rate
  • Logs structured for observability

11. Risks & Mitigations

| Risk                          | Likelihood | Impact | Mitigation                                        |
|-------------------------------|------------|--------|---------------------------------------------------|
| SearXNG instance down         | Medium     | High   | Deploy redundant instance, fallback engines       |
| Kimi for Coding API changes   | Low        | Medium | Abstract API client, monitor for breaking changes |
| User-Agent requirement breaks | Low        | High   | Hardcoded header, monitor API docs for updates    |
| Answer quality poor           | Medium     | High   | A/B test prompts, fallback to deeper search       |

12. Future Enhancements

  • Follow-up questions: Context-aware multi-turn research
  • Source extraction: Fetch full article text via crawling
  • PDF support: Search and synthesize academic papers
  • Custom prompts: User-defined synthesis instructions
  • Webhook notifications: Async research with callback

13. Appendix: Implementation Notes

Kimi for Coding API Specifics

Required Headers:

headers = {
    "Authorization": f"Bearer {api_key}",
    "Content-Type": "application/json",
    "User-Agent": "KimiCLI/0.77"  # ← CRITICAL! 403 without this
}

OpenAI-Compatible Client Setup:

from openai import AsyncOpenAI

client = AsyncOpenAI(
    base_url="https://api.kimi.com/coding/v1",
    api_key=api_key,
    default_headers={"User-Agent": "KimiCLI/0.77"}
)

Model Name: kimi-for-coding

Prompting Best Practices:

  • Works best with clear, structured prompts
  • Handles long contexts well
  • Use explicit formatting instructions
  • Add "Think step by step" for complex synthesis

SearXNG Tuning

  • Enable json format for structured results
  • Use safesearch=0 for unfiltered results
  • Request time_range: month for recent content
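
The tuning options above correspond to query parameters on the SearXNG `/search` endpoint. The sketch below assembles such a request URL with the standard library; `build_search_url` is a hypothetical helper, and it assumes the instance has the `json` output format enabled in its settings, since instances that do not allow it reject such requests.

```python
from urllib.parse import urlencode


def build_search_url(
    base_url: str,
    query: str,
    engines: list,
    page: int = 1,
    time_range: str = "month",
    safesearch: int = 0,
) -> str:
    """Assemble a SearXNG search URL that asks for JSON results."""
    params = {
        "q": query,
        "format": "json",  # structured results instead of HTML
        "engines": ",".join(engines),
        "pageno": page,
        "safesearch": safesearch,  # 0 = unfiltered
    }
    if time_range:
        params["time_range"] = time_range  # e.g. day, month, year
    return f"{base_url.rstrip('/')}/search?{urlencode(params)}"
```

Passing an empty `time_range` omits the parameter, which is useful for queries where older material is still relevant.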

Document Version: 1.0
Last Updated: 2026-03-14
Next Review: Post-Phase-1 implementation