Henry 1130305e71 Initial commit: Research Bridge API with Podman support

2026-03-14 12:45:36 +00:00

TDD: Research Bridge - SearXNG + Kimi for Coding Integration

AI Council Review Document

Project: research-bridge
Purpose: Self-hosted research pipeline combining SearXNG meta-search with Kimi for Coding
Cost Target: $0 per query (SearXNG: $0 self-hosted + Kimi for Coding: via existing subscription)
Architecture: Modular, testable, async-first

1. Executive Summary

Problem

Perplexity API calls cost $0.015-0.03 per query. For frequent research tasks, this adds up quickly.

Solution

Replace Perplexity with a two-tier architecture:

SearXNG (self-hosted, FREE): Aggregates search results from 70+ sources
Kimi for Coding (via existing subscription, $0): Summarizes and reasons over results

Expected Outcome

Cost: $0 per query (vs $0.015-0.03 with Perplexity)
Latency: 2-5s per query
Quality: Comparable to Perplexity Sonar

2. Architecture Overview

┌─────────────────┐     ┌──────────────────┐     ┌─────────────────┐
│   User Query    │────▶│  Query Router    │────▶│   SearXNG       │
│                 │     │  (FastAPI)       │     │   (Self-Hosted) │
└─────────────────┘     └──────────────────┘     └─────────────────┘
                                                        │
                                                        ▼
                                               ┌─────────────────┐
                                               │  Search Results │
                                               │  (JSON/Raw)     │
                                               └─────────────────┘
                                                        │
┌─────────────────┐     ┌──────────────────┐           │
│   Response      │◀────│  Kimi for Coding │◀──────────┘
│   (Markdown)    │     │  (Synthesizer)   │
└─────────────────┘     └──────────────────┘

Core Components

Component	Responsibility	Tech Stack
`query-router`	HTTP API, validation, routing	FastAPI, Pydantic
`searxng-client`	Interface to SearXNG instance	aiohttp, caching
`synthesizer`	LLM prompts, response formatting	Kimi for Coding API
`cache-layer`	Result deduplication	Redis (optional)
`rate-limiter`	Prevent abuse	slowapi

3. Component Specifications

3.1 Query Router (`src/api/router.py`)

Purpose: FastAPI application handling HTTP requests

Endpoints:

POST /research
Request:  {"query": "string", "depth": "shallow|deep", "sources": ["web", "news", "academic"], "language": "string", "omit_raw": bool}
Response: {"query": "string", "depth": "shallow|deep", "synthesis": "string", "sources": [...], "raw_results": [...] | null, "metadata": {"latency_ms": int, ...}}

GET /health
Response: {"status": "healthy", "searxng_connected": bool, "kimi_coding_available": bool}

POST /search (passthrough)
Request:  {"q": "string", "engines": ["google", "bing"], "page": 1}
Response: Raw SearXNG JSON

Validation Rules:

Query: min 3, max 500 characters
Depth: default "shallow" (1 search) vs "deep" (3 searches + synthesis)
Rate limit: 30 req/min per IP
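
The validation rules above can be sketched without any dependencies. This is a minimal stand-in for the Pydantic request model, not the project's actual code: the class name `ResearchRequest`, the default of `["web"]` for `sources`, and the whitespace stripping are assumptions made for illustration.

```python
from dataclasses import dataclass, field

ALLOWED_DEPTHS = {"shallow", "deep"}
ALLOWED_SOURCES = {"web", "news", "academic"}


@dataclass
class ResearchRequest:
    """Dependency-free stand-in for the Pydantic request model (hypothetical)."""

    query: str
    depth: str = "shallow"
    # Defaulting to ["web"] is an assumption; the document does not state a default.
    sources: list[str] = field(default_factory=lambda: ["web"])

    def __post_init__(self) -> None:
        self.query = self.query.strip()
        # Query: min 3, max 500 characters.
        if not 3 <= len(self.query) <= 500:
            raise ValueError("query must be 3 to 500 characters long")
        if self.depth not in ALLOWED_DEPTHS:
            raise ValueError(f"depth must be one of {sorted(ALLOWED_DEPTHS)}")
        unknown = set(self.sources) - ALLOWED_SOURCES
        if unknown:
            raise ValueError(f"unknown sources: {sorted(unknown)}")
```

In the FastAPI router the same limits would more naturally be expressed as field constraints on a Pydantic model, so that invalid requests are rejected before the handler runs.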

3.2 SearXNG Client (`src/search/searxng.py`)

Purpose: Async client for SearXNG instance

Configuration:

searxng:
  base_url: "http://localhost:8080"  # or external instance
  timeout: 10
  max_results: 10
  engines:
    default: ["google", "bing", "duckduckgo"]
    news: ["google_news", "bing_news"]
    academic: ["google_scholar", "arxiv"]

Interface:

class SearXNGClient:
    async def search(self, query: str, engines: list[str], page: int = 1) -> SearchResult: ...

    # Used by deep mode to run several queries.
    async def search_multi(self, queries: list[str]) -> list[SearchResult]: ...
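
One way `search_multi` could fan out the deep-mode queries is `asyncio.gather` behind a semaphore. The sketch below is illustrative only: `fake_search` is a hypothetical stand-in for the real HTTP call, results are plain dicts rather than `SearchResult` objects, and the `max_concurrency` parameter is an assumption.

```python
import asyncio


async def fake_search(query: str) -> dict:
    """Hypothetical stand-in for SearXNGClient.search; returns a canned result."""
    await asyncio.sleep(0)  # yield to the event loop, as a real HTTP call would
    return {"query": query, "results": [f"result for {query}"]}


async def search_multi(queries: list[str], max_concurrency: int = 3) -> list[dict]:
    """Run all searches concurrently and return results in input order."""
    semaphore = asyncio.Semaphore(max_concurrency)

    async def bounded(query: str) -> dict:
        # The semaphore caps how many searches hit the instance at once.
        async with semaphore:
            return await fake_search(query)

    return await asyncio.gather(*(bounded(q) for q in queries))
```

`asyncio.gather` preserves the order of its arguments, so the caller can match each result to its query by position.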

Caching:

Cache key: SHA256(query + ",".join(engines))
TTL: 1 hour for identical queries
Storage: In-memory LRU (1000 entries) or Redis
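
The caching rules above could be realised in memory roughly as follows. This is a sketch under stated assumptions: sorting the engine list and inserting a `|` separator before hashing are additions not found in the original key definition, and `TTLCache` is a hypothetical name.

```python
import hashlib
import time
from collections import OrderedDict


def cache_key(query: str, engines: list[str]) -> str:
    """SHA-256 over the query and the engine list."""
    # Assumption: engines are sorted so that ["google", "bing"] and
    # ["bing", "google"] map to the same cache entry.
    raw = query + "|" + ",".join(sorted(engines))
    return hashlib.sha256(raw.encode("utf-8")).hexdigest()


class TTLCache:
    """In-memory LRU cache whose entries expire after `ttl` seconds."""

    def __init__(self, max_entries: int = 1000, ttl: float = 3600.0, clock=time.monotonic):
        self._data = OrderedDict()  # key -> (stored_at, value)
        self._max_entries = max_entries
        self._ttl = ttl
        self._clock = clock  # injectable so tests can control time

    def get(self, key: str):
        entry = self._data.get(key)
        if entry is None:
            return None
        stored_at, value = entry
        if self._clock() - stored_at > self._ttl:
            del self._data[key]  # entry is older than the TTL
            return None
        self._data.move_to_end(key)  # mark as most recently used
        return value

    def set(self, key: str, value) -> None:
        self._data[key] = (self._clock(), value)
        self._data.move_to_end(key)
        while len(self._data) > self._max_entries:
            self._data.popitem(last=False)  # evict the least recently used entry
```

A Redis backend would replace `TTLCache` with `SET` plus an expiry, while `cache_key` could stay unchanged.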

3.3 Synthesizer (`src/llm/synthesizer.py`)

Purpose: Transform search results into coherent answers using Kimi for Coding

⚠️ CRITICAL: Kimi for Coding API requires special User-Agent: KimiCLI/0.77 header!

API Configuration:

{
    "base_url": "https://api.kimi.com/coding/v1",
    "api_key": "sk-kimi-...",  # Kimi for Coding API Key
    "headers": {
        "User-Agent": "KimiCLI/0.77"  # REQUIRED - 403 without this!
    }
}

Prompt Strategy:

You are a research assistant. Synthesize the following search results into a
clear, accurate answer. Include citations [1], [2], etc.

User Query: {query}

Search Results:
{formatted_results}

Instructions:
1. Answer directly and concisely
2. Cite sources using [1], [2] format
3. If results conflict, note the discrepancy
4. If insufficient data, say so clearly

Answer in {language}.
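
To make the `{formatted_results}` placeholder concrete, the following sketch numbers each result so the model can cite it as [1], [2], and so on. The function names, the dict-based result shape, and the exact layout of each result block are assumptions for illustration; only part of the template above is reproduced.

```python
PROMPT_TEMPLATE = """User Query: {query}

Search Results:
{formatted_results}

Answer in {language}."""


def format_results(results: list[dict]) -> str:
    """Number each result so citations like [1] map back to a source."""
    blocks = []
    for index, result in enumerate(results, start=1):
        snippet = result.get("content") or "(no snippet)"
        blocks.append(f"[{index}] {result['title']}\n{result['url']}\n{snippet}")
    return "\n\n".join(blocks)


def build_prompt(query: str, results: list[dict], language: str = "English") -> str:
    """Fill the template with the query, the numbered results and the language."""
    return PROMPT_TEMPLATE.format(
        query=query,
        formatted_results=format_results(results),
        language=language,
    )
```

Keeping the numbering in one place means the `sources` list in the response can reuse the same indices.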

Implementation:

from openai import AsyncOpenAI

# SYSTEM_PROMPT, SearchResult and SynthesisResult are defined elsewhere in the project;
# _format_prompt and _extract_citations are helper methods not shown here.

class Synthesizer:
    def __init__(self, api_key: str, model: str = "kimi-for-coding"):
        self.model = model  # stored so synthesize() can pass it to the API
        self.client = AsyncOpenAI(
            base_url="https://api.kimi.com/coding/v1",
            api_key=api_key,
            default_headers={"User-Agent": "KimiCLI/0.77"}  # CRITICAL!
        )

    async def synthesize(
        self,
        query: str,
        results: list[SearchResult],
        max_tokens: int = 2048
    ) -> SynthesisResult:
        # Send the system prompt plus the formatted search results to the model.
        response = await self.client.chat.completions.create(
            model=self.model,
            messages=[
                {"role": "system", "content": SYSTEM_PROMPT},
                {"role": "user", "content": self._format_prompt(query, results)}
            ],
            max_tokens=max_tokens
        )
        return SynthesisResult(
            content=response.choices[0].message.content,
            sources=self._extract_citations(results)
        )

Performance Notes:

Kimi for Coding is optimized for code and reasoning tasks
Truncate search results to ~4000 tokens to stay within context
Cache syntheses for identical result sets

3.4 Rate Limiter (`src/middleware/ratelimit.py`)

Purpose: Protect against abuse and control costs

Strategy:

IP-based: 30 requests/minute
Global: 1000 requests/hour (configurable)
Burst: Allow 5 requests immediately, then token bucket
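
The burst-then-token-bucket strategy above might look like the following per-client bucket. This is an illustrative sketch rather than the slowapi configuration the project uses; the class name and the injectable `clock` parameter are assumptions. A capacity of 5 matches the burst size, and a refill rate of 0.5 tokens per second matches 30 requests per minute.

```python
import time


class TokenBucket:
    """Token bucket: `capacity` requests may burst, then refill at a steady rate."""

    def __init__(self, capacity: int = 5, refill_per_second: float = 30 / 60, clock=time.monotonic):
        self.capacity = capacity
        self.refill_per_second = refill_per_second
        self._clock = clock  # injectable so tests can control time
        self._tokens = float(capacity)  # start full so the burst is available at once
        self._last = clock()

    def allow(self) -> bool:
        """Return True and consume a token if the request may proceed."""
        now = self._clock()
        elapsed = now - self._last
        self._last = now
        # Refill for the elapsed time, never exceeding the bucket capacity.
        self._tokens = min(self.capacity, self._tokens + elapsed * self.refill_per_second)
        if self._tokens >= 1:
            self._tokens -= 1
            return True
        return False
```

For the IP-based limit, one bucket would be kept per client address, for example in a dict keyed by IP.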

4. Data Models (`src/models/`)

SearchResult

from datetime import datetime

from pydantic import BaseModel

class SearchResult(BaseModel):
    title: str
    url: str
    content: str | None = None  # Snippet or full text
    source: str  # Engine name
    score: float | None = None
    published: datetime | None = None

ResearchResponse

class ResearchResponse(BaseModel):
    query: str
    depth: str
    synthesis: str
    sources: list[dict]  # {title, url, index}
    raw_results: list[SearchResult] | None = None  # null if omit_raw=true
    metadata: dict  # {latency_ms, cache_hit, tokens_used, cost_usd}

Config

from typing import Literal

class Config(BaseModel):
    searxng_url: str
    kimi_api_key: str  # Kimi for Coding API Key
    cache_backend: Literal["memory", "redis"] = "memory"
    rate_limit: dict  # requests, window

5. Testing Strategy

Test Categories

Category	Location	Responsibility
Unit	`tests/unit/`	Individual functions, pure logic
Integration	`tests/integration/`	Component interactions
E2E	`tests/e2e/`	Full request flow
Performance	`tests/perf/`	Load testing

Test Isolation Principle

CRITICAL: Each test category runs independently. No test should require another test to run first.

5.1 Unit Tests (`tests/unit/`)

test_synthesizer.py:

Mock Kimi for Coding API responses
Test prompt formatting
Test User-Agent header injection
Test token counting/truncation
Test error handling (API down, auth errors)

test_searxng_client.py:

Mock HTTP responses
Test result parsing
Test caching logic
Test timeout handling

test_models.py:

Pydantic validation
Serialization/deserialization

5.2 Integration Tests (`tests/integration/`)

Requires: Running SearXNG instance (Docker)

test_search_flow.py:

Real SearXNG queries
Cache interaction
Error propagation

test_api.py:

FastAPI test client
Request/response validation
Rate limiting behavior

5.3 E2E Tests (`tests/e2e/`)

test_research_endpoint.py:

Full flow: query → search → synthesize → response
Verify citation format
Verify source attribution
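
The citation checks in the E2E tests could rest on a small helper like the one below. It is a sketch: the function name is hypothetical, and it assumes the `sources` entries carry an integer `index` as in the API contract.

```python
import re


def check_citations(synthesis: str, sources: list[dict]) -> list[int]:
    """Return cited indices that have no matching entry in `sources`."""
    cited = {int(n) for n in re.findall(r"\[(\d+)\]", synthesis)}
    known = {source["index"] for source in sources}
    return sorted(cited - known)
```

An E2E test would assert that the returned list is empty, which means every [n] in the synthesis points at a listed source.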

6. Implementation Phases

Phase 1: Foundation (No LLM yet) ✅ COMPLETE

Goal: Working search API

Deliverables:

Project structure with pyproject.toml
SearXNG client with async HTTP
FastAPI router with /search endpoint
Basic tests (mocked) - 28 tests, 92% coverage
Docker Compose for SearXNG

Acceptance Criteria:

curl -X POST http://localhost:8000/search \
  -H "Content-Type: application/json" \
  -d '{"q": "python asyncio", "engines": ["google"]}'
# Returns valid SearXNG results

Status: ✅ All tests passing, 92% coverage

Phase 2: Synthesis Layer ✅ COMPLETE

Goal: Add Kimi for Coding integration

Deliverables:

Synthesizer class with Kimi for Coding API
/research endpoint combining search + synthesis
Prompt templates
Response formatting with citations
User-Agent header handling

Acceptance Criteria:

curl -X POST http://localhost:8000/research \
  -H "Content-Type: application/json" \
  -d '{"query": "What is Python asyncio?"}'
# Returns synthesized answer with citations

Status: ✅ Implemented, tested (40 tests, 90% coverage)

Phase 3: Polish

Goal: Production readiness

Deliverables:

Rate limiting
Caching (Redis optional)
Structured logging
Health checks
Metrics (Prometheus)
Documentation

7. Configuration

Environment Variables

RESEARCH_BRIDGE_SEARXNG_URL=http://localhost:8080
RESEARCH_BRIDGE_KIMI_API_KEY=sk-kimi-...  # Kimi for Coding Key
RESEARCH_BRIDGE_LOG_LEVEL=INFO
RESEARCH_BRIDGE_REDIS_URL=redis://localhost:6379  # optional
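
A minimal sketch of reading these variables at startup follows. The `Settings` dataclass and `load_settings` function are hypothetical names; the sketch treats the SearXNG URL and API key as required and the other two as optional, and the `INFO` fallback for the log level is an assumption.

```python
import os
from dataclasses import dataclass
from typing import Optional

PREFIX = "RESEARCH_BRIDGE_"


@dataclass(frozen=True)
class Settings:
    searxng_url: str
    kimi_api_key: str
    log_level: str = "INFO"
    redis_url: Optional[str] = None  # optional; in-memory cache when unset


def load_settings(env=None) -> Settings:
    """Build Settings from environment variables, failing fast on missing ones."""
    env = os.environ if env is None else env
    try:
        return Settings(
            searxng_url=env[PREFIX + "SEARXNG_URL"],
            kimi_api_key=env[PREFIX + "KIMI_API_KEY"],
            log_level=env.get(PREFIX + "LOG_LEVEL", "INFO"),
            redis_url=env.get(PREFIX + "REDIS_URL"),
        )
    except KeyError as missing:
        raise RuntimeError(f"missing required environment variable {missing}") from None
```

Passing a plain dict as `env` keeps the loader testable without touching the real process environment.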

Important: Kimi for Coding API Requirements

# The API requires a special User-Agent header!
headers = {
    "Authorization": f"Bearer {api_key}",
    "Content-Type": "application/json",
    "User-Agent": "KimiCLI/0.77"  # ← REQUIRED! 403 without this
}
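
For a unit test of the header requirement that needs no network access, the request can be built with the standard library and inspected before sending. This is a sketch: `build_chat_request` is a hypothetical helper, and the `/chat/completions` path is the one an OpenAI-compatible client appends to the base URL.

```python
import json
import urllib.request

BASE_URL = "https://api.kimi.com/coding/v1"


def build_chat_request(api_key: str, prompt: str) -> urllib.request.Request:
    """Build (but do not send) a chat completion request with the required headers."""
    body = {
        "model": "kimi-for-coding",
        "messages": [{"role": "user", "content": prompt}],
    }
    return urllib.request.Request(
        url=f"{BASE_URL}/chat/completions",
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
            "User-Agent": "KimiCLI/0.77",  # required; the API answers 403 without it
        },
        method="POST",
    )
```

Note that `urllib.request.Request` stores header names capitalized as `User-agent`, so a test reads the value back with `request.get_header("User-agent")`.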

Docker Compose (SearXNG)

# config/searxng-docker-compose.yml
version: '3'
services:
  searxng:
    image: searxng/searxng:latest
    ports:
      - "8080:8080"
    volumes:
      - ./searxng-settings.yml:/etc/searxng/settings.yml

8. API Contract

POST /research

Request:

{
  "query": "latest developments in fusion energy",
  "depth": "deep",
  "sources": ["web", "news"],
  "language": "en",
  "omit_raw": false
}

Response:

{
  "query": "latest developments in fusion energy",
  "depth": "deep",
  "synthesis": "Recent breakthroughs in fusion energy include... [1] Commonwealth Fusion Systems achieved... [2]",
  "sources": [
    {"index": 1, "title": "Fusion breakthrough", "url": "https://..."},
    {"index": 2, "title": "CFS milestone", "url": "https://..."}
  ],
  "raw_results": [...],
  "metadata": {
    "latency_ms": 3200,
    "cache_hit": false,
    "tokens_used": 1247,
    "cost_usd": 0.0
  }
}

9. Cost Analysis

Per-Query Costs

Component	Cost	Notes
SearXNG	$0.00	Self-hosted, open source, no API costs
Kimi for Coding	$0.00	Via existing subscription (no additional cost)
Total per query	$0.00

Comparison:

Solution	Cost per query	Factor
Perplexity Sonar Pro	~$0.015-0.03	∞ (more expensive)
Perplexity API direct	~$0.005	∞ (more expensive)
Research Bridge	$0.00	Baseline

Savings: 100% of running costs!

Why is this completely free?

SearXNG: Free (open source, self-hosted)
Kimi for Coding: Already covered by the existing subscription
No API costs, no rate limits, no hidden fees

Break-Even Analysis

Setup effort: ~10 hours
At any usage level: $0 running costs vs. $X with Perplexity
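
The $X in the comparison depends on query volume. As a worked example, the sketch below multiplies a volume by the $0.015-0.03 per-query range quoted for Perplexity; the function name and the volume of 1000 queries are purely illustrative.

```python
PERPLEXITY_COST_RANGE_USD = (0.015, 0.03)  # per query, as quoted in this document


def avoided_cost(queries: int) -> tuple[float, float]:
    """Return the (low, high) Perplexity spend for the given number of queries."""
    low, high = PERPLEXITY_COST_RANGE_USD
    return round(queries * low, 2), round(queries * high, 2)


print(avoided_cost(1000))  # (15.0, 30.0)
```

At 1000 queries the avoided spend is therefore between $15 and $30, to be weighed against the one-time setup effort.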

10. Success Criteria

Functional

/research returns synthesized answers in <5s
Citations link to original sources
Rate limiting prevents abuse
Health endpoint confirms all dependencies

Quality

Answer quality matches Perplexity in blind test (n=20)
Citation accuracy >95%
Handles ambiguous queries gracefully

Operational

99% uptime (excluding planned maintenance)
<1% error rate
Logs structured for observability

11. Risks & Mitigations

Risk	Likelihood	Impact	Mitigation
SearXNG instance down	Medium	High	Deploy redundant instance, fallback engines
Kimi for Coding API changes	Low	Medium	Abstract API client, monitor for breaking changes
User-Agent requirement breaks	Low	High	Hardcoded header, monitor API docs for updates
Answer quality poor	Medium	High	A/B test prompts, fallback to deeper search

12. Future Enhancements

Follow-up questions: Context-aware multi-turn research
Source extraction: Fetch full article text via crawling
PDF support: Search and synthesize academic papers
Custom prompts: User-defined synthesis instructions
Webhook notifications: Async research with callback

13. Appendix: Implementation Notes

Kimi for Coding API Specifics

Required Headers:

headers = {
    "Authorization": f"Bearer {api_key}",
    "Content-Type": "application/json",
    "User-Agent": "KimiCLI/0.77"  # ← CRITICAL! 403 without this
}

OpenAI-Compatible Client Setup:

from openai import AsyncOpenAI

client = AsyncOpenAI(
    base_url="https://api.kimi.com/coding/v1",
    api_key=api_key,
    default_headers={"User-Agent": "KimiCLI/0.77"}
)

Model Name: kimi-for-coding

Prompting Best Practices:

Works best with clear, structured prompts
Handles long contexts well
Use explicit formatting instructions
Add "Think step by step" for complex synthesis

SearXNG Tuning

Enable json format for structured results
Use safesearch=0 for unfiltered results
Request time_range: month for recent content
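
The tuning options above map onto query parameters of the SearXNG search endpoint. The sketch below only builds the URL and does not send it; the helper name is hypothetical, and the JSON format is assumed to be enabled in the instance's settings.

```python
from urllib.parse import urlencode


def build_search_url(base_url: str, query: str, engines: list[str], page: int = 1) -> str:
    """Build a SearXNG search URL with the tuning options applied."""
    params = {
        "q": query,
        "format": "json",        # structured results instead of HTML
        "safesearch": 0,         # unfiltered results
        "time_range": "month",   # recent content only
        "engines": ",".join(engines),
        "pageno": page,
    }
    return f"{base_url.rstrip('/')}/search?{urlencode(params)}"
```

For example, `build_search_url("http://localhost:8080", "python asyncio", ["google", "bing"])` produces a URL that can be passed directly to the async HTTP client.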

Document Version: 1.0
Last Updated: 2026-03-14
Next Review: Post-Phase-1 implementation
