TDD: Research Bridge - SearXNG + Kimi for Coding Integration
AI Council Review Document
Project: research-bridge
Purpose: Self-hosted research pipeline combining SearXNG meta-search with Kimi for Coding
Cost Target: $0 per query (SearXNG: $0 self-hosted + Kimi for Coding: covered by an existing subscription)
Architecture: Modular, testable, async-first
1. Executive Summary
Problem
Perplexity API calls cost $0.015-0.03 per query. For frequent research tasks, this adds up quickly.
Solution
Replace Perplexity with a two-tier architecture:
- SearXNG (self-hosted, FREE): Aggregates search results from 70+ sources
- Kimi for Coding (via existing subscription, $0 additional cost): Summarizes and reasons over results
Expected Outcome
- Cost: $0 per query (vs $0.015-0.03 with Perplexity)
- Latency: 2-5s per query
- Quality: Comparable to Perplexity Sonar
2. Architecture Overview
┌─────────────────┐     ┌──────────────────┐     ┌─────────────────┐
│   User Query    │────▶│   Query Router   │────▶│     SearXNG     │
│                 │     │    (FastAPI)     │     │  (Self-Hosted)  │
└─────────────────┘     └──────────────────┘     └─────────────────┘
                                                          │
                                                          ▼
                                                 ┌─────────────────┐
                                                 │ Search Results  │
                                                 │   (JSON/Raw)    │
                                                 └─────────────────┘
                                                          │
┌─────────────────┐     ┌──────────────────┐              │
│    Response     │◀────│ Kimi for Coding  │◀─────────────┘
│   (Markdown)    │     │  (Synthesizer)   │
└─────────────────┘     └──────────────────┘
Core Components
| Component | Responsibility | Tech Stack |
|---|---|---|
| query-router | HTTP API, validation, routing | FastAPI, Pydantic |
| searxng-client | Interface to SearXNG instance | aiohttp, caching |
| synthesizer | LLM prompts, response formatting | Kimi for Coding API |
| cache-layer | Result deduplication | Redis (optional) |
| rate-limiter | Prevent abuse | slowapi |
3. Component Specifications
3.1 Query Router (src/api/router.py)
Purpose: FastAPI application handling HTTP requests
Endpoints:
POST /research
Request: {"query": "string", "depth": "shallow|deep", "sources": ["web", "news", "academic"]}
Response: {"query": "string", "results": [...], "synthesis": "string", "sources": [...], "latency_ms": int}
GET /health
Response: {"status": "healthy", "searxng_connected": bool, "kimi_coding_available": bool}
GET /search (passthrough)
Request: {"q": "string", "engines": ["google", "bing"], "page": 1}
Response: Raw SearXNG JSON
Validation Rules:
- Query: min 3, max 500 characters
- Depth: default "shallow" (1 search) vs "deep" (3 searches + synthesis)
- Rate limit: 30 req/min per IP
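The validation rules above can be sketched without any framework. This is a minimal, dependency-free illustration, not the router's actual code: the real service would presumably express the same limits as Pydantic field constraints, and the function names `validation_errors` and `planned_searches` are hypothetical.

```python
MIN_QUERY_LEN = 3
MAX_QUERY_LEN = 500
ALLOWED_DEPTHS = ("shallow", "deep")


def validation_errors(payload: dict) -> list[str]:
    """Return human-readable problems with a /research payload (empty if valid)."""
    errors = []
    query = payload.get("query")
    if not isinstance(query, str) or not MIN_QUERY_LEN <= len(query) <= MAX_QUERY_LEN:
        errors.append(f"query must be a string of {MIN_QUERY_LEN}-{MAX_QUERY_LEN} characters")
    depth = payload.get("depth", "shallow")  # "shallow" is the documented default
    if depth not in ALLOWED_DEPTHS:
        errors.append("depth must be 'shallow' or 'deep'")
    return errors


def planned_searches(depth: str = "shallow") -> int:
    """Shallow mode runs 1 search; deep mode runs 3 before synthesis."""
    return 3 if depth == "deep" else 1
```

Returning a list of errors instead of raising on the first one lets the API report every problem in a single 422 response.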
3.2 SearXNG Client (src/search/searxng.py)
Purpose: Async client for SearXNG instance
Configuration:
searxng:
  base_url: "http://localhost:8080" # or external instance
  timeout: 10
  max_results: 10
  engines:
    default: ["google", "bing", "duckduckgo"]
    news: ["google_news", "bing_news"]
    academic: ["google_scholar", "arxiv"]
Interface:
class SearXNGClient:
    async def search(self, query: str, engines: list[str], page: int = 1) -> SearchResult: ...
    async def search_multi(self, queries: list[str]) -> list[SearchResult]: ...  # for deep mode
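One plausible way to implement `search_multi` for deep mode is to fan the queries out concurrently with `asyncio.gather`. The sketch below is an assumption about the approach, not the project's code: it takes the search coroutine as a parameter, caps concurrency with a hypothetical `max_concurrency` argument, and uses a `fake_search` stub so it runs without a SearXNG instance.

```python
import asyncio
from typing import Awaitable, Callable

# Hypothetical type: any coroutine function mapping a query to a list of hits.
SearchFn = Callable[[str], Awaitable[list]]


async def search_multi(search: SearchFn, queries: list[str], max_concurrency: int = 3) -> list:
    """Run all queries concurrently; results come back in input order."""
    semaphore = asyncio.Semaphore(max_concurrency)

    async def bounded(query: str) -> list:
        async with semaphore:  # never more than max_concurrency requests in flight
            return await search(query)

    return await asyncio.gather(*(bounded(q) for q in queries))


async def fake_search(query: str) -> list:
    """Stand-in for SearXNGClient.search so the sketch runs offline."""
    await asyncio.sleep(0)
    return [{"title": f"result for {query}", "url": "https://example.org"}]
```

Calling `asyncio.run(search_multi(fake_search, ["a", "b", "c"]))` returns three result lists in the same order as the queries.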
Caching:
- Cache key: SHA256(query + engines.join(","))
- TTL: 1 hour for identical queries
- Storage: In-memory LRU (1000 entries) or Redis
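The caching rules above could look roughly like this for the in-memory backend. It is a sketch under assumptions: the class name `TTLCache` and the injectable `now` parameter (used to make expiry testable) are illustrative and do not come from the project.

```python
import hashlib
import time
from collections import OrderedDict
from typing import Optional


def cache_key(query: str, engines: list[str]) -> str:
    """SHA-256 over the query plus the comma-joined engine list."""
    return hashlib.sha256((query + ",".join(engines)).encode("utf-8")).hexdigest()


class TTLCache:
    """In-memory LRU cache with per-entry expiry."""

    def __init__(self, max_entries: int = 1000, ttl_seconds: float = 3600.0) -> None:
        self.max_entries = max_entries
        self.ttl_seconds = ttl_seconds
        self._data = OrderedDict()  # key -> (stored_at, value), oldest first

    def get(self, key: str, now: Optional[float] = None):
        now = time.monotonic() if now is None else now
        entry = self._data.get(key)
        if entry is None:
            return None
        stored_at, value = entry
        if now - stored_at >= self.ttl_seconds:
            del self._data[key]  # expired
            return None
        self._data.move_to_end(key)  # mark as most recently used
        return value

    def set(self, key: str, value, now: Optional[float] = None) -> None:
        now = time.monotonic() if now is None else now
        self._data[key] = (now, value)
        self._data.move_to_end(key)
        while len(self._data) > self.max_entries:
            self._data.popitem(last=False)  # evict least recently used
```

Because the key is built from the engine list as given, `["google", "bing"]` and `["bing", "google"]` produce different keys. Sorting the engines before joining would make the key order-independent, if that is the desired behavior.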
3.3 Synthesizer (src/llm/synthesizer.py)
Purpose: Transform search results into coherent answers using Kimi for Coding
⚠️ CRITICAL: The Kimi for Coding API requires the header User-Agent: KimiCLI/0.77. Requests without it are rejected with 403.
API Configuration:
{
    "base_url": "https://api.kimi.com/coding/v1",
    "api_key": "sk-kimi-...",  # Kimi for Coding API Key
    "headers": {
        "User-Agent": "KimiCLI/0.77"  # REQUIRED - 403 without this!
    }
}
Prompt Strategy:
You are a research assistant. Synthesize the following search results into a
clear, accurate answer. Include citations [1], [2], etc.
User Query: {query}
Search Results:
{formatted_results}
Instructions:
1. Answer directly and concisely
2. Cite sources using [1], [2] format
3. If results conflict, note the discrepancy
4. If insufficient data, say so clearly
Answer in {language}.
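To show how `{formatted_results}` might be filled so that the citation numbers line up with the source list, here is a minimal sketch. The helper names `format_results` and `build_prompt`, the dict-shaped results, and the "(no snippet)" placeholder are assumptions for illustration. The template is shortened to the variable parts of the prompt above.

```python
PROMPT_TEMPLATE = """User Query: {query}

Search Results:
{formatted_results}

Answer in {language}."""


def format_results(results: list[dict]) -> str:
    """Number each result from 1 so the model can cite it as [1], [2], ..."""
    blocks = []
    for index, result in enumerate(results, start=1):
        snippet = result.get("content") or "(no snippet)"  # content may be None
        blocks.append(f"[{index}] {result['title']} ({result['url']})\n{snippet}")
    return "\n\n".join(blocks)


def build_prompt(query: str, results: list[dict], language: str = "English") -> str:
    """Fill the user-message template with the query and numbered results."""
    return PROMPT_TEMPLATE.format(
        query=query,
        formatted_results=format_results(results),
        language=language,
    )
```

Numbering in the formatter, rather than asking the model to number sources itself, keeps each `[n]` in the answer tied to position `n` in the results that were sent.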
Implementation:
from openai import AsyncOpenAI

class Synthesizer:
    def __init__(self, api_key: str, model: str = "kimi-for-coding"):
        self.model = model  # stored here because synthesize() reads self.model
        self.client = AsyncOpenAI(
            base_url="https://api.kimi.com/coding/v1",
            api_key=api_key,
            default_headers={"User-Agent": "KimiCLI/0.77"}  # CRITICAL!
        )

    async def synthesize(
        self,
        query: str,
        results: list[SearchResult],
        max_tokens: int = 2048
    ) -> SynthesisResult:
        response = await self.client.chat.completions.create(
            model=self.model,
            messages=[
                {"role": "system", "content": SYSTEM_PROMPT},
                {"role": "user", "content": self._format_prompt(query, results)}
            ],
            max_tokens=max_tokens
        )
        return SynthesisResult(
            content=response.choices[0].message.content,
            sources=self._extract_citations(results)
        )
Performance Notes:
- Kimi for Coding is optimized for code and reasoning tasks
- Truncate search results to ~4000 tokens to stay within context
- Cache syntheses for identical result sets
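The ~4000-token truncation could be approximated as follows. This sketch assumes a rough heuristic of four characters per token, which is not the Kimi tokenizer, and the function names are hypothetical. A real implementation would swap in an actual token counter.

```python
CHARS_PER_TOKEN = 4  # rough heuristic, NOT the real tokenizer


def estimate_tokens(text: str) -> int:
    """Estimate the token count by rounding characters / 4 upward."""
    return (len(text) + CHARS_PER_TOKEN - 1) // CHARS_PER_TOKEN


def fit_to_budget(snippets: list[str], max_tokens: int = 4000) -> list[str]:
    """Keep snippets in rank order until the estimated budget is used up."""
    kept, used = [], 0
    for snippet in snippets:
        cost = estimate_tokens(snippet)
        if used + cost > max_tokens:
            remaining_chars = (max_tokens - used) * CHARS_PER_TOKEN
            if remaining_chars > 0:
                kept.append(snippet[:remaining_chars])  # cut the last snippet to fit
            break
        kept.append(snippet)
        used += cost
    return kept
```

Truncating from the end assumes the results arrive ranked best-first, so the lowest-ranked snippets are the ones dropped.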
3.4 Rate Limiter (src/middleware/ratelimit.py)
Purpose: Protect against abuse and control costs
Strategy:
- IP-based: 30 requests/minute
- Global: 1000 requests/hour (configurable)
- Burst: Allow 5 requests immediately, then token bucket
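The burst-then-token-bucket behavior can be illustrated with a small standalone class. This is a sketch of the general technique only: it is not how slowapi is configured, and the `TokenBucket` name and the injectable `now` argument are assumptions made for testability. In the service, one bucket per client IP would be kept.

```python
import time
from typing import Optional


class TokenBucket:
    """Allows `burst` requests at once, then refills at `rate_per_minute`."""

    def __init__(self, rate_per_minute: float = 30, burst: int = 5,
                 now: Optional[float] = None) -> None:
        self.refill_per_second = rate_per_minute / 60.0
        self.capacity = float(burst)
        self.tokens = float(burst)  # start full so the burst is available immediately
        self.updated_at = time.monotonic() if now is None else now

    def allow(self, now: Optional[float] = None) -> bool:
        """Consume one token if available; return whether the request may proceed."""
        now = time.monotonic() if now is None else now
        elapsed = max(0.0, now - self.updated_at)
        self.tokens = min(self.capacity, self.tokens + elapsed * self.refill_per_second)
        self.updated_at = now
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False
```

With the defaults, five requests pass immediately and further requests are admitted at one every two seconds, which matches 30 requests per minute in steady state.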
4. Data Models (src/models/)
SearchResult
class SearchResult(BaseModel):
    title: str
    url: str
    content: str | None  # Snippet or full text
    source: str  # Engine name
    score: float | None
    published: datetime | None
ResearchResponse
class ResearchResponse(BaseModel):
    query: str
    depth: str
    synthesis: str
    sources: list[dict]  # {title, url, index}
    raw_results: list[SearchResult] | None  # null if omit_raw=true
    metadata: dict  # {latency_ms, cache_hit, tokens_used}
Config
class Config(BaseModel):
    searxng_url: str
    kimi_api_key: str  # Kimi for Coding API Key
    cache_backend: Literal["memory", "redis"] = "memory"
    rate_limit: dict  # requests, window
5. Testing Strategy
Test Categories
| Category | Location | Responsibility |
|---|---|---|
| Unit | tests/unit/ | Individual functions, pure logic |
| Integration | tests/integration/ | Component interactions |
| E2E | tests/e2e/ | Full request flow |
| Performance | tests/perf/ | Load testing |
Test Isolation Principle
CRITICAL: Each test category runs independently. No test should require another test to run first.
5.1 Unit Tests (tests/unit/)
test_synthesizer.py:
- Mock Kimi for Coding API responses
- Test prompt formatting
- Test User-Agent header injection
- Test token counting/truncation
- Test error handling (API down, auth errors)
test_searxng_client.py:
- Mock HTTP responses
- Test result parsing
- Test caching logic
- Test timeout handling
test_models.py:
- Pydantic validation
- Serialization/deserialization
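A unit test that mocks the Kimi for Coding API might be structured like this. The sketch uses only `unittest.mock` from the standard library. The free function `synthesize` is a hypothetical, simplified stand-in for `Synthesizer.synthesize`, and `make_fake_client` mimics just the `chat.completions.create` attribute chain of an OpenAI-style client.

```python
import asyncio
from types import SimpleNamespace
from unittest.mock import AsyncMock


async def synthesize(client, query: str) -> str:
    """Hypothetical minimal stand-in for Synthesizer.synthesize."""
    response = await client.chat.completions.create(
        model="kimi-for-coding",
        messages=[{"role": "user", "content": query}],
    )
    return response.choices[0].message.content


def make_fake_client(answer: str) -> SimpleNamespace:
    """Build a mock exposing only chat.completions.create."""
    message = SimpleNamespace(content=answer)
    response = SimpleNamespace(choices=[SimpleNamespace(message=message)])
    create = AsyncMock(return_value=response)
    return SimpleNamespace(chat=SimpleNamespace(completions=SimpleNamespace(create=create)))


def test_synthesize_returns_model_content() -> None:
    client = make_fake_client("asyncio is an event loop library [1]")
    result = asyncio.run(synthesize(client, "What is asyncio?"))
    assert result == "asyncio is an event loop library [1]"
    # The mock records the call, so the request parameters can be checked too.
    client.chat.completions.create.assert_awaited_once()
    assert client.chat.completions.create.await_args.kwargs["model"] == "kimi-for-coding"
```

The test creates its own client and shares no state with other tests, in line with the isolation principle stated above.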
5.2 Integration Tests (tests/integration/)
Requires: Running SearXNG instance (Docker)
test_search_flow.py:
- Real SearXNG queries
- Cache interaction
- Error propagation
test_api.py:
- FastAPI test client
- Request/response validation
- Rate limiting behavior
5.3 E2E Tests (tests/e2e/)
test_research_endpoint.py:
- Full flow: query → search → synthesize → response
- Verify citation format
- Verify source attribution
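Verifying the citation format in an E2E test could come down to checking that every `[n]` in the synthesis points at an entry in `sources`. The helper names below are hypothetical. The sketch assumes the response shape from the API contract, where each source carries an `index` field.

```python
import re

CITATION = re.compile(r"\[(\d+)\]")


def cited_indices(synthesis: str) -> set[int]:
    """All citation numbers that appear in the synthesized text."""
    return {int(number) for number in CITATION.findall(synthesis)}


def dangling_citations(synthesis: str, sources: list[dict]) -> set[int]:
    """Citations in the text that have no matching entry in sources."""
    known = {source["index"] for source in sources}
    return cited_indices(synthesis) - known
```

An E2E assertion would then read `assert not dangling_citations(body["synthesis"], body["sources"])`.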
6. Implementation Phases
Phase 1: Foundation (No LLM yet) ✅ COMPLETE
Goal: Working search API
Deliverables:
- Project structure with pyproject.toml
- SearXNG client with async HTTP
- FastAPI router with /search endpoint
- Basic tests (mocked): 28 tests, 92% coverage
- Docker Compose for SearXNG
- Docker Compose for SearXNG
Acceptance Criteria:
curl -X POST http://localhost:8000/search \
-H "Content-Type: application/json" \
-d '{"q": "python asyncio", "engines": ["google"]}'
# Returns valid SearXNG results
Status: ✅ All tests passing, 92% coverage
Phase 2: Synthesis Layer ✅ COMPLETE
Goal: Add Kimi for Coding integration
Deliverables:
- Synthesizer class with Kimi for Coding API
- /research endpoint combining search + synthesis
- Prompt templates
- Response formatting with citations
- User-Agent header handling
Acceptance Criteria:
curl -X POST http://localhost:8000/research \
  -H "Content-Type: application/json" \
  -d '{"query": "What is Python asyncio?"}'
# Returns synthesized answer with citations
Status: ✅ Implemented, tested (40 tests, 90% coverage)
Phase 3: Polish
Goal: Production readiness
Deliverables:
- Rate limiting
- Caching (Redis optional)
- Structured logging
- Health checks
- Metrics (Prometheus)
- Documentation
7. Configuration
Environment Variables
RESEARCH_BRIDGE_SEARXNG_URL=http://localhost:8080
RESEARCH_BRIDGE_KIMI_API_KEY=sk-kimi-... # Kimi for Coding Key
RESEARCH_BRIDGE_LOG_LEVEL=INFO
RESEARCH_BRIDGE_REDIS_URL=redis://localhost:6379 # optional
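Loading these variables could be done along the following lines. This is a standard-library sketch: the `Settings` dataclass and `load_settings` function are hypothetical names. Treating the SearXNG URL and the API key as required is an assumption based on the Config model, which gives them no defaults.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional

PREFIX = "RESEARCH_BRIDGE_"


@dataclass(frozen=True)
class Settings:
    searxng_url: str
    kimi_api_key: str
    log_level: str = "INFO"
    redis_url: Optional[str] = None  # optional: only needed for the Redis cache


def load_settings(env: Mapping[str, str] = os.environ) -> Settings:
    """Read RESEARCH_BRIDGE_* variables; fail early if a required one is missing."""
    try:
        return Settings(
            searxng_url=env[PREFIX + "SEARXNG_URL"],
            kimi_api_key=env[PREFIX + "KIMI_API_KEY"],
            log_level=env.get(PREFIX + "LOG_LEVEL", "INFO"),
            redis_url=env.get(PREFIX + "REDIS_URL"),
        )
    except KeyError as missing:
        raise RuntimeError(f"missing required environment variable {missing}") from None
```

Passing the environment as a parameter lets unit tests supply a plain dict instead of mutating `os.environ`.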
Important: Kimi for Coding API Requirements
# The API requires a special User-Agent header!
headers = {
    "Authorization": f"Bearer {api_key}",
    "Content-Type": "application/json",
    "User-Agent": "KimiCLI/0.77"  # ← REQUIRED! 403 without this
}
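For clients that do not use the OpenAI SDK, the same headers can be attached to a plain HTTP request. The sketch below only builds the request with `urllib.request` and does not send it. The `/chat/completions` path is an assumption derived from the API being OpenAI-compatible, and `build_chat_request` is a hypothetical helper name.

```python
import json
import urllib.request

BASE_URL = "https://api.kimi.com/coding/v1"


def build_chat_request(api_key: str, prompt: str) -> urllib.request.Request:
    """Prepare (but do not send) a chat completion request with the required headers."""
    body = {
        "model": "kimi-for-coding",
        "messages": [{"role": "user", "content": prompt}],
    }
    return urllib.request.Request(
        url=BASE_URL + "/chat/completions",  # assumed OpenAI-compatible path
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
            "User-Agent": "KimiCLI/0.77",  # required per the note above
        },
        method="POST",
    )
```

Note that `urllib` normalizes header names, so the stored key reads `User-agent`; the value is unchanged.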
Docker Compose (SearXNG)
# config/searxng-docker-compose.yml
version: '3'
services:
  searxng:
    image: searxng/searxng:latest
    ports:
      - "8080:8080"
    volumes:
      - ./searxng-settings.yml:/etc/searxng/settings.yml
8. API Contract
POST /research
Request:
{
  "query": "latest developments in fusion energy",
  "depth": "deep",
  "sources": ["web", "news"],
  "language": "en",
  "omit_raw": false
}
Response:
{
  "query": "latest developments in fusion energy",
  "depth": "deep",
  "synthesis": "Recent breakthroughs in fusion energy include... [1] Commonwealth Fusion Systems achieved... [2]",
  "sources": [
    {"index": 1, "title": "Fusion breakthrough", "url": "https://..."},
    {"index": 2, "title": "CFS milestone", "url": "https://..."}
  ],
  "raw_results": [...],
  "metadata": {
    "latency_ms": 3200,
    "cache_hit": false,
    "tokens_used": 1247,
    "cost_usd": 0.0
  }
}
9. Cost Analysis
Per-Query Costs
| Component | Cost | Notes |
|---|---|---|
| SearXNG | $0.00 | Self-hosted, open source, no API costs |
| Kimi for Coding | $0.00 | Via existing subscription (no additional costs) |
| Total per query | $0.00 | |
Comparison:
| Solution | Cost per Query | Factor |
|---|---|---|
| Perplexity Sonar Pro | ~$0.015-0.03 | ∞ (more expensive) |
| Perplexity API direct | ~$0.005 | ∞ (more expensive) |
| Research Bridge | $0.00 | Baseline |
Savings: 100% of running costs!
Why is this completely free?
- SearXNG: Free (open source, self-hosted)
- Kimi for Coding: Already covered by the existing subscription
- No API costs, no rate limits, no hidden fees
Break-Even Analysis
- Setup effort: ~10 hours
- At any usage level: $0 running costs vs. $X with Perplexity
10. Success Criteria
Functional
- /research returns synthesized answers in <5s
- Citations link to original sources
- Rate limiting prevents abuse
- Health endpoint confirms all dependencies
Quality
- Answer quality matches Perplexity in blind test (n=20)
- Citation accuracy >95%
- Handles ambiguous queries gracefully
Operational
- 99% uptime (excluding planned maintenance)
- <1% error rate
- Logs structured for observability
11. Risks & Mitigations
| Risk | Likelihood | Impact | Mitigation |
|---|---|---|---|
| SearXNG instance down | Medium | High | Deploy redundant instance, fallback engines |
| Kimi for Coding API changes | Low | Medium | Abstract API client, monitor for breaking changes |
| User-Agent requirement breaks | Low | High | Hardcoded header, monitor API docs for updates |
| Answer quality poor | Medium | High | A/B test prompts, fallback to deeper search |
12. Future Enhancements
- Follow-up questions: Context-aware multi-turn research
- Source extraction: Fetch full article text via crawling
- PDF support: Search and synthesize academic papers
- Custom prompts: User-defined synthesis instructions
- Webhook notifications: Async research with callback
13. Appendix: Implementation Notes
Kimi for Coding API Specifics
Required Headers:
headers = {
    "Authorization": f"Bearer {api_key}",
    "Content-Type": "application/json",
    "User-Agent": "KimiCLI/0.77"  # ← CRITICAL! 403 without this
}
OpenAI-Compatible Client Setup:
from openai import AsyncOpenAI
client = AsyncOpenAI(
    base_url="https://api.kimi.com/coding/v1",
    api_key=api_key,
    default_headers={"User-Agent": "KimiCLI/0.77"}
)
Model Name: kimi-for-coding
Prompting Best Practices:
- Works best with clear, structured prompts
- Handles long contexts well
- Use explicit formatting instructions
- Add "Think step by step" for complex synthesis
SearXNG Tuning
- Enable json format for structured results
- Use safesearch=0 for unfiltered results
- Request time_range: month for recent content
Document Version: 1.0
Last Updated: 2026-03-14
Next Review: Post-Phase-1 implementation