536 lines
15 KiB
Markdown
536 lines
15 KiB
Markdown
# TDD: Research Bridge - SearXNG + Kimi for Coding Integration
|
|
|
|
## AI Council Review Document
|
|
**Project:** research-bridge
|
|
**Purpose:** Self-hosted research pipeline combining SearXNG meta-search with Kimi for Coding
|
|
**Cost Target:** **$0** per query (SearXNG: $0 self-hosted + Kimi for Coding: via bestehendes Abo)
|
|
**Architecture:** Modular, testable, async-first
|
|
|
|
---
|
|
|
|
## 1. Executive Summary
|
|
|
|
### Problem
|
|
Perplexity API calls cost $0.015-0.03 per query. For frequent research tasks, this adds up quickly.
|
|
|
|
### Solution
|
|
Replace Perplexity with a two-tier architecture:
|
|
1. **SearXNG** (self-hosted, **FREE**): Aggregates search results from 70+ sources
|
|
2. **Kimi for Coding** (via **bestehendes Abo**, **$0**): Summarizes and reasons over results
|
|
|
|
### Expected Outcome
|
|
- **Cost:** **$0 per query** (vs $0.02-0.05 with Perplexity)
|
|
- **Latency:** 2-5s per query
|
|
- **Quality:** Comparable to Perplexity Sonar
|
|
|
|
---
|
|
|
|
## 2. Architecture Overview
|
|
|
|
```
|
|
┌─────────────────┐ ┌──────────────────┐ ┌─────────────────┐
|
|
│ User Query │────▶│ Query Router │────▶│ SearXNG │
|
|
│ │ │ (FastAPI) │ │ (Self-Hosted) │
|
|
└─────────────────┘ └──────────────────┘ └─────────────────┘
|
|
│
|
|
▼
|
|
┌─────────────────┐
|
|
│ Search Results │
|
|
│ (JSON/Raw) │
|
|
└─────────────────┘
|
|
│
|
|
┌─────────────────┐ ┌──────────────────┐ │
|
|
│ Response │◀────│ Kimi for Coding │◀──────────┘
|
|
│ (Markdown) │ │ (Synthesizer) │
|
|
└─────────────────┘ └──────────────────┘
|
|
```
|
|
|
|
### Core Components
|
|
|
|
| Component | Responsibility | Tech Stack |
|
|
|-----------|---------------|------------|
|
|
| `query-router` | HTTP API, validation, routing | FastAPI, Pydantic |
|
|
| `searxng-client` | Interface to SearXNG instance | aiohttp, caching |
|
|
| `synthesizer` | LLM prompts, response formatting | Kimi for Coding API |
|
|
| `cache-layer` | Result deduplication | Redis (optional) |
|
|
| `rate-limiter` | Prevent abuse | slowapi |
|
|
|
|
---
|
|
|
|
## 3. Component Specifications
|
|
|
|
### 3.1 Query Router (`src/api/router.py`)
|
|
|
|
**Purpose:** FastAPI application handling HTTP requests
|
|
|
|
**Endpoints:**
|
|
```python
|
|
POST /research
|
|
Request: {"query": "string", "depth": "shallow|deep", "sources": ["web", "news", "academic"]}
|
|
Response: {"query": "string", "results": [...], "synthesis": "string", "sources": [...], "latency_ms": int}
|
|
|
|
GET /health
|
|
Response: {"status": "healthy", "searxng_connected": bool, "kimi_coding_available": bool}
|
|
|
|
GET /search (passthrough)
|
|
Request: {"q": "string", "engines": ["google", "bing"], "page": 1}
|
|
Response: Raw SearXNG JSON
|
|
```
|
|
|
|
**Validation Rules:**
|
|
- Query: min 3, max 500 characters
|
|
- Depth: default "shallow" (1 search) vs "deep" (3 searches + synthesis)
|
|
- Rate limit: 30 req/min per IP
|
|
|
|
---
|
|
|
|
### 3.2 SearXNG Client (`src/search/searxng.py`)
|
|
|
|
**Purpose:** Async client for SearXNG instance
|
|
|
|
**Configuration:**
|
|
```yaml
|
|
searxng:
|
|
base_url: "http://localhost:8080" # or external instance
|
|
timeout: 10
|
|
max_results: 10
|
|
engines:
|
|
default: ["google", "bing", "duckduckgo"]
|
|
news: ["google_news", "bing_news"]
|
|
academic: ["google_scholar", "arxiv"]
|
|
```
|
|
|
|
**Interface:**
|
|
```python
|
|
class SearXNGClient:
|
|
async def search(self, query: str, engines: list[str], page: int = 1) -> SearchResult
|
|
async def search_multi(self, queries: list[str]) -> list[SearchResult] # for deep mode
|
|
```
|
|
|
|
**Caching:**
|
|
- Cache key: SHA256(query + engines.join(","))
|
|
- TTL: 1 hour for identical queries
|
|
- Storage: In-memory LRU (1000 entries) or Redis
|
|
|
|
---
|
|
|
|
### 3.3 Synthesizer (`src/llm/synthesizer.py`)
|
|
|
|
**Purpose:** Transform search results into coherent answers using Kimi for Coding
|
|
|
|
**⚠️ CRITICAL:** Kimi for Coding API requires special `User-Agent: KimiCLI/0.77` header!
|
|
|
|
**API Configuration:**
|
|
```python
|
|
{
|
|
"base_url": "https://api.kimi.com/coding/v1",
|
|
"api_key": "sk-kimi-...", # Kimi for Coding API Key
|
|
"headers": {
|
|
"User-Agent": "KimiCLI/0.77" # REQUIRED - 403 without this!
|
|
}
|
|
}
|
|
```
|
|
|
|
**Prompt Strategy:**
|
|
```
|
|
You are a research assistant. Synthesize the following search results into a
|
|
clear, accurate answer. Include citations [1], [2], etc.
|
|
|
|
User Query: {query}
|
|
|
|
Search Results:
|
|
{formatted_results}
|
|
|
|
Instructions:
|
|
1. Answer directly and concisely
|
|
2. Cite sources using [1], [2] format
|
|
3. If results conflict, note the discrepancy
|
|
4. If insufficient data, say so clearly
|
|
|
|
Answer in {language}.
|
|
```
|
|
|
|
**Implementation:**
|
|
```python
|
|
from openai import AsyncOpenAI
|
|
|
|
class Synthesizer:
|
|
def __init__(self, api_key: str, model: str = "kimi-for-coding"):
|
|
self.client = AsyncOpenAI(
|
|
base_url="https://api.kimi.com/coding/v1",
|
|
api_key=api_key,
|
|
default_headers={"User-Agent": "KimiCLI/0.77"} # CRITICAL!
|
|
)
|
|
|
|
async def synthesize(
|
|
self,
|
|
query: str,
|
|
results: list[SearchResult],
|
|
max_tokens: int = 2048
|
|
) -> SynthesisResult:
|
|
response = await self.client.chat.completions.create(
|
|
model=self.model,
|
|
messages=[
|
|
{"role": "system", "content": SYSTEM_PROMPT},
|
|
{"role": "user", "content": self._format_prompt(query, results)}
|
|
],
|
|
max_tokens=max_tokens
|
|
)
|
|
return SynthesisResult(
|
|
content=response.choices[0].message.content,
|
|
sources=self._extract_citations(results)
|
|
)
|
|
```
|
|
|
|
**Performance Notes:**
|
|
- Kimi for Coding optimized for code + reasoning tasks
|
|
- Truncate search results to ~4000 tokens to stay within context
|
|
- Cache syntheses for identical result sets
|
|
|
|
---
|
|
|
|
### 3.4 Rate Limiter (`src/middleware/ratelimit.py`)
|
|
|
|
**Purpose:** Protect against abuse and control costs
|
|
|
|
**Strategy:**
|
|
- IP-based: 30 requests/minute
|
|
- Global: 1000 requests/hour (configurable)
|
|
- Burst: Allow 5 requests immediately, then token bucket
|
|
|
|
---
|
|
|
|
## 4. Data Models (`src/models/`)
|
|
|
|
### SearchResult
|
|
```python
|
|
class SearchResult(BaseModel):
|
|
title: str
|
|
url: str
|
|
content: str | None # Snippet or full text
|
|
source: str # Engine name
|
|
score: float | None
|
|
published: datetime | None
|
|
```
|
|
|
|
### ResearchResponse
|
|
```python
|
|
class ResearchResponse(BaseModel):
|
|
query: str
|
|
depth: str
|
|
synthesis: str
|
|
sources: list[dict] # {title, url, index}
|
|
raw_results: list[SearchResult] | None # null if omit_raw=true
|
|
metadata: dict # {latency_ms, cache_hit, tokens_used}
|
|
```
|
|
|
|
### Config
|
|
```python
|
|
class Config(BaseModel):
|
|
searxng_url: str
|
|
kimi_api_key: str # Kimi for Coding API Key
|
|
cache_backend: Literal["memory", "redis"] = "memory"
|
|
rate_limit: dict # requests, window
|
|
```
|
|
|
|
---
|
|
|
|
## 5. Testing Strategy
|
|
|
|
### Test Categories
|
|
|
|
| Category | Location | Responsibility |
|
|
|----------|----------|----------------|
|
|
| Unit | `tests/unit/` | Individual functions, pure logic |
|
|
| Integration | `tests/integration/` | Component interactions |
|
|
| E2E | `tests/e2e/` | Full request flow |
|
|
| Performance | `tests/perf/` | Load testing |
|
|
|
|
### Test Isolation Principle
|
|
**CRITICAL:** Each test category runs independently. No test should require another test to run first.
|
|
|
|
### 5.1 Unit Tests (`tests/unit/`)
|
|
|
|
**test_synthesizer.py:**
|
|
- Mock Kimi for Coding API responses
|
|
- Test prompt formatting
|
|
- Test User-Agent header injection
|
|
- Test token counting/truncation
|
|
- Test error handling (API down, auth errors)
|
|
|
|
**test_searxng_client.py:**
|
|
- Mock HTTP responses
|
|
- Test result parsing
|
|
- Test caching logic
|
|
- Test timeout handling
|
|
|
|
**test_models.py:**
|
|
- Pydantic validation
|
|
- Serialization/deserialization
|
|
|
|
### 5.2 Integration Tests (`tests/integration/`)
|
|
|
|
**Requires:** Running SearXNG instance (Docker)
|
|
|
|
**test_search_flow.py:**
|
|
- Real SearXNG queries
|
|
- Cache interaction
|
|
- Error propagation
|
|
|
|
**test_api.py:**
|
|
- FastAPI test client
|
|
- Request/response validation
|
|
- Rate limiting behavior
|
|
|
|
### 5.3 E2E Tests (`tests/e2e/`)
|
|
|
|
**test_research_endpoint.py:**
|
|
- Full flow: query → search → synthesize → response
|
|
- Verify citation format
|
|
- Verify source attribution
|
|
|
|
---
|
|
|
|
## 6. Implementation Phases
|
|
|
|
### Phase 1: Foundation (No LLM yet) ✅ COMPLETE
|
|
**Goal:** Working search API
|
|
**Deliverables:**
|
|
- [x] Project structure with pyproject.toml
|
|
- [x] SearXNG client with async HTTP
|
|
- [x] FastAPI router with `/search` endpoint
|
|
- [x] Basic tests (mocked) - 28 tests, 92% coverage
|
|
- [x] Docker Compose for SearXNG
|
|
|
|
**Acceptance Criteria:**
|
|
```bash
|
|
curl -X POST http://localhost:8000/search \
|
|
-H "Content-Type: application/json" \
|
|
-d '{"q": "python asyncio", "engines": ["google"]}'
|
|
# Returns valid SearXNG results
|
|
```
|
|
|
|
**Status:** ✅ All tests passing, 92% coverage
|
|
|
|
### Phase 2: Synthesis Layer ✅ COMPLETE
|
|
**Goal:** Add Kimi for Coding integration
|
|
**Deliverables:**
|
|
- [x] Synthesizer class with Kimi for Coding API
|
|
- [x] `/research` endpoint combining search + synthesis
|
|
- [x] Prompt templates
|
|
- [x] Response formatting with citations
|
|
- [x] User-Agent header handling
|
|
|
|
**Acceptance Criteria:**
|
|
```bash
|
|
curl -X POST http://localhost:8000/research \
|
|
-d '{"query": "What is Python asyncio?"}'
|
|
# Returns synthesized answer with citations
|
|
```
|
|
|
|
**Status:** ✅ Implemented, tested (40 tests, 90% coverage)
|
|
|
|
### Phase 3: Polish
|
|
**Goal:** Production readiness
|
|
**Deliverables:**
|
|
- [ ] Rate limiting
|
|
- [ ] Caching (Redis optional)
|
|
- [ ] Structured logging
|
|
- [ ] Health checks
|
|
- [ ] Metrics (Prometheus)
|
|
- [ ] Documentation
|
|
|
|
---
|
|
|
|
## 7. Configuration
|
|
|
|
### Environment Variables
|
|
```bash
|
|
RESEARCH_BRIDGE_SEARXNG_URL=http://localhost:8080
|
|
RESEARCH_BRIDGE_KIMI_API_KEY=sk-kimi-... # Kimi for Coding Key
|
|
RESEARCH_BRIDGE_LOG_LEVEL=INFO
|
|
RESEARCH_BRIDGE_REDIS_URL=redis://localhost:6379 # optional
|
|
```
|
|
|
|
### Important: Kimi for Coding API Requirements
|
|
```python
|
|
# The API requires a special User-Agent header!
|
|
headers = {
|
|
"Authorization": f"Bearer {api_key}",
|
|
"Content-Type": "application/json",
|
|
"User-Agent": "KimiCLI/0.77" # ← REQUIRED! 403 without this
|
|
}
|
|
```
|
|
|
|
### Docker Compose (SearXNG)
|
|
```yaml
|
|
# config/searxng-docker-compose.yml
|
|
version: '3'
|
|
services:
|
|
searxng:
|
|
image: searxng/searxng:latest
|
|
ports:
|
|
- "8080:8080"
|
|
volumes:
|
|
- ./searxng-settings.yml:/etc/searxng/settings.yml
|
|
```
|
|
|
|
---
|
|
|
|
## 8. API Contract
|
|
|
|
### POST /research
|
|
|
|
**Request:**
|
|
```json
|
|
{
|
|
"query": "latest developments in fusion energy",
|
|
"depth": "deep",
|
|
"sources": ["web", "news"],
|
|
"language": "en",
|
|
"omit_raw": false
|
|
}
|
|
```
|
|
|
|
**Response:**
|
|
```json
|
|
{
|
|
"query": "latest developments in fusion energy",
|
|
"depth": "deep",
|
|
"synthesis": "Recent breakthroughs in fusion energy include... [1] Commonwealth Fusion Systems achieved... [2]",
|
|
"sources": [
|
|
{"index": 1, "title": "Fusion breakthrough", "url": "https://..."},
|
|
{"index": 2, "title": "CFS milestone", "url": "https://..."}
|
|
],
|
|
"raw_results": [...],
|
|
"metadata": {
|
|
"latency_ms": 3200,
|
|
"cache_hit": false,
|
|
"tokens_used": 1247,
|
|
"cost_usd": 0.0
|
|
}
|
|
}
|
|
```
|
|
|
|
---
|
|
|
|
## 9. Cost Analysis
|
|
|
|
### Per-Query Costs
|
|
|
|
| Component | Cost | Notes |
|
|
|-----------|------|-------|
|
|
| **SearXNG** | **$0.00** | Self-hosted, Open Source, keine API-Kosten |
|
|
| **Kimi for Coding** | **$0.00** | Via bestehendes Abo (keine zusätzlichen Kosten) |
|
|
| **Gesamt pro Query** | **$0.00** | |
|
|
|
|
**Vergleich:**
|
|
| Lösung | Kosten pro Query | Faktor |
|
|
|--------|------------------|--------|
|
|
| Perplexity Sonar Pro | ~$0.015-0.03 | ∞ (teurer) |
|
|
| Perplexity API direkt | ~$0.005 | ∞ (teurer) |
|
|
| **Research Bridge** | **$0.00** | **Baseline** |
|
|
|
|
**Einsparung: 100%** der laufenden Kosten!
|
|
|
|
### Warum ist das komplett kostenlos?
|
|
- **SearXNG:** Gratis (Open Source, self-hosted)
|
|
- **Kimi for Coding:** Bereits über bestehendes Abo abgedeckt
|
|
- Keine API-Kosten, keine Rate-Limits, keine versteckten Gebühren
|
|
|
|
### Break-Even Analysis
|
|
- Einrichtungsaufwand: ~10 Stunden
|
|
- Bei beliebiger Nutzung: **$0 laufende Kosten** vs. $X mit Perplexity
|
|
|
|
---
|
|
|
|
## 10. Success Criteria
|
|
|
|
### Functional
|
|
- [ ] `/research` returns synthesized answers in <5s
|
|
- [ ] Citations link to original sources
|
|
- [ ] Rate limiting prevents abuse
|
|
- [ ] Health endpoint confirms all dependencies
|
|
|
|
### Quality
|
|
- [ ] Answer quality matches Perplexity in blind test (n=20)
|
|
- [ ] Citation accuracy >95%
|
|
- [ ] Handles ambiguous queries gracefully
|
|
|
|
### Operational
|
|
- [ ] 99% uptime (excluding planned maintenance)
|
|
- [ ] <1% error rate
|
|
- [ ] Logs structured for observability
|
|
|
|
---
|
|
|
|
## 11. Risks & Mitigations
|
|
|
|
| Risk | Likelihood | Impact | Mitigation |
|
|
|------|------------|--------|------------|
|
|
| SearXNG instance down | Medium | High | Deploy redundant instance, fallback engines |
|
|
| Kimi for Coding API changes | Low | Medium | Abstract API client, monitor for breaking changes |
|
|
| User-Agent requirement breaks | Low | High | Hardcoded header, monitor API docs for updates |
|
|
| Answer quality poor | Medium | High | A/B test prompts, fallback to deeper search |
|
|
|
|
---
|
|
|
|
## 12. Future Enhancements
|
|
|
|
- **Follow-up questions:** Context-aware multi-turn research
|
|
- **Source extraction:** Fetch full article text via crawling
|
|
- **PDF support:** Search and synthesize academic papers
|
|
- **Custom prompts:** User-defined synthesis instructions
|
|
- **Webhook notifications:** Async research with callback
|
|
|
|
---
|
|
|
|
## 13. Appendix: Implementation Notes
|
|
|
|
### Kimi for Coding API Specifics
|
|
|
|
**Required Headers:**
|
|
```python
|
|
headers = {
|
|
"Authorization": f"Bearer {api_key}",
|
|
"Content-Type": "application/json",
|
|
"User-Agent": "KimiCLI/0.77" # ← CRITICAL! 403 without this
|
|
}
|
|
```
|
|
|
|
**OpenAI-Compatible Client Setup:**
|
|
```python
|
|
from openai import AsyncOpenAI
|
|
|
|
client = AsyncOpenAI(
|
|
base_url="https://api.kimi.com/coding/v1",
|
|
api_key=api_key,
|
|
default_headers={"User-Agent": "KimiCLI/0.77"}
|
|
)
|
|
```
|
|
|
|
**Model Name:** `kimi-for-coding`
|
|
|
|
**Prompting Best Practices:**
|
|
- Works best with clear, structured prompts
|
|
- Handles long contexts well
|
|
- Use explicit formatting instructions
|
|
- Add "Think step by step" for complex synthesis
|
|
|
|
### SearXNG Tuning
|
|
- Enable `json` format for structured results
|
|
- Use `safesearch=0` for unfiltered results
|
|
- Request `time_range: month` for recent content
|
|
- Add "Think step by step" for complex synthesis
|
|
|
|
### SearXNG Tuning
|
|
- Enable `json` format for structured results
|
|
- Use `safesearch=0` for unfiltered results
|
|
- Request `time_range: month` for recent content
|
|
|
|
---
|
|
|
|
**Document Version:** 1.0
|
|
**Last Updated:** 2026-03-14
|
|
**Next Review:** Post-Phase-1 implementation
|