The /api/chat endpoint combines vector search with LLM generation to provide context-aware responses based on uploaded documents.
## Overview

The chat endpoint performs four steps (sketched in code below):

- **Vector Search** - Retrieves relevant document chunks (if sources are not provided)
- **Context Building** - Constructs RAG context from the retrieved sources
- **LLM Generation** - Calls the configured LLM provider (Workers AI, OpenAI, Anthropic, or Google)
- **Response Formatting** - Returns the response with sources and metadata
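For orientation, here is a minimal sketch of how a handler might chain these steps. `vectorSearch` and `callLlm` are hypothetical stand-ins for the worker's internal helpers, not documented APIs:

```typescript
// Hypothetical orchestration sketch, not the actual worker source.
// vectorSearch() and callLlm() stand in for undocumented internal helpers.
type Source = { id: string; text: string };

declare function vectorSearch(query: string, topK: number): Promise<Source[]>;
declare function callLlm(
  provider: string,
  opts: { query: string; context: string; systemPrompt?: string; temperature: number }
): Promise<{ response: string; model: string }>;

async function handleChat(req: {
  query: string;
  sources?: Source[];
  topK?: number;
  llmConfig?: { provider: string; model: string };
  systemPrompt?: string;
  temperature?: number;
}) {
  // 1. Vector search, skipped when the caller already supplies sources
  const sources = req.sources ?? (await vectorSearch(req.query, req.topK ?? 5));

  // 2. Build RAG context from the retrieved chunks
  const context = sources.map((s) => s.text).join('\n\n');

  // 3. Generate with the configured provider (Workers AI by default)
  const provider = req.llmConfig?.provider ?? 'workers-ai';
  const { response, model } = await callLlm(provider, {
    query: req.query,
    context,
    systemPrompt: req.systemPrompt,
    temperature: req.temperature ?? 0.2,
  });

  // 4. Return the answer alongside the sources that grounded it
  return { response, sources, provider, model };
}
```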
## Endpoint

```http
POST /api/chat
```

**Request Body:**

```typescript
{
  query: string;           // User's question (required)
  agentType?: string;      // Agent type for prompt selection
  sources?: Source[];      // Pre-retrieved sources (optional)
  llmConfig?: {            // LLM configuration (optional)
    provider: 'workers-ai' | 'google' | 'openai' | 'anthropic';
    model: string;
    maxTokens?: number;
  };
  systemPrompt?: string;   // Custom system prompt (optional)
  temperature?: number;    // Default: 0.2
  historyContext?: string; // Chat history (optional)
  userGoals?: string;      // User goals for personalization
  userId?: string;         // User ID (for rate limiting)
  topK?: number;           // Default: 5 (for search)
}
```

**Response:**

```typescript
{
  response: string;  // LLM-generated response
  sources: Source[]; // Retrieved sources
  provider: string;  // LLM provider used
  model: string;     // Model used
}
```
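For convenience, the request and response shapes above can be wrapped in a small typed client. The `Source`, `ChatRequest`, and `ChatResponse` names below are illustrative, not part of the published API:

```typescript
// Minimal typed client sketch; type names are illustrative, not official.
interface Source {
  id: string;
  text: string;
  metadata?: { documentTitle?: string; [key: string]: unknown };
}

interface ChatRequest {
  query: string;
  agentType?: string;
  sources?: Source[];
  llmConfig?: {
    provider: 'workers-ai' | 'google' | 'openai' | 'anthropic';
    model: string;
    maxTokens?: number;
  };
  systemPrompt?: string;
  temperature?: number;
  historyContext?: string;
  userGoals?: string;
  userId?: string;
  topK?: number;
}

interface ChatResponse {
  response: string;
  sources: Source[];
  provider: string;
  model: string;
}

async function chat(req: ChatRequest): Promise<ChatResponse> {
  const res = await fetch('https://parti.metacogna.ai/api/chat', {
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify(req),
  });
  if (!res.ok) throw new Error(`Chat request failed: ${res.status}`);
  return res.json() as Promise<ChatResponse>;
}
```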
## Request Parameters

### query (required)

The user’s question or prompt.

```json
{
  "query": "What is retrieval-augmented generation?"
}
```
### agentType (optional)

Predefined agent type for automatic prompt selection. Available types:

- `rag-chat` - General RAG assistant
- `graph-analyst` - Knowledge graph analysis
- `executive-summary` - Executive summaries
- `technical-auditor` - Technical analysis
- `future-planner` - Strategic planning
- `coordinator` - Idea synthesis
- `critic` - Critical analysis

```json
{
  "query": "Analyze the knowledge graph structure",
  "agentType": "graph-analyst"
}
```
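Internally, each agent type presumably resolves to a predefined system prompt. A sketch of that idea (the prompt text here is invented for illustration, and the override rule follows the `systemPrompt` section below):

```typescript
// Illustrative mapping only; the actual prompts are not documented here.
const AGENT_PROMPTS: Record<string, string> = {
  'rag-chat': 'You are a general RAG assistant. Answer using the provided context.',
  'graph-analyst': 'You analyze knowledge graph structure and relationships.',
  // ...one entry per agent type listed above
};

function resolveSystemPrompt(agentType?: string, systemPrompt?: string): string | undefined {
  // A custom systemPrompt overrides agentType (see systemPrompt below)
  return systemPrompt ?? (agentType ? AGENT_PROMPTS[agentType] : undefined);
}
```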
### sources (optional)

Pre-retrieved sources. If not provided, the endpoint performs vector search automatically.

```json
{
  "query": "Explain RAG systems",
  "sources": [
    {
      "id": "chunk-uuid",
      "text": "Retrieval-Augmented Generation combines...",
      "metadata": { "documentTitle": "RAG Overview" }
    }
  ]
}
```
### llmConfig (optional)

LLM provider and model configuration. Defaults to Workers AI if not specified.

```json
{
  "query": "What is AI?",
  "llmConfig": {
    "provider": "openai",
    "model": "gpt-4o",
    "maxTokens": 1000
  }
}
```

**Supported Providers:**

- `workers-ai` - Cloudflare Workers AI (default, no API key required)
- `google` - Google Gemini (requires `GEMINI_API_KEY`)
- `openai` - OpenAI (requires `OPENAI_API_KEY`)
- `anthropic` - Anthropic Claude (requires `ANTHROPIC_API_KEY`)
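A worker-side check for the key each provider needs might look like the following sketch; the `env` shape is an assumption, not the worker's actual bindings:

```typescript
// Sketch: which secret each provider needs (per the list above).
// The env parameter shape is assumed, not documented.
const REQUIRED_KEY: Record<string, string | null> = {
  'workers-ai': null, // no API key required
  google: 'GEMINI_API_KEY',
  openai: 'OPENAI_API_KEY',
  anthropic: 'ANTHROPIC_API_KEY',
};

function assertProviderConfigured(
  provider: string,
  env: Record<string, string | undefined>
): void {
  const key = REQUIRED_KEY[provider];
  if (key && !env[key]) {
    throw new Error(`${provider} requires the ${key} secret to be set`);
  }
}
```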
### systemPrompt (optional)

Custom system prompt. Overrides `agentType` if provided.

```json
{
  "query": "Explain quantum computing",
  "systemPrompt": "You are a quantum physics expert. Provide detailed explanations."
}
```
### temperature (optional)

Controls response creativity. Default: `0.2` (more deterministic).

- `0.0` - Very deterministic
- `0.7` - Balanced
- `1.0` - More creative
### historyContext (optional)

Previous conversation context for continuity.

```json
{
  "query": "Tell me more about that",
  "historyContext": "Previous conversation: User asked about RAG systems..."
}
```
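The endpoint only documents `historyContext` as free-form text, so clients can build it however they like. One illustrative helper that flattens prior turns into that string:

```typescript
// Illustrative helper: flatten prior turns into a historyContext string.
// The Turn type and maxTurns cutoff are assumptions, not part of the API.
type Turn = { role: 'user' | 'assistant'; content: string };

function buildHistoryContext(turns: Turn[], maxTurns = 6): string {
  return (
    'Previous conversation: ' +
    turns
      .slice(-maxTurns) // keep only the most recent turns
      .map((t) => `${t.role}: ${t.content}`)
      .join('\n')
  );
}
```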
### userGoals (optional)

User goals for personalized responses.

```json
{
  "query": "What should I learn next?",
  "userGoals": "Build RAG system, learn vector databases"
}
```
## Rate Limiting

- **Limit:** 10 requests per minute per user
- **Headers:**
  - `X-RateLimit-Remaining` - Remaining requests
  - `X-RateLimit-Reset` - Reset timestamp
- **Status:** `429` when the limit is exceeded

```json
{
  "error": "Rate limit exceeded",
  "message": "Too many chat requests. Please try again later.",
  "resetAt": 1704067260000
}
```
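Clients can use the `resetAt` timestamp (epoch milliseconds, per the example above) from the 429 body to back off and retry. A minimal sketch:

```typescript
// Simple retry-after-reset wrapper around the chat endpoint.
// Field names (resetAt, 429 status) follow the rate-limit docs above.
async function chatWithBackoff(body: object): Promise<Response> {
  const send = () =>
    fetch('https://parti.metacogna.ai/api/chat', {
      method: 'POST',
      headers: { 'Content-Type': 'application/json' },
      body: JSON.stringify(body),
    });

  const res = await send();
  if (res.status !== 429) return res;

  // Wait until the limit resets, then retry once
  const { resetAt } = (await res.json()) as { resetAt: number };
  await new Promise((r) => setTimeout(r, Math.max(0, resetAt - Date.now())));
  return send();
}
```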
## Example Requests

### Basic Chat

```typescript
const response = await fetch('https://parti.metacogna.ai/api/chat', {
  method: 'POST',
  headers: { 'Content-Type': 'application/json' },
  body: JSON.stringify({
    query: 'What is artificial intelligence?',
    userId: 'user-uuid'
  })
});

const { response: answer, sources } = await response.json();
```
### With Agent Type

```typescript
const response = await fetch('https://parti.metacogna.ai/api/chat', {
  method: 'POST',
  headers: { 'Content-Type': 'application/json' },
  body: JSON.stringify({
    query: 'Analyze the knowledge graph structure',
    agentType: 'graph-analyst',
    userId: 'user-uuid'
  })
});
```
### With Custom LLM

```typescript
const response = await fetch('https://parti.metacogna.ai/api/chat', {
  method: 'POST',
  headers: { 'Content-Type': 'application/json' },
  body: JSON.stringify({
    query: 'Explain RAG systems',
    llmConfig: {
      provider: 'openai',
      model: 'gpt-4o',
      maxTokens: 1000
    },
    temperature: 0.7,
    userId: 'user-uuid'
  })
});
```
### With Pre-retrieved Sources

```typescript
const response = await fetch('https://parti.metacogna.ai/api/chat', {
  method: 'POST',
  headers: { 'Content-Type': 'application/json' },
  body: JSON.stringify({
    query: 'Summarize these documents',
    sources: [
      { id: 'chunk-1', text: 'Document 1 content...' },
      { id: 'chunk-2', text: 'Document 2 content...' }
    ],
    userId: 'user-uuid'
  })
});
```
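Because every response includes the retrieved `sources`, a follow-up question can reuse them and skip the second vector search:

```typescript
// Follow-up pattern: reuse sources returned by the first call so the
// endpoint skips vector search on the follow-up question.
const first = await fetch('https://parti.metacogna.ai/api/chat', {
  method: 'POST',
  headers: { 'Content-Type': 'application/json' },
  body: JSON.stringify({ query: 'Explain RAG systems', userId: 'user-uuid' })
});
const { response: answer, sources } = await first.json();

const followUp = await fetch('https://parti.metacogna.ai/api/chat', {
  method: 'POST',
  headers: { 'Content-Type': 'application/json' },
  body: JSON.stringify({
    query: 'How does that compare to fine-tuning?',
    sources, // pre-retrieved, no new search
    historyContext: `Previous conversation: assistant said: ${answer}`,
    userId: 'user-uuid'
  })
});
```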
## Environment Setup

### API Keys (Optional)

API keys are stored as Cloudflare secrets (not in `wrangler.toml`):

```bash
# Google Gemini (optional)
bun wrangler secret put GEMINI_API_KEY

# OpenAI (optional)
bun wrangler secret put OPENAI_API_KEY

# Anthropic (optional)
bun wrangler secret put ANTHROPIC_API_KEY
```

If no API keys are set, the endpoint uses Cloudflare Workers AI (free, no keys required).
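How the worker handles a requested provider whose key is missing is not documented here; one plausible reading, assuming it falls back to Workers AI, is sketched below:

```typescript
// Sketch of the documented default: with no provider secrets configured,
// requests are served by Workers AI. Falling back (rather than erroring)
// when a requested provider's key is missing is an assumption.
function effectiveProvider(
  requested: 'workers-ai' | 'google' | 'openai' | 'anthropic' | undefined,
  env: { GEMINI_API_KEY?: string; OPENAI_API_KEY?: string; ANTHROPIC_API_KEY?: string }
): string {
  const keys = {
    google: env.GEMINI_API_KEY,
    openai: env.OPENAI_API_KEY,
    anthropic: env.ANTHROPIC_API_KEY,
  } as const;
  if (!requested || requested === 'workers-ai') return 'workers-ai';
  return keys[requested] ? requested : 'workers-ai';
}
```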
## Benefits

- **Security** - API keys never exposed to the frontend
- **Cost Control** - Centralized rate limiting and usage tracking
- **Flexibility** - Easy to switch between LLM providers
- **Performance** - Single request for search + generation
- **Consistency** - All API calls go through the worker