The /api/chat endpoint combines vector search with LLM generation to provide context-aware responses based on uploaded documents.

Overview

The chat endpoint performs:
  1. Vector Search - Retrieves relevant document chunks (if sources not provided)
  2. Context Building - Constructs RAG context from retrieved sources
  3. LLM Generation - Calls LLM provider (Workers AI, OpenAI, Anthropic, Google)
  4. Response Formatting - Returns response with sources and metadata

Endpoint

POST /api/chat

Request
{
  query: string;                    // User's question (required)
  agentType?: string;               // Agent type for prompt selection
  sources?: Source[];               // Pre-retrieved sources (optional)
  llmConfig?: {                     // LLM configuration (optional)
    provider: 'workers-ai' | 'google' | 'openai' | 'anthropic';
    model: string;
    maxTokens?: number;
  };
  systemPrompt?: string;            // Custom system prompt (optional)
  temperature?: number;             // Default: 0.2
  historyContext?: string;          // Chat history (optional)
  userGoals?: string;               // User goals for personalization
  userId?: string;                  // User ID (for rate limiting)
  topK?: number;                    // Default: 5 (for search)
}
Response
{
  response: string;                // LLM generated response
  sources: Source[];               // Retrieved sources
  provider: string;                // LLM provider used
  model: string;                   // Model used
}

Request Parameters

query (required)

The user’s question or prompt.
{
  "query": "What is retrieval-augmented generation?"
}

agentType (optional)

Predefined agent type for automatic prompt selection. Available types:
  • rag-chat - General RAG assistant
  • graph-analyst - Knowledge graph analysis
  • executive-summary - Executive summaries
  • technical-auditor - Technical analysis
  • future-planner - Strategic planning
  • coordinator - Idea synthesis
  • critic - Critical analysis
{
  "query": "Analyze the knowledge graph structure",
  "agentType": "graph-analyst"
}

sources (optional)

Pre-retrieved sources. If not provided, the endpoint performs vector search automatically.
{
  "query": "Explain RAG systems",
  "sources": [
    {
      "id": "chunk-uuid",
      "text": "Retrieval-Augmented Generation combines...",
      "metadata": { "documentTitle": "RAG Overview" }
    }
  ]
}
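
The Source shape is not spelled out in this section; here is a minimal TypeScript sketch inferred from the schemas and the example above (the index signature for extra metadata is an assumption):
// Inferred shape; the actual type may carry additional fields
interface Source {
  id: string;                  // Chunk identifier (a UUID in the example)
  text: string;                // Chunk text used to build the RAG context
  metadata?: {
    documentTitle?: string;    // Seen in the example above
    [key: string]: unknown;    // Other metadata fields assumed possible
  };
}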

llmConfig (optional)

LLM provider and model configuration. Defaults to Workers AI if not specified.
{
  "query": "What is AI?",
  "llmConfig": {
    "provider": "openai",
    "model": "gpt-4o",
    "maxTokens": 1000
  }
}
Supported Providers:
  • workers-ai - Cloudflare Workers AI (default, no API key required)
  • google - Google Gemini (requires GEMINI_API_KEY)
  • openai - OpenAI (requires OPENAI_API_KEY)
  • anthropic - Anthropic Claude (requires ANTHROPIC_API_KEY)

systemPrompt (optional)

Custom system prompt. Overrides agentType if provided.
{
  "query": "Explain quantum computing",
  "systemPrompt": "You are a quantum physics expert. Provide detailed explanations."
}

temperature (optional)

Controls response creativity (sampling randomness). Default: 0.2 (more deterministic). An example follows the scale below.
  • 0.0 - Very deterministic
  • 0.7 - Balanced
  • 1.0 - More creative
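For example, a request that trades determinism for more varied output:
{
  "query": "Brainstorm applications of RAG",
  "temperature": 0.7
}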

historyContext (optional)

Previous conversation context for continuity.
{
  "query": "Tell me more about that",
  "historyContext": "Previous conversation: User asked about RAG systems..."
}

userGoals (optional)

User goals for personalized responses.
{
  "query": "What should I learn next?",
  "userGoals": "Build RAG system, learn vector databases"
}
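
topK (optional)

Number of document chunks to retrieve during vector search (see the request schema above). Default: 5. Applies only when sources are not supplied.
{
  "query": "Explain RAG systems",
  "topK": 10
}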

Rate Limiting

  • Limit: 10 requests per minute per user (keyed by the userId request field)
  • Headers:
    • X-RateLimit-Remaining - Remaining requests
    • X-RateLimit-Reset - Reset timestamp
  • Status: 429 when limit exceeded
Response (429)
{
  "error": "Rate limit exceeded",
  "message": "Too many chat requests. Please try again later.",
  "resetAt": 1704067260000
}
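
A minimal client-side sketch for honoring the limit, assuming the userId field from the request schema is what keys the per-user counter:
async function chatWithRetry(query: string, userId: string): Promise<any> {
  const res = await fetch('https://parti.metacogna.ai/api/chat', {
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify({ query, userId })
  });

  // Remaining quota is exposed on every response
  console.log('Remaining:', res.headers.get('X-RateLimit-Remaining'));

  if (res.status === 429) {
    // resetAt is a millisecond timestamp; wait out the window, then retry
    const { resetAt } = await res.json();
    await new Promise(r => setTimeout(r, Math.max(0, resetAt - Date.now())));
    return chatWithRetry(query, userId);
  }
  return res.json();
}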

Example Requests

Basic Chat

const response = await fetch('https://parti.metacogna.ai/api/chat', {
  method: 'POST',
  headers: { 'Content-Type': 'application/json' },
  body: JSON.stringify({
    query: 'What is artificial intelligence?',
    userId: 'user-uuid'
  })
});

const { response: answer, sources } = await response.json();

With Agent Type

const response = await fetch('https://parti.metacogna.ai/api/chat', {
  method: 'POST',
  headers: { 'Content-Type': 'application/json' },
  body: JSON.stringify({
    query: 'Analyze the knowledge graph structure',
    agentType: 'graph-analyst',
    userId: 'user-uuid'
  })
});

With Custom LLM

const response = await fetch('https://parti.metacogna.ai/api/chat', {
  method: 'POST',
  headers: { 'Content-Type': 'application/json' },
  body: JSON.stringify({
    query: 'Explain RAG systems',
    llmConfig: {
      provider: 'openai',
      model: 'gpt-4o',
      maxTokens: 1000
    },
    temperature: 0.7,
    userId: 'user-uuid'
  })
});

With Pre-retrieved Sources

const response = await fetch('https://parti.metacogna.ai/api/chat', {
  method: 'POST',
  headers: { 'Content-Type': 'application/json' },
  body: JSON.stringify({
    query: 'Summarize these documents',
    sources: [
      { id: 'chunk-1', text: 'Document 1 content...' },
      { id: 'chunk-2', text: 'Document 2 content...' }
    ],
    userId: 'user-uuid'
  })
});

Environment Setup

API Keys (Optional)

API keys are stored as Cloudflare secrets (not in wrangler.toml):
# Google Gemini (optional)
bun wrangler secret put GEMINI_API_KEY

# OpenAI (optional)
bun wrangler secret put OPENAI_API_KEY

# Anthropic (optional)
bun wrangler secret put ANTHROPIC_API_KEY
If no API keys are set, the endpoint falls back to Cloudflare Workers AI (free, no key required).

Benefits

  1. Security - API keys never exposed to frontend
  2. Cost Control - Centralized rate limiting and usage tracking
  3. Flexibility - Easy to switch between LLM providers
  4. Performance - Single request for search + generation
  5. Consistency - All API calls go through the worker