System Design

How Archie Works

A LangGraph agentic pipeline with streaming Groq inference, live web search, and persistent session memory.

LangGraph · Groq (Llama 3.3 70B) · Neon PostgreSQL · Serper (Google Search) · Drizzle ORM

Pipeline Flow

User Message (message · sessionId · history · thinkMode)
    → POST /api → Next.js Route Handler

LangGraph state machine:
  1. loadSession: Neon DB session read · knowledge.json → kbJsonToText() · history assembly
  2. orchestrator: Groq LLM · decompose multi-intent · route: trivial / personal / session / search / kb
       · trivial / personal / session → skip to judge (no agents invoked)
       · search / internal_kb → tasks continue to steps 3–5
  3. queryRewriter: Groq LLM · optimise queries · parallel per task
  4. agents (Promise.all): Search → Serper (Google top 5 results) · KB → keyword scan (knowledge.json match)
  5. judge: log routing decision · populate thinkingLogs · no LLM call

Synthesis + Groq streaming: Archie persona prompt · KB vs search grounding · max 3000 tokens · llama-3.3-70b-versatile
    → ReadableStream → client (text · searchCards · suggestions · thinkingLogs)

Background (fire-and-forget): saveToDB → Neon DB · persist JSONB history · extract name / email / facts via Groq
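The routing above can be sketched as a hand-rolled stand-in for the LangGraph state machine. All type and function names here are hypothetical, and the toy regex stands in for the real orchestrator LLM:

```typescript
type Route = "trivial" | "personal" | "session" | "search" | "internal_kb";

interface PipelineState {
  message: string;
  route?: Route;
  context: string[];      // agent results gathered for synthesis
  thinkingLogs: string[]; // surfaced in the Thinking Mode UI
}

// orchestrator: fast-path trivial greetings; otherwise pretend the LLM chose "search"
function orchestrator(s: PipelineState): PipelineState {
  const route: Route = /^(hi|hey|hello)\b/i.test(s.message) ? "trivial" : "search";
  return { ...s, route, thinkingLogs: [...s.thinkingLogs, `route=${route}`] };
}

// judge: no LLM call — just records whether gathered context looks sufficient
function judge(s: PipelineState): PipelineState {
  const verdict = s.context.length > 0 ? "context sufficient" : "answer directly";
  return { ...s, thinkingLogs: [...s.thinkingLogs, verdict] };
}

function runPipeline(message: string): PipelineState {
  let s: PipelineState = { message, context: [], thinkingLogs: [] };
  s = orchestrator(s);
  if (s.route === "search" || s.route === "internal_kb") {
    // queryRewriter + agents would run here before judging
    s = { ...s, context: [...s.context, "(agent results)"] };
  }
  return judge(s); // trivial / personal / session skip straight here
}
```

The key property this preserves: trivial turns never touch an agent, so their latency is one routing pass plus synthesis.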

Node Breakdown

1. loadSession (No LLM · pure I/O)
  • Queries Neon DB for existing session (name, email, importantInfo)
  • Reads knowledge.json → kbJsonToText() converts JSON to clean == SECTION == text
  • Builds numbered user-question list for session memory introspection
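The kbJsonToText() conversion might look roughly like this. The == SECTION == delimiter comes from the description above; knowledge.json's actual field names and nesting are assumptions:

```typescript
// Flatten knowledge.json into plain "== SECTION ==" text the LLM can ground on.
function kbJsonToText(kb: Record<string, unknown>): string {
  const sections: string[] = [];
  for (const [section, value] of Object.entries(kb)) {
    const body =
      typeof value === "string"
        ? value
        : Array.isArray(value)
          ? value.map((v) => `- ${String(v)}`).join("\n") // one bullet per entry
          : JSON.stringify(value, null, 2);               // nested objects as-is
    sections.push(`== ${section.toUpperCase()} ==\n${body}`);
  }
  return sections.join("\n\n");
}
```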
2. orchestrator (Groq LLM · intent router)
  • Fast-path regex catches trivial greetings — skips all agent calls
  • LLM decomposes message into typed tasks: search | internal_kb
  • Detects personal mode (emotional) and session mode (introspection)
  • Multi-intent: one message can spawn multiple parallel tasks
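A sketch of the fast path and multi-intent decomposition. The greeting regex and the split-on-"and" heuristic are toy stand-ins; the real decomposition is a Groq LLM call:

```typescript
type TaskType = "search" | "internal_kb";
interface Task { type: TaskType; query: string; }

const TRIVIAL = /^\s*(hi|hey|hello|thanks|thank you|bye)[\s!.]*$/i;

// Fast path: trivial greetings never reach the LLM or any agent.
function isTrivial(message: string): boolean {
  return TRIVIAL.test(message);
}

// Toy stand-in for LLM decomposition: split conjoined intents, tag each part.
function decompose(message: string): Task[] {
  return message.split(/\band\b/i).map((part) => ({
    // Mentions of the bot itself route to the internal KB; everything else searches.
    type: /\b(you|your|archie)\b/i.test(part) ? "internal_kb" : "search",
    query: part.trim(),
  }));
}
```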
3. queryRewriter (Groq LLM · search optimiser)
  • Runs one Groq call per search task — all in parallel via Promise.all
  • Rewrites vague intent into a precise 3–8 word Google query
  • Resolves pronoun references using conversation history
  • KB tasks bypass rewriting — passed through unchanged
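The per-task fan-out could look like this, with a toy filler-word stripper standing in for the Groq rewrite call (which would also see conversation history to resolve pronouns):

```typescript
interface Task { type: "search" | "internal_kb"; query: string; }

// Stand-in for the Groq rewrite: strip filler to approximate a terse Google query.
async function rewriteQuery(raw: string): Promise<string> {
  return raw
    .replace(/\b(please|can you|tell me|about|the)\b/gi, "")
    .replace(/\s+/g, " ")
    .trim();
}

// One rewrite per search task, all in flight at once; KB tasks pass through unchanged.
async function rewriteAll(tasks: Task[]): Promise<Task[]> {
  return Promise.all(
    tasks.map(async (t) =>
      t.type === "internal_kb" ? t : { ...t, query: await rewriteQuery(t.query) }
    )
  );
}
```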
4. agents (parallel execution · Promise.all)
  • Search agent: calls Serper API → returns top 5 results + image cards
  • KB agent: keyword match scoring on kbJsonToText output (≥30% → return full profile)
  • Both run simultaneously — total latency = slowest single agent, not sum
  • Results tagged by intent for multi-source attribution in synthesis
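A sketch of the parallel agent fan-out. serperSearch is a hypothetical stub (the real agent calls the Serper API); the ≥30% keyword-overlap scoring follows the description above:

```typescript
// Stub: the real search agent hits the Serper API and returns top 5 results.
async function serperSearch(query: string): Promise<string[]> {
  return [`(top 5 Google results for: ${query})`];
}

// Fraction of query words that appear anywhere in the KB text.
function kbScore(query: string, kbText: string): number {
  const words = query.toLowerCase().split(/\W+/).filter(Boolean);
  const hits = words.filter((w) => kbText.toLowerCase().includes(w));
  return words.length ? hits.length / words.length : 0;
}

// ≥30% of query words present in the KB → return the full profile text.
async function kbAgent(query: string, kbText: string): Promise<string | null> {
  return kbScore(query, kbText) >= 0.3 ? kbText : null;
}

async function runAgents(query: string, kbText: string) {
  // Both agents run simultaneously: latency = slowest agent, not the sum.
  const [search, kb] = await Promise.all([serperSearch(query), kbAgent(query, kbText)]);
  return { search, kb };
}
```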
5. judge (No LLM · decision logger)
  • Inspects final combined context — logs whether it is sufficient
  • Populates thinkingLogs array shown in Thinking Mode UI
  • No LLM call — pure state inspection

Synthesis + Streaming

Groq streaming · Archie persona
  • Builds layered system prompt: Archie persona → mode addendum → grounding rules
  • KB context: "answer only from this profile, do not invent"
  • Search context: "live results, use them, cite as [[N]](url)"
  • Streams llama-3.3-70b-versatile response as ReadableStream to client
  • Client parses: [THINKING], [SEARCH_DATA], [SUGGESTIONS], plain text
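Client-side parsing of the tagged segments might be sketched like this. The marker names come from the list above, but the exact wire format (inline markers, payload running up to the next bracket) is an assumption:

```typescript
interface ParsedChunks {
  thinking: string[];
  searchData: string[];
  suggestions: string[];
  text: string; // whatever remains is the answer shown to the user
}

function parseStream(raw: string): ParsedChunks {
  const out: ParsedChunks = { thinking: [], searchData: [], suggestions: [], text: "" };
  // Assume each marker's payload runs up to the next "[" in the stream.
  const re = /\[(THINKING|SEARCH_DATA|SUGGESTIONS)\]([^[]*)/g;
  let consumed = raw;
  for (const m of raw.matchAll(re)) {
    const payload = m[2].trim();
    if (m[1] === "THINKING") out.thinking.push(payload);
    if (m[1] === "SEARCH_DATA") out.searchData.push(payload);
    if (m[1] === "SUGGESTIONS") out.suggestions.push(payload);
    consumed = consumed.replace(m[0], ""); // strip the tagged segment from the answer text
  }
  out.text = consumed.trim();
  return out;
}
```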

Session Memory

Each browser session gets a unique ID. Every conversation turn is appended to a JSONB array in Neon DB. A secondary Groq call extracts the user's name, email, and notable facts from their message — stored alongside the session. On the next turn, loadSession re-reads this, letting Archie remember who you are and personalise responses.

  • id (session key)
  • history (JSONB array)
  • name · email
  • importantInfo (facts)
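A plain-object sketch of the fire-and-forget save, mirroring the session fields above. A toy email regex stands in for the secondary Groq extraction call, and Drizzle's actual write is omitted:

```typescript
interface SessionRow {
  id: string;
  history: { role: "user" | "assistant"; content: string }[]; // JSONB array
  name?: string;
  email?: string;
  importantInfo: string[];
}

// Toy stand-in: the real implementation asks Groq to extract name/email/facts.
function extractFacts(message: string): { email?: string } {
  const email = message.match(/\S+@\S+\.\S+/)?.[0];
  return { email };
}

// Append the turn and merge any newly extracted identity into the row.
function appendTurn(row: SessionRow, user: string, assistant: string): SessionRow {
  const { email } = extractFacts(user);
  return {
    ...row,
    history: [
      ...row.history,
      { role: "user", content: user },
      { role: "assistant", content: assistant },
    ],
    email: email ?? row.email,
  };
}
```

On the next request, loadSession reads this row back, which is all the "memory" the bot needs.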

Other DB tables (posts, admins, subscribers) belong to the blog and newsletter — separate from the chatbot pipeline.

If You Want to Scale Up

The current stack is fully free. The rows below are paid upgrades for production-grade robustness — not necessary for a portfolio.

| Component | Current (Free) | Production Upgrade |
| --- | --- | --- |
| KB Search | Keyword match (≥30% score) | Pinecone / Weaviate — semantic vector search |
| LLM Inference | Groq free tier · 100k tok/day | Groq paid / OpenAI GPT-4o — higher rate limits |
| Rate Limiting | None | Upstash Redis — sliding window per IP |
| Auth | Client-generated session IDs | Clerk / NextAuth — verified user sessions |
| Observability | console.error + thinkingLogs | LangSmith — full LangGraph trace monitoring |
| Embeddings | None | OpenAI text-embedding-3-small — semantic KB |
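For the rate-limiting row, the sliding-window idea can be sketched in memory before reaching for Upstash Redis. This class and its parameters are hypothetical; production code would share state across instances via Redis:

```typescript
class SlidingWindowLimiter {
  private hits = new Map<string, number[]>(); // key (e.g. IP) → request timestamps

  constructor(private limit: number, private windowMs: number) {}

  // Allow the request if fewer than `limit` hits landed inside the window.
  allow(key: string, now = Date.now()): boolean {
    const cutoff = now - this.windowMs;
    const recent = (this.hits.get(key) ?? []).filter((t) => t > cutoff);
    if (recent.length >= this.limit) {
      this.hits.set(key, recent);
      return false;
    }
    recent.push(now);
    this.hits.set(key, recent);
    return true;
  }
}
```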