System Design

How Archie Works

A LangGraph agentic pipeline with streaming Groq inference, live web search, and persistent session memory.

LangGraph · Groq (Llama 3.3 70B) · Neon PostgreSQL · Serper (Google Search) · Drizzle ORM

Pipeline Flow

User Message (message · sessionId · history · thinkMode)
    → POST /api → Next.js Route Handler

LangGraph state machine:
  1. loadSession: Neon DB session read · knowledge.json → kbJsonToText() · history assembly
  2. orchestrator: Groq LLM · decompose multi-intent · route: trivial / personal / session / search / kb
       · trivial / personal / session → skip to judge (no agents invoked)
       · search / internal_kb → tasks continue to steps 3–5
  3. queryRewriter: Groq LLM · optimise queries · parallel per task
  4. agents (Promise.all): Search → Serper (Google top 5 results) · KB → keyword scan (knowledge.json match)
  5. judge: log routing decision · populate thinkingLogs · no LLM call

Synthesis + Groq streaming: Archie persona prompt · KB vs search grounding · max 3000 tokens · llama-3.3-70b-versatile
    → ReadableStream → client (text · searchCards · suggestions · thinkingLogs)

Background (fire-and-forget): saveToDB → Neon DB · persist JSONB history · extract name / email / facts via Groq
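The routing above can be sketched as a hand-rolled stand-in for the LangGraph state machine. All type and function names here are hypothetical, and the toy regex stands in for the real orchestrator LLM:

```typescript
type Route = "trivial" | "personal" | "session" | "search" | "internal_kb";

interface PipelineState {
  message: string;
  route?: Route;
  context: string[];      // agent results gathered for synthesis
  thinkingLogs: string[]; // surfaced in the Thinking Mode UI
}

// orchestrator: fast-path trivial greetings; otherwise pretend the LLM chose "search"
function orchestrator(s: PipelineState): PipelineState {
  const route: Route = /^(hi|hey|hello)\b/i.test(s.message) ? "trivial" : "search";
  return { ...s, route, thinkingLogs: [...s.thinkingLogs, `route=${route}`] };
}

// judge: no LLM call — just records whether gathered context looks sufficient
function judge(s: PipelineState): PipelineState {
  const verdict = s.context.length > 0 ? "context sufficient" : "answer directly";
  return { ...s, thinkingLogs: [...s.thinkingLogs, verdict] };
}

function runPipeline(message: string): PipelineState {
  let s: PipelineState = { message, context: [], thinkingLogs: [] };
  s = orchestrator(s);
  if (s.route === "search" || s.route === "internal_kb") {
    // queryRewriter + agents would run here before judging
    s = { ...s, context: [...s.context, "(agent results)"] };
  }
  return judge(s); // trivial / personal / session skip straight here
}
```

The key property this preserves: trivial turns never touch an agent, so their latency is one routing pass plus synthesis.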

Node Breakdown

1. loadSession (No LLM · pure I/O)
  • Queries Neon DB for existing session (name, email, importantInfo)
  • Reads knowledge.json → kbJsonToText() converts JSON to clean == SECTION == text
  • Builds numbered user-question list for session memory introspection
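The kbJsonToText() conversion might look roughly like this. The == SECTION == delimiter comes from the description above; knowledge.json's actual field names and nesting are assumptions:

```typescript
// Flatten knowledge.json into plain "== SECTION ==" text the LLM can ground on.
function kbJsonToText(kb: Record<string, unknown>): string {
  const sections: string[] = [];
  for (const [section, value] of Object.entries(kb)) {
    const body =
      typeof value === "string"
        ? value
        : Array.isArray(value)
          ? value.map((v) => `- ${String(v)}`).join("\n") // one bullet per entry
          : JSON.stringify(value, null, 2);               // nested objects as-is
    sections.push(`== ${section.toUpperCase()} ==\n${body}`);
  }
  return sections.join("\n\n");
}
```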
2. orchestrator (Groq LLM · intent router)
  • Fast-path regex catches trivial greetings — skips all agent calls
  • LLM decomposes message into typed tasks: search | internal_kb
  • Detects personal mode (emotional) and session mode (introspection)
  • Multi-intent: one message can spawn multiple parallel tasks
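A sketch of the fast path and multi-intent decomposition. The greeting regex and the split-on-"and" heuristic are toy stand-ins; the real decomposition is a Groq LLM call:

```typescript
type TaskType = "search" | "internal_kb";
interface Task { type: TaskType; query: string; }

const TRIVIAL = /^\s*(hi|hey|hello|thanks|thank you|bye)[\s!.]*$/i;

// Fast path: trivial greetings never reach the LLM or any agent.
function isTrivial(message: string): boolean {
  return TRIVIAL.test(message);
}

// Toy stand-in for LLM decomposition: split conjoined intents, tag each part.
function decompose(message: string): Task[] {
  return message.split(/\band\b/i).map((part) => ({
    // Mentions of the bot itself route to the internal KB; everything else searches.
    type: /\b(you|your|archie)\b/i.test(part) ? "internal_kb" : "search",
    query: part.trim(),
  }));
}
```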
3. queryRewriter (Groq LLM · search optimiser)
  • Runs one Groq call per search task — all in parallel via Promise.all
  • Rewrites vague intent into a precise 3–8 word Google query
  • Resolves pronoun references using conversation history
  • KB tasks bypass rewriting — passed through unchanged
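The per-task fan-out could look like this, with a toy filler-word stripper standing in for the Groq rewrite call (which would also see conversation history to resolve pronouns):

```typescript
interface Task { type: "search" | "internal_kb"; query: string; }

// Stand-in for the Groq rewrite: strip filler to approximate a terse Google query.
async function rewriteQuery(raw: string): Promise<string> {
  return raw
    .replace(/\b(please|can you|tell me|about|the)\b/gi, "")
    .replace(/\s+/g, " ")
    .trim();
}

// One rewrite per search task, all in flight at once; KB tasks pass through unchanged.
async function rewriteAll(tasks: Task[]): Promise<Task[]> {
  return Promise.all(
    tasks.map(async (t) =>
      t.type === "internal_kb" ? t : { ...t, query: await rewriteQuery(t.query) }
    )
  );
}
```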
4. agents (parallel execution · Promise.all)
  • Search agent: calls Serper API → returns top 5 results + image cards
  • KB agent: keyword match scoring on kbJsonToText output (≥30% → return full profile)
  • Both run simultaneously — total latency = slowest single agent, not sum
  • Results tagged by intent for multi-source attribution in synthesis
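A sketch of the parallel agent fan-out. serperSearch is a hypothetical stub (the real agent calls the Serper API); the ≥30% keyword-overlap scoring follows the description above:

```typescript
// Stub: the real search agent hits the Serper API and returns top 5 results.
async function serperSearch(query: string): Promise<string[]> {
  return [`(top 5 Google results for: ${query})`];
}

// Fraction of query words that appear anywhere in the KB text.
function kbScore(query: string, kbText: string): number {
  const words = query.toLowerCase().split(/\W+/).filter(Boolean);
  const hits = words.filter((w) => kbText.toLowerCase().includes(w));
  return words.length ? hits.length / words.length : 0;
}

// ≥30% of query words present in the KB → return the full profile text.
async function kbAgent(query: string, kbText: string): Promise<string | null> {
  return kbScore(query, kbText) >= 0.3 ? kbText : null;
}

async function runAgents(query: string, kbText: string) {
  // Both agents run simultaneously: latency = slowest agent, not the sum.
  const [search, kb] = await Promise.all([serperSearch(query), kbAgent(query, kbText)]);
  return { search, kb };
}
```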
5. judge (No LLM · decision logger)
  • Inspects final combined context — logs whether it is sufficient
  • Populates thinkingLogs array shown in Thinking Mode UI
  • No LLM call — pure state inspection

Synthesis + Streaming

Groq streaming · Archie persona
  • Builds layered system prompt: Archie persona → mode addendum → grounding rules
  • KB context: "answer only from this profile, do not invent"
  • Search context: "live results, use them, cite as [[N]](url)"
  • Streams llama-3.3-70b-versatile response as ReadableStream to client
  • Client parses: [THINKING], [SEARCH_DATA], [SUGGESTIONS], plain text
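Client-side parsing of the tagged segments might be sketched like this. The marker names come from the list above, but the exact wire format (inline markers, payload running up to the next bracket) is an assumption:

```typescript
interface ParsedChunks {
  thinking: string[];
  searchData: string[];
  suggestions: string[];
  text: string; // whatever remains is the answer shown to the user
}

function parseStream(raw: string): ParsedChunks {
  const out: ParsedChunks = { thinking: [], searchData: [], suggestions: [], text: "" };
  // Assume each marker's payload runs up to the next "[" in the stream.
  const re = /\[(THINKING|SEARCH_DATA|SUGGESTIONS)\]([^[]*)/g;
  let consumed = raw;
  for (const m of raw.matchAll(re)) {
    const payload = m[2].trim();
    if (m[1] === "THINKING") out.thinking.push(payload);
    if (m[1] === "SEARCH_DATA") out.searchData.push(payload);
    if (m[1] === "SUGGESTIONS") out.suggestions.push(payload);
    consumed = consumed.replace(m[0], ""); // strip the tagged segment from the answer text
  }
  out.text = consumed.trim();
  return out;
}
```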

Session Memory

Each browser session gets a unique ID. Every conversation turn is appended to a JSONB array in Neon DB. A secondary Groq call extracts the user's name, email, and notable facts from their message — stored alongside the session. On the next turn, loadSession re-reads this, letting Archie remember who you are and personalise responses.

  • id (session key)
  • history (JSONB array)
  • name · email
  • importantInfo (facts)
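A plain-object sketch of the fire-and-forget save, mirroring the session fields above. A toy email regex stands in for the secondary Groq extraction call, and Drizzle's actual write is omitted:

```typescript
interface SessionRow {
  id: string;
  history: { role: "user" | "assistant"; content: string }[]; // JSONB array
  name?: string;
  email?: string;
  importantInfo: string[];
}

// Toy stand-in: the real implementation asks Groq to extract name/email/facts.
function extractFacts(message: string): { email?: string } {
  const email = message.match(/\S+@\S+\.\S+/)?.[0];
  return { email };
}

// Append the turn and merge any newly extracted identity into the row.
function appendTurn(row: SessionRow, user: string, assistant: string): SessionRow {
  const { email } = extractFacts(user);
  return {
    ...row,
    history: [
      ...row.history,
      { role: "user", content: user },
      { role: "assistant", content: assistant },
    ],
    email: email ?? row.email,
  };
}
```

On the next request, loadSession reads this row back, which is all the "memory" the bot needs.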

Other DB tables (posts, admins, subscribers) belong to the blog and newsletter — separate from the chatbot pipeline.

If You Want to Scale Up

The current stack is fully free. The rows below are paid upgrades for production-grade robustness — not necessary for a portfolio.

| Component | Current (Free) | Production Upgrade |
| --- | --- | --- |
| KB Search | Keyword match (≥30% score) | Pinecone / Weaviate — semantic vector search |
| LLM Inference | Groq free tier · 100k tok/day | Groq paid / OpenAI GPT-4o — higher rate limits |
| Rate Limiting | None | Upstash Redis — sliding window per IP |
| Auth | Client-generated session IDs | Clerk / NextAuth — verified user sessions |
| Observability | console.error + thinkingLogs | LangSmith — full LangGraph trace monitoring |
| Embeddings | None | OpenAI text-embedding-3-small — semantic KB |
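For the rate-limiting row, the sliding-window idea can be sketched in memory before reaching for Upstash Redis. This class and its parameters are hypothetical; production code would share state across instances via Redis:

```typescript
class SlidingWindowLimiter {
  private hits = new Map<string, number[]>(); // key (e.g. IP) → request timestamps

  constructor(private limit: number, private windowMs: number) {}

  // Allow the request if fewer than `limit` hits landed inside the window.
  allow(key: string, now = Date.now()): boolean {
    const cutoff = now - this.windowMs;
    const recent = (this.hits.get(key) ?? []).filter((t) => t > cutoff);
    if (recent.length >= this.limit) {
      this.hits.set(key, recent);
      return false;
    }
    recent.push(now);
    this.hits.set(key, recent);
    return true;
  }
}
```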