The conversational assistant on this site is built on Laravel 12 with a Filament v4 admin. It is a RAG (Retrieval-Augmented Generation) system that answers in Portuguese and English from the portfolio's real content (services, projects and news), with guardrails against inventing information.
A request, end to end:
┌───────────────────┐
│  Question (BRI)   │
└─────────┬─────────┘
          ▼
Hybrid retrieval: SQLite FTS5 + embeddings, fused (RRF)
          ▼
Context assembly: catalogue + contacts + veracity rule
          ▼
LLM (grounded): answer generated token by token
          ▼
┌───────────────────┐
│   Browser (BRI)   │ ◀ Server-Sent Events, retry-aware
└───────────────────┘
Retrieval is hybrid: SQLite FTS5 lexical search is combined with embeddings (nomic-embed-text), and the two rankings are fused by Reciprocal Rank Fusion with a light recency factor. Authoritative blocks (the record catalogue, real contacts and a veracity rule) are injected into the prompt so that answers stay grounded in the site's content instead of being invented.
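As a rough illustration of the fusion step, here is a minimal Python sketch of Reciprocal Rank Fusion over a lexical and a semantic ranking. It is not the site's code: the function names, the document ids, the constant k = 60 (a common default for RRF) and above all the recency formula (a mild multiplicative boost with a half-life) are assumptions chosen only to show the idea.

```python
from datetime import date


def rrf_fuse(rankings, k=60):
    """Fuse ranked lists of document ids with Reciprocal Rank Fusion.

    Each document scores 1 / (k + rank) in every list where it appears,
    with rank starting at 1; the per-list scores are summed.
    """
    scores = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return scores


def apply_recency(scores, published, today, half_life_days=365.0, weight=0.1):
    """Apply a light recency boost (hypothetical formula, not from the source).

    A document published today is multiplied by 1 + weight; the boost
    halves every `half_life_days` and tends to 1 for old documents.
    """
    boosted = {}
    for doc_id, score in scores.items():
        age_days = max((today - published[doc_id]).days, 0)
        boost = 1.0 + weight * 0.5 ** (age_days / half_life_days)
        boosted[doc_id] = score * boost
    return boosted


def rank(scores):
    """Return document ids ordered by descending score (ties broken by id)."""
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))


# Hypothetical rankings: one from FTS5, one from embedding similarity.
lexical = ["services", "project-a", "news-1"]
semantic = ["project-a", "news-1", "services"]
fused = rrf_fuse([lexical, semantic])
```

Because RRF works on ranks rather than raw scores, the BM25-style scores from FTS5 and the cosine similarities from the embedding search never need to be normalised against each other.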
Models are split by purpose:
Chatbot (BRI)  ─────▶ ┌────────────────────────────┐
                      │ Ollama Cloud               │  no quota
                      │ gpt-oss:120b (no think)    │  128k context
                      └────────────────────────────┘
Editorial Desk ─────▶ Gemini ─▶ Groq ─▶ OpenRouter
                      └─── failover on 429 / 5xx ───┘
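The failover arrow in the diagram can be sketched as a loop over providers in priority order. This is a minimal Python sketch under stated assumptions: `ProviderError`, `is_retryable` and `draft_with_failover` are hypothetical names, and each provider is modelled as a plain callable rather than a real Gemini, Groq or OpenRouter client.

```python
class ProviderError(Exception):
    """Raised by a provider callable when the HTTP call fails (hypothetical)."""

    def __init__(self, status):
        super().__init__(f"provider returned HTTP {status}")
        self.status = status


def is_retryable(status):
    """Fail over only on rate limiting (429) or server errors (5xx)."""
    return status == 429 or 500 <= status <= 599


def draft_with_failover(prompt, providers):
    """Try each (name, callable) pair in order and return the first success.

    A non-retryable error (for example 401) is re-raised immediately,
    since the next provider would not fix a bad request or bad credentials.
    """
    last_error = None
    for name, call in providers:
        try:
            return name, call(prompt)
        except ProviderError as err:
            if not is_retryable(err.status):
                raise
            last_error = err  # remember the failure and move to the next provider
    raise RuntimeError("all providers failed") from last_error
```

Called with a list such as `[("gemini", call_gemini), ("groq", call_groq), ("openrouter", call_openrouter)]`, where the three callables are placeholders, the function returns the name of the provider that answered along with its draft.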
The chat runs on a no-quota backend (Ollama Cloud), while automated editorial drafting uses a chain of cloud providers (Gemini, then Groq, then OpenRouter) with automatic failover. The streaming layer detects mid-reply cut-offs and offers a retry. A health probe puts the widget into a localised offline mode and brings it back automatically once the backend responds again. Conversation memory, backed by the server-stored thread, keeps the visitor's context across the dialogue.
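One way to detect a mid-reply cut-off is to have the server close every reply with an explicit terminal event and treat a stream that ends without it as truncated. The Python sketch below illustrates that idea on already-decoded Server-Sent Events lines. The `done` event name and the `read_reply` function are assumptions for illustration only, and on the site itself this logic would run in the browser widget.

```python
def read_reply(lines):
    """Collect SSE `data:` payloads and report whether the reply completed.

    `lines` is an iterable of decoded SSE lines. Returns (text, complete),
    where `complete` is True only if the hypothetical `done` event arrived.
    """
    parts = []
    event, data = "message", []
    complete = False
    for raw in lines:
        line = raw.rstrip("\r\n")
        if line == "":
            # A blank line dispatches the event buffered so far.
            if event == "done":
                complete = True
                break
            if data:
                parts.append("\n".join(data))
            event, data = "message", []
        elif line.startswith(":"):
            continue  # SSE comment, often used as a keep-alive
        elif line.startswith("event:"):
            event = line[len("event:"):].strip()
        elif line.startswith("data:"):
            value = line[len("data:"):]
            # Only one leading space after the colon is removed.
            data.append(value[1:] if value.startswith(" ") else value)
    # An event still buffered when the stream ends was never dispatched,
    # so it is dropped rather than shown as if it were complete.
    return "".join(parts), complete
```

When `complete` comes back False, the widget can keep the partial text on screen and offer the visitor a retry instead of presenting the truncated answer as final.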
Designed, integrated and running in production on this very site.