AI / RAG · Laravel 12 · NB-C002

Portfolio AI Assistant, BRI

Nelson Brilhante · 2026·06 · Project

FIG. NB-C002 · AI / RAG · Laravel 12

BRI is the conversational assistant on this site, built on Laravel 12 with a Filament v4 admin panel. It is a RAG (Retrieval-Augmented Generation) system that answers in Portuguese and English from the portfolio's real content (services, projects and news), with guardrails against inventing information.

Each question passes through four stages end to end: hybrid retrieval, context assembly, generation by a language model grounded in that context, and token-by-token streaming back to the browser.

Technical schematic of the assistant's RAG pipeline: browser question to web app, then hybrid retrieval (FTS5 lexical search and embeddings fused by RRF from a knowledge store), context assembly with catalogue, contacts and veracity rule, a grounded language model, and token-by-token streaming back to the browser with a retry guard. A separate lane shows supporting subsystems: conversation memory, a health probe with localised offline mode, and an editorial-drafting cloud failover chain across three providers. Dark dossier style, cyan on near-black, monospace typography.

Retrieval is hybrid. It combines SQLite FTS5 lexical search with embedding similarity (nomic-embed-text) and fuses the two ranked result sets by Reciprocal Rank Fusion, with a light recency factor. Authoritative blocks, namely the record catalogue, the real contacts and a veracity rule, are injected into the prompt so that the model answers from verified facts rather than inventing them.
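The fusion step can be illustrated with a minimal Python sketch. This is not the site's code (which runs on Laravel): the function name `rrf_fuse`, the constant `k = 60` (the value conventionally used with Reciprocal Rank Fusion), and the half-life form of the recency factor are all assumptions made for illustration.

```python
from datetime import date


def rrf_fuse(rankings, k=60, published=None, today=None,
             half_life_days=365.0, recency_weight=0.1):
    """Fuse ranked lists of document ids by Reciprocal Rank Fusion.

    rankings  -- list of ranked id lists, best first (e.g. lexical, semantic)
    published -- optional {doc_id: date}; used for the assumed recency boost
    """
    scores = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            # Each list contributes 1 / (k + rank) for every id it contains.
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)

    if published and today:
        for doc_id in scores:
            if doc_id in published:
                age_days = max((today - published[doc_id]).days, 0)
                # Assumed "light" recency factor: at most +recency_weight,
                # halving every half_life_days.
                decay = 0.5 ** (age_days / half_life_days)
                scores[doc_id] *= 1.0 + recency_weight * decay

    # Highest score first; ties broken by id so the order is deterministic.
    return sorted(scores, key=lambda d: (-scores[d], d))


lexical = ["a", "b", "c"]    # e.g. ids returned by a full-text query
semantic = ["b", "c", "d"]   # e.g. ids returned by embedding similarity
fused = rrf_fuse([lexical, semantic])
```

Documents that appear in both lists accumulate two contributions, so they outrank documents found by only one retriever, without any need to put lexical and vector scores on a common scale.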

Models are split by purpose. The chatbot runs on a no-quota backend, using the gpt-oss-120b model with an extended context window, configured without an intermediate reasoning phase because the answer is already grounded in the retrieved context. Automated editorial drafting uses a cloud chain with automatic failover: on an HTTP 429 or 5xx response it moves to the next provider, in the order Gemini, Groq, OpenRouter.
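The failover rule described above can be sketched as follows. This is a simplified illustration in Python, not the production implementation: `ProviderError`, `is_retryable` and `draft_with_failover` are hypothetical names, and each provider is modelled as a plain callable that either returns text or raises with an HTTP status.

```python
class ProviderError(Exception):
    """Raised by a provider call that came back with an HTTP error status."""

    def __init__(self, status):
        super().__init__(f"provider returned HTTP {status}")
        self.status = status


def is_retryable(status):
    # Rate limiting (429) and server-side errors (5xx) trigger failover.
    return status == 429 or 500 <= status <= 599


def draft_with_failover(prompt, providers):
    """Try each (name, call) pair in order and return (name, text).

    Any other error status (for example a 400 caused by a bad request)
    is re-raised at once, since the next provider would likely fail too.
    """
    last_error = None
    for name, call in providers:
        try:
            return name, call(prompt)
        except ProviderError as err:
            if not is_retryable(err.status):
                raise
            last_error = err  # remember why this provider was skipped
    raise RuntimeError("all providers failed") from last_error
```

A chain would be passed as an ordered list such as `[("gemini", call_gemini), ("groq", call_groq), ("openrouter", call_openrouter)]`, where the three callables are placeholders for real client code.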

The streaming layer detects mid-reply cut-offs and offers a retry. A health probe puts the widget into a localised offline mode and recovers it on its own, without affecting the rest of the site. Conversation memory, backed by the server-stored thread, keeps the visitor's context across the exchange. The assistant was designed and integrated for this site, where it runs in production.
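One way to detect a mid-reply cut-off is to require an explicit end-of-stream marker and treat any stream that ends without it as incomplete. The Python sketch below assumes that approach; the page does not say how the real streaming layer does it, and the `DONE` sentinel, `StreamResult` and `consume_stream` are hypothetical names.

```python
from dataclasses import dataclass

DONE = "[DONE]"  # hypothetical sentinel sent by the server after the last token


@dataclass
class StreamResult:
    text: str        # whatever arrived before the stream ended
    complete: bool   # False means the reply was cut off: offer a retry


def consume_stream(chunks):
    """Collect streamed chunks and report whether the reply finished cleanly."""
    parts = []
    try:
        for chunk in chunks:
            if chunk == DONE:
                return StreamResult("".join(parts), True)
            parts.append(chunk)
    except ConnectionError:
        # Transport dropped mid-reply; keep the partial text.
        pass
    # The stream ended (or broke) without the sentinel: treat it as a cut-off.
    return StreamResult("".join(parts), False)
```

When `complete` is false, the widget can show the partial text alongside a retry control instead of presenting a truncated answer as final.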
