AI-Powered Workshop Appointment System

Voice AI · LiveKit · Real-Time · WordPress · OpenAI

Building a voice assistant that lets workshop staff manage appointments, customers, and vehicle history through natural conversation. The project moved from a self-hosted pipeline with 3 to 5 second delays to production-grade real-time voice with LiveKit.

Client: INofficina.it

The Challenge

Auto repair workshops handle a constant stream of service requests: tyre changes, engine maintenance, bodywork, vehicle inspections. Staff juggle booking calendars, customer databases, and vehicle records, switching between screens while a customer is on the phone or standing at the counter.

My goal was to let workshop staff say something like "Book a service for the grey SUV, plate AB 123 CD, first available slot next Wednesday, and send a WhatsApp confirmation with a reminder the day before" and have it all happen in one go. No typing, no screen switching, no manual data entry across three different systems.

But there are also more complex scenarios. For example: "Email all customers who had a tyre change more than 6 months ago and remind them it's time to come back." This kind of request touches multiple services, requires heavy queries across the CRM, and needs tool implementations that are both performant and safe.
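As a rough illustration of what "safe" can mean for a bulk request like the tyre reminder, here is a minimal sketch of the recipient selection step. Everything in it is hypothetical: the `ServiceRecord` shape, the `tyre_change` service label, the 183-day approximation of six months, and the recipient cap are assumptions for the example, not the project's actual schema or tool code. The point is twofold: compare against each customer's most recent tyre change, so someone who came back last month is not reminded, and refuse to proceed when the recipient list is unexpectedly large.

```python
from __future__ import annotations

from dataclasses import dataclass
from datetime import date, timedelta


@dataclass(frozen=True)
class ServiceRecord:
    # Hypothetical record shape; a real CRM row would carry more fields.
    customer_email: str
    service: str
    performed_on: date


def tyre_recall_candidates(
    records: list[ServiceRecord],
    today: date,
    min_age_days: int = 183,    # assumption: "6 months" approximated in days
    max_recipients: int = 200,  # assumption: safety cap before any email is sent
) -> list[str]:
    """Return customers whose latest tyre change is older than the cutoff."""
    cutoff = today - timedelta(days=min_age_days)

    # Keep only the most recent tyre change per customer.
    latest: dict[str, date] = {}
    for record in records:
        if record.service != "tyre_change":
            continue
        previous = latest.get(record.customer_email)
        if previous is None or record.performed_on > previous:
            latest[record.customer_email] = record.performed_on

    due = sorted(email for email, last in latest.items() if last < cutoff)

    # Fail closed: a voice command should not silently email the whole CRM.
    if len(due) > max_recipients:
        raise ValueError(f"{len(due)} recipients exceeds cap of {max_recipients}")
    return due


records = [
    ServiceRecord("a@example.com", "tyre_change", date(2024, 3, 1)),
    ServiceRecord("b@example.com", "tyre_change", date(2024, 3, 1)),
    ServiceRecord("b@example.com", "tyre_change", date(2024, 12, 10)),
    ServiceRecord("c@example.com", "oil_change", date(2023, 1, 5)),
]
candidates = tyre_recall_candidates(records, today=date(2025, 1, 15))
```

In a real tool the filtering would more likely happen in a single database query, with the assistant reading the recipient count back for confirmation before anything is sent.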

Building a voice assistant for these scenarios is both a technical and a design problem. Two challenges shaped the entire project: latency (a 3-second delay turns an assistant into an obstacle) and observability (I needed full visibility into every stage of the pipeline: what was transcribed, how the LLM interpreted it, and how the response was converted back to speech).

Solution Approach

The Architecture Decision: Conversational vs. Cascade

The first choice was the voice architecture. Conversational (end-to-end) models like OpenAI Realtime API and Gemini Live handle everything in one shot. They offer lower latency, but the model is largely a black box: there is no separate stage where you can inspect what was transcribed, how the request was interpreted, or how the reply was rendered as speech. You can't swap components independently, and debugging is guesswork. The alternative is a cascade pipeline (Speech-to-Text → LLM → Text-to-Speech) where each stage is observable and independently tunable. I chose the cascade approach because I needed that observability: when a license plate is misheard, I need to know whether it was a transcription error or an LLM interpretation issue. When a response sounds wrong, I need to isolate whether the problem is in the text generation or in the speech synthesis.
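The observability argument can be made concrete with a small sketch. The code below is not the project's pipeline: the three stage functions are stand-ins that return fixed strings, and `CascadeTrace`, `StageRecord`, and `run_cascade` are names invented for the example. It only shows the shape of a cascade where every stage's input, output, and duration is recorded, so a misheard plate can be traced to the stage that produced it.

```python
from __future__ import annotations

import time
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class StageRecord:
    stage: str
    received: str
    produced: str
    seconds: float


@dataclass
class CascadeTrace:
    records: list[StageRecord] = field(default_factory=list)

    def run(self, stage: str, fn: Callable[[str], str], value: str) -> str:
        """Run one stage and record what went in, what came out, and how long it took."""
        start = time.perf_counter()
        result = fn(value)
        elapsed = time.perf_counter() - start
        self.records.append(StageRecord(stage, value, result, elapsed))
        return result


def run_cascade(audio: str, stt, llm, tts) -> tuple[str, CascadeTrace]:
    trace = CascadeTrace()
    transcript = trace.run("stt", stt, audio)
    reply = trace.run("llm", llm, transcript)
    speech = trace.run("tts", tts, reply)
    return speech, trace


# Stand-in stages (assumptions): real ones would call STT, LLM, and TTS services.
def fake_stt(audio: str) -> str:
    return "book a service for plate AB 123 CD"


def fake_llm(transcript: str) -> str:
    return "Booked: " + transcript.rsplit("plate ", 1)[-1]


def fake_tts(text: str) -> str:
    return f"<audio:{text}>"


speech, trace = run_cascade("<microphone audio>", fake_stt, fake_llm, fake_tts)
for record in trace.records:
    print(f"{record.stage}: {record.received!r} -> {record.produced!r}")
```

If the plate comes out wrong, the `stt` record shows whether the transcript was already wrong or whether the `llm` stage altered it.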

Phase 1: Self-Hosted Pipeline

The initial cascade implementation ran on my own servers: OpenAI Whisper for transcription, GPT-4o with function calling for intent processing, and OpenAI TTS for speech synthesis. This architecture worked for prototyping and confirmed that the cascade design was sound, but the steps ran strictly one after another and each added latency: audio upload, transcription, LLM reasoning, TTS generation, audio download. By the time the assistant replied, 3 to 5 seconds had passed. The pipeline was right; the execution needed to be faster.

Phase 2: LiveKit, Real-Time Voice Infrastructure

I found LiveKit, an open-source platform for real-time audio and video communication. LiveKit lets you run the same cascade pipeline but on infrastructure designed for real-time performance: WebRTC for low-latency audio streaming, a Python framework (LiveKit Agents) for building voice AI pipelines server-side, built-in Voice Activity Detection, and streaming TTS that starts playing before the full sentence is generated. Audio streams via WebRTC with no file uploads and no polling. The LLM starts generating while the user is still finishing their sentence, and TTS plays while the LLM is still generating text. I kept full observability into every stage while getting sub-second response times.
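The overlap between stages is where most of the time is saved. The sketch below does not use the LiveKit Agents API; it is a plain `asyncio` illustration with made-up generator names (`llm_tokens`, `tts_chunks`) and a canned reply. It shows the general idea of streaming: the speech stage consumes tokens as they arrive and emits audio for the first sentence before the language model has produced the second.

```python
import asyncio
from typing import AsyncIterator


async def llm_tokens(reply: str, log: list[str]) -> AsyncIterator[str]:
    """Stand-in for a streaming LLM: yields one token at a time."""
    for token in reply.split():
        await asyncio.sleep(0)  # placeholder for waiting on the next token
        log.append(f"llm:{token}")
        yield token


async def tts_chunks(tokens: AsyncIterator[str], log: list[str]) -> AsyncIterator[str]:
    """Stand-in for streaming TTS: speaks each sentence as soon as it is complete."""
    sentence: list[str] = []
    async for token in tokens:
        sentence.append(token)
        if token.endswith((".", "!", "?")):
            text = " ".join(sentence)
            log.append(f"tts:{text}")
            yield text
            sentence.clear()
    if sentence:  # flush a trailing fragment without end punctuation
        text = " ".join(sentence)
        log.append(f"tts:{text}")
        yield text


async def speak(reply: str) -> list[str]:
    log: list[str] = []
    async for _chunk in tts_chunks(llm_tokens(reply, log), log):
        pass  # a real agent would push each chunk to the audio track here
    return log


log = asyncio.run(speak("Booked for Wednesday. Reminder goes out Tuesday."))
print("\n".join(log))
```

In the printed log, the `tts:` entry for the first sentence appears before any `llm:` entry for the second, which is the behaviour a sequential upload-then-process pipeline cannot provide.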

The PHP-Python Bridge

INofficina.it runs on WordPress with the Bookly appointment plugin (PHP), but LiveKit Agents is Python-native. I needed to bridge the two. The LiveKit Agent (Python) handles the voice pipeline. The WordPress REST API (PHP) exposes appointment tools: create, search, modify bookings, look up customers and vehicles, check availability. The AI Commander plugin connects them, registering tools in WordPress and exposing them as REST endpoints that the Python agent calls via function calling. When the LLM needs to book an appointment, it calls a function; the Python agent forwards it to the WordPress REST API; the result flows back and gets spoken to the user, all in under a second.
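A minimal sketch of the Python side of such a bridge, using only the standard library. Several details are assumptions made for illustration: the route `/wp-json/ai-commander/v1/tools/<tool>` is a hypothetical path (the plugin's real routes are not given here), the tool name `create_appointment` and its arguments are invented, and authentication is shown as HTTP Basic with a WordPress application password, which is one common option rather than a confirmed detail of this project.

```python
import base64
import json
from urllib.parse import quote
from urllib.request import Request, urlopen


def build_tool_request(
    base_url: str,
    tool: str,
    arguments: dict,
    user: str,
    app_password: str,
) -> Request:
    """Turn an LLM function call into a POST against a WordPress REST route."""
    # Hypothetical route: namespace and path are placeholders, not the plugin's real API.
    url = f"{base_url.rstrip('/')}/wp-json/ai-commander/v1/tools/{quote(tool, safe='')}"
    credentials = base64.b64encode(f"{user}:{app_password}".encode()).decode()
    return Request(
        url,
        data=json.dumps(arguments).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Basic {credentials}",
        },
        method="POST",
    )


def call_tool(request: Request, timeout: float = 5.0) -> dict:
    """Send the request and decode the JSON result that is handed back to the LLM."""
    with urlopen(request, timeout=timeout) as response:
        return json.load(response)


# Building the request needs no network; only call_tool() performs the HTTP call.
request = build_tool_request(
    "https://example.test/",
    "create_appointment",
    {"plate": "AB 123 CD", "service": "tyre_change"},
    user="voice-agent",
    app_password="not-a-real-password",
)
print(request.get_method(), request.full_url)
```

Keeping request construction separate from the network call makes the mapping from function call to endpoint easy to test, and the short timeout keeps a slow WordPress response from stalling the conversation.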

Results & Impact

Response times dropped from 3-5 seconds on the self-hosted pipeline to under 1 second with LiveKit
Technicians interact with the system hands-free while working on vehicles, without touching a screen
Real-time availability validation has virtually eliminated double bookings
Mileage, repair details, and vehicle history are captured during natural conversation, keeping CRM records complete
The voice assistant runs inside the existing WordPress platform, with no separate system to maintain

Technologies Used

LiveKit Agents: real-time voice AI pipeline (VAD + STT + LLM + TTS)
LiveKit WebRTC: low-latency audio streaming
OpenAI GPT-4o: function calling for tool orchestration
WordPress AI Commander Plugin: extensible tool system and REST API
Next.js: frontend for the LiveKit voice assistant UI
