A few months ago, I noticed something frustrating.

I’d spend hours trying to learn new concepts — watching videos, reading articles, or chatting with ChatGPT. But it still felt slow. Clunky. Passive. Typing questions felt like homework. Scrolling for the “right” explanation was exhausting.

So I asked myself:

What if learning felt more like a conversation?

That question turned into Learnflow AI — a voice-powered learning assistant you can talk to, like a personal tutor on demand.

In this series, I’ll show you exactly how I built it — from zero to a real-time, voice-enabled GPT-4 app using Vapi, Next.js, and OpenAI.

What is Learnflow AI?

Learnflow AI is a voice-first learning interface — think ChatGPT, but you don’t type. You talk, and the AI talks back in real time.

It uses Vapi.ai for streaming voice interaction and GPT-4 for intelligent answers. This combo creates an incredibly natural tutoring experience — no UI clutter, just press a button and speak.

You can use this same stack to build far more than a tutor — any voice-first assistant follows the same pattern.

What We’re Building in This Part

Goal: A production-grade MVP that lets you speak to GPT-4 and get real-time spoken answers.

Here’s what’s included in Part 1:

- Setting up the app layout and constants (soundwave animation, subjects, voices)
- Initializing the Vapi web SDK
- Configuring a GPT-4 assistant (transcriber, voice, and system prompt)
- Building a single page that starts, mutes, and ends a live voice session with a running transcript

Why Voice-First?

Typing is slow. Scrolling is overwhelming. Voice changes everything.

When you ask questions out loud, follow-ups come naturally, there’s nothing to scroll through, and learning feels active instead of passive.

Talking feels like learning — typing feels like searching.

My Tech Stack

| Layer | Tech | Why It Was Chosen |
| --- | --- | --- |
| Voice Interface | Vapi.ai | Real-time audio streaming + OpenAI-ready |
| LLM Provider | OpenAI GPT-4 | High-quality answers, fast inference |
| Frontend | Next.js (App Router) | Scalable file-based routing |
| Styling | Tailwind CSS | Fast to iterate, responsive |
| Components | Radix UI + shadcn/ui | Accessible, low-level UI primitives |
| Language | TypeScript | DX + type safety |
| Hosting | Vercel | Instant deploys for Next.js |

File Structure (Voice MVP)

This part of the app is intentionally simple — its purpose is to show the voice assistant in action as quickly as possible.

learnflow-ai/
├── app/
│   ├── globals.css
│   ├── layout.tsx
│   └── page.tsx
├── constants/
│   ├── index.ts
│   └── soundwaves.json
└── lib/
    ├── utils.ts
    └── vapi.sdk.ts

We focus purely on getting the voice flow working before layering on state, auth, db, or personalization.

Step-by-Step: Setting Up the Voice Assistant

This part assumes you have already set up your Next.js App Router codebase and installed shadcn/ui. You can follow the steps here to do that.

Step 1: Set up your app layout (app/layout.tsx)

import type { Metadata } from "next";
import { Bricolage_Grotesque } from "next/font/google";
import "./globals.css";

const bricolage = Bricolage_Grotesque({
  variable: "--font-bricolage",
  subsets: ["latin"],
});

export const metadata: Metadata = {
  title: "Learnflow AI",
  description: "A voice-only learning platform for developers",
};

export default function RootLayout({
  children,
}: Readonly<{
  children: React.ReactNode;
}>) {
  return (
    <html lang="en">
      <body className={`${bricolage.variable} antialiased`}>
          {children}
      </body>
    </html>
  );
}

Step 2: Create a constants folder with a soundwaves.json file, then paste in this JSON (constants/soundwaves.json)

{"nm":"Render","ddd":0,"h":250,"w":250,"meta":{"g":"LottieFiles AE 3.1.1"},"layers":[{"ty":4,"nm":"Arrow Outlines 4","sr":1,"st":0,"op":300.00001221925,"ip":0,"hd":false,"ddd":0,"bm":0,"hasMask":false,"ao":0,"ks":{"a":{"a":0,"k":[13,15.5,0],"ix":1},"s":{"a":0,"k":[170,170,100],"ix":6},"sk":{"a":0,"k":0},"p":{"a":0,"k":[100.5,126,0],"ix":2},"r":{"a":0,"k":0,"ix":10},"sa":{"a":0,"k":0},"o":{"a":0,"k":100,"ix":11}},"ef":[],"shapes":[{"ty":"sh","bm":0,"hd":false,"mn":"ADBE Vector Shape - Group","nm":"Path 1","ix":1,"d":1,"ks":{"a":1,"k":[{"o":{"x":0.333,"y":0},"i":{"x":0.667,"y":1},"s":[{"c":false,"i":[[0,0],[0,0]],"o":[[0,0],[0,0]],"v":[[12.471,13.618],[12.471,14.676]]}],"t":0},{"o":{"x":0.333,"y":0},"i":{"x":0.667,"y":1},"s":[{"c":false,"i":[[0,0],[0,0]],"o":[[0,0],[0,0]],"v":[[12.471,8.103],[12.471,18.868]]}],"t":30},{"o":{"x":0.333,"y":0},"i":{"x":0.667,"y":1},"s":[{"c":false,"i":[[0,0],[0,0]],"o":[[0,0],[0,0]],"v":[[12.471,8.324],[12.471,19.235]]}],"t":45},{"o":{"x":0.333,"y":0},"i":{"x":0.667,"y":1},"s":[{"c":false,"i":[[0,0],[0,0]],"o":[[0,0],[0,0]],"v":[[12.515,4.206],[12.515,25.853]]}],"t":70},{"o":{"x":0.333,"y":0},"i":{"x":0.667,"y":1},"s":[{"c":false,"i":[[0,0],[0,0]],"o":[[0,0],[0,0]],"v":[[12.471,8.985],[12.471,17.912]]}],"t":83},{"o":{"x":0.333,"y":0},"i":{"x":0.667,"y":1},"s":[{"c":false,"i":[[0,0],[0,0]],"o":[[0,0],[0,0]],"v":[[12.515,5.309],[12.544,24.088]]}],"t":97},{"o":{"x":0.333,"y":0},"i":{"x":1,"y":1},"s":[{"c":false,"i":[[0,0],[0,0]],"o":[[0,0],[0,0]],"v":[[12.456,7.588],[12.456,20.044]]}],"t":109},{"o":{"x":0.333,"y":0},"i":{"x":1,"y":1},"s":[{"c":false,"i":[[0,0],[0,0]],"o":[[0,0],[0,0]],"v":[[12.456,6.265],[12.456,21]]}],"t":121},{"s":[{"c":false,"i":[[0,0],[0,0]],"o":[[0,0],[0,0]],"v":[[12.471,13.618],[12.471,14.676]]}],"t":128.000005213547}],"ix":2}},{"ty":"st","bm":0,"hd":false,"mn":"ADBE Vector Graphic - Stroke","nm":"Stroke 1","lc":2,"lj":1,"ml":4,"o":{"a":0,"k":100,"ix":4},"w":{"a":0,"k":4,"ix":5},"c":{"a":0,"k":[0.9922,0.949,0.9922,1],"ix":3}},{"ty":"sh","bm":0,"hd":false,"mn":"ADBE Vector Shape - Group","nm":"Path 2","ix":3,"d":1,"ks":{"a":0,"k":{"c":false,"i":[[0,0],[0,0]],"o":[[0,0],[0,0]],"v":[[12.441,11.118],[12.441,16.735]]},"ix":2}}],"ind":1},{"ty":4,"nm":"Arrow Outlines 3","sr":1,"st":0,"op":300.00001221925,"ip":0,"hd":false,"ddd":0,"bm":0,"hasMask":false,"ao":0,"ks":{"a":{"a":0,"k":[13,15.5,0],"ix":1},"s":{"a":0,"k":[170,170,100],"ix":6},"sk":{"a":0,"k":0},"p":{"a":0,"k":[146.75,126,0],"ix":2},"r":{"a":0,"k":0,"ix":10},"sa":{"a":0,"k":0},"o":{"a":0,"k":100,"ix":11}},"ef":[],"shapes":[{"ty":"sh","bm":0,"hd":false,"mn":"ADBE Vector Shape - Group","nm":"Path 
1","ix":1,"d":1,"ks":{"a":1,"k":[{"o":{"x":0.973,"y":0},"i":{"x":0.581,"y":1},"s":[{"c":false,"i":[[0,0],[0,0]],"o":[[0,0],[0,0]],"v":[[12.471,13.618],[12.471,14.676]]}],"t":0},{"o":{"x":0.973,"y":0},"i":{"x":0.24,"y":1},"s":[{"c":false,"i":[[0,0],[0,0]],"o":[[0,0],[0,0]],"v":[[12.441,13.765],[12.441,18.353]]}],"t":30},{"o":{"x":0.973,"y":0},"i":{"x":0.24,"y":1},"s":[{"c":false,"i":[[0,0],[0,0]],"o":[[0,0],[0,0]],"v":[[12.471,13.618],[12.471,13.721]]}],"t":45},{"o":{"x":0.167,"y":0},"i":{"x":0.24,"y":1},"s":[{"c":false,"i":[[0,0],[0,0]],"o":[[0,0],[0,0]],"v":[[12.529,7.074],[12.5,15.191]]}],"t":70},{"o":{"x":0.973,"y":0},"i":{"x":0.24,"y":1},"s":[{"c":false,"i":[[0,0],[0,0]],"o":[[0,0],[0,0]],"v":[[12.456,5.529],[12.441,16.735]]}],"t":83},{"o":{"x":0.167,"y":0},"i":{"x":0.24,"y":1},"s":[{"c":false,"i":[[0,0],[0,0]],"o":[[0,0],[0,0]],"v":[[12.544,7.147],[12.515,15.044]]}],"t":97},{"o":{"x":0.973,"y":0},"i":{"x":0.592,"y":1},"s":[{"c":false,"i":[[0,0],[0,0]],"o":[[0,0],[0,0]],"v":[[12.456,3.103],[12.471,21.809]]}],"t":109},{"o":{"x":0.973,"y":0},"i":{"x":0.893,"y":1},"s":[{"c":false,"i":[[0,0],[0,0]],"o":[[0,0],[0,0]],"v":[[12.441,11.118],[12.441,16.735]]}],"t":122},{"s":[{"c":false,"i":[[0,0],[0,0]],"o":[[0,0],[0,0]],"v":[[12.471,13.618],[12.471,14.676]]}],"t":129.000005254278}],"ix":2}},{"ty":"st","bm":0,"hd":false,"mn":"ADBE Vector Graphic - Stroke","nm":"Stroke 1","lc":2,"lj":1,"ml":4,"o":{"a":0,"k":100,"ix":4},"w":{"a":0,"k":4,"ix":5},"c":{"a":0,"k":[0.9922,0.949,0.9922,1],"ix":3}},{"ty":"sh","bm":0,"hd":false,"mn":"ADBE Vector Shape - Group","nm":"Path 2","ix":3,"d":1,"ks":{"a":0,"k":{"c":false,"i":[[0,0],[0,0]],"o":[[0,0],[0,0]],"v":[[12.441,11.118],[12.441,16.735]]},"ix":2}}],"ind":2},{"ty":4,"nm":"Arrow Outlines 2","sr":1,"st":0,"op":300.00001221925,"ip":0,"hd":false,"ddd":0,"bm":0,"hasMask":false,"ao":0,"ks":{"a":{"a":0,"k":[13,15.5,0],"ix":1},"s":{"a":0,"k":[170,170,100],"ix":6},"sk":{"a":0,"k":0},"p":{"a":0,"k":[116.75,126,0],"ix":2},"r":{"a":0,"k":0,"ix":10},"sa":{"a":0,"k":0},"o":{"a":0,"k":100,"ix":11}},"ef":[],"shapes":[{"ty":"sh","bm":0,"hd":false,"mn":"ADBE Vector Shape - Group","nm":"Path 1","ix":1,"d":1,"ks":{"a":1,"k":[{"o":{"x":0.333,"y":0},"i":{"x":0,"y":1},"s":[{"c":false,"i":[[0,0],[0,0]],"o":[[0,0],[0,0]],"v":[[12.471,13.618],[12.471,14.676]]}],"t":0},{"o":{"x":0.333,"y":0},"i":{"x":0,"y":1},"s":[{"c":false,"i":[[0,0],[0,0]],"o":[[0,0],[0,0]],"v":[[12.441,11.118],[12.441,16.735]]}],"t":24},{"o":{"x":0.973,"y":0},"i":{"x":0.24,"y":1},"s":[{"c":false,"i":[[0,0],[0,0]],"o":[[0,0],[0,0]],"v":[[12.441,4.647],[12.441,25.706]]}],"t":41},{"o":{"x":0.973,"y":0},"i":{"x":0.24,"y":1},"s":[{"c":false,"i":[[0,0],[0,0]],"o":[[0,0],[0,0]],"v":[[12.441,11.118],[12.441,16.735]]}],"t":55},{"o":{"x":0.167,"y":0},"i":{"x":0.24,"y":1},"s":[{"c":false,"i":[[0,0],[0,0]],"o":[[0,0],[0,0]],"v":[[12.471,6.118],[12.471,20.412]]}],"t":70},{"o":{"x":0.973,"y":0},"i":{"x":0.24,"y":1},"s":[{"c":false,"i":[[0,0],[0,0]],"o":[[0,0],[0,0]],"v":[[12.456,1.926],[12.456,22.838]]}],"t":87},{"o":{"x":0.973,"y":0},"i":{"x":0,"y":1},"s":[{"c":false,"i":[[0,0],[0,0]],"o":[[0,0],[0,0]],"v":[[12.471,7.735],[12.471,20.265]]}],"t":101},{"o":{"x":0.333,"y":0},"i":{"x":0.24,"y":1},"s":[{"c":false,"i":[[0,0],[0,0]],"o":[[0,0],[0,0]],"v":[[12.441,11.118],[12.441,16.735]]}],"t":115},{"s":[{"c":false,"i":[[0,0],[0,0]],"o":[[0,0],[0,0]],"v":[[12.471,13.618],[12.471,14.676]]}],"t":134.000005457932}],"ix":2}},{"ty":"st","bm":0,"hd":false,"mn":"ADBE Vector Graphic - Stroke","nm":"Stroke 
1","lc":2,"lj":1,"ml":4,"o":{"a":0,"k":100,"ix":4},"w":{"a":0,"k":4,"ix":5},"c":{"a":0,"k":[0.9922,0.949,0.9922,1],"ix":3}},{"ty":"sh","bm":0,"hd":false,"mn":"ADBE Vector Shape - Group","nm":"Path 2","ix":3,"d":1,"ks":{"a":0,"k":{"c":false,"i":[[0,0],[0,0]],"o":[[0,0],[0,0]],"v":[[12.441,11.118],[12.441,16.735]]},"ix":2}}],"ind":3},{"ty":4,"nm":"Arrow Outlines","sr":1,"st":0,"op":300.00001221925,"ip":0,"hd":false,"ddd":0,"bm":0,"hasMask":false,"ao":0,"ks":{"a":{"a":0,"k":[13,15.5,0],"ix":1},"s":{"a":0,"k":[170,170,100],"ix":6},"sk":{"a":0,"k":0},"p":{"a":0,"k":[131.75,126,0],"ix":2},"r":{"a":0,"k":0,"ix":10},"sa":{"a":0,"k":0},"o":{"a":0,"k":100,"ix":11}},"ef":[],"shapes":[{"ty":"sh","bm":0,"hd":false,"mn":"ADBE Vector Shape - Group","nm":"Path 1","ix":1,"d":1,"ks":{"a":1,"k":[{"o":{"x":0.973,"y":0},"i":{"x":0.581,"y":1},"s":[{"c":false,"i":[[0,0],[0,0]],"o":[[0,0],[0,0]],"v":[[12.471,13.618],[12.471,14.676]]}],"t":0},{"o":{"x":0.973,"y":0},"i":{"x":0.24,"y":1},"s":[{"c":false,"i":[[0,0],[0,0]],"o":[[0,0],[0,0]],"v":[[12.5,2],[12.5,28.5]]}],"t":30},{"o":{"x":0.973,"y":0},"i":{"x":0.24,"y":1},"s":[{"c":false,"i":[[0,0],[0,0]],"o":[[0,0],[0,0]],"v":[[12.441,11.118],[12.441,16.735]]}],"t":45},{"o":{"x":0.167,"y":0},"i":{"x":0.24,"y":1},"s":[{"c":false,"i":[[0,0],[0,0]],"o":[[0,0],[0,0]],"v":[[12.5,2],[12.5,28.5]]}],"t":70},{"o":{"x":0.973,"y":0},"i":{"x":0.24,"y":1},"s":[{"c":false,"i":[[0,0],[0,0]],"o":[[0,0],[0,0]],"v":[[12.441,11.118],[12.441,16.735]]}],"t":83},{"o":{"x":0.167,"y":0},"i":{"x":0.24,"y":1},"s":[{"c":false,"i":[[0,0],[0,0]],"o":[[0,0],[0,0]],"v":[[12.5,2],[12.5,28.5]]}],"t":97},{"o":{"x":0.973,"y":0},"i":{"x":0.24,"y":1},"s":[{"c":false,"i":[[0,0],[0,0]],"o":[[0,0],[0,0]],"v":[[12.471,5.75],[12.471,21.809]]}],"t":109},{"o":{"x":0.973,"y":0},"i":{"x":0.24,"y":1},"s":[{"c":false,"i":[[0,0],[0,0]],"o":[[0,0],[0,0]],"v":[[12.441,11.118],[12.441,16.735]]}],"t":125},{"s":[{"c":false,"i":[[0,0],[0,0]],"o":[[0,0],[0,0]],"v":[[12.471,13.618],[12.471,14.676]]}],"t":132.00000537647}],"ix":2}},{"ty":"st","bm":0,"hd":false,"mn":"ADBE Vector Graphic - Stroke","nm":"Stroke 1","lc":2,"lj":1,"ml":4,"o":{"a":0,"k":100,"ix":4},"w":{"a":0,"k":4,"ix":5},"c":{"a":0,"k":[0.9922,0.949,0.9922,1],"ix":3}},{"ty":"sh","bm":0,"hd":false,"mn":"ADBE Vector Shape - Group","nm":"Path 2","ix":3,"d":1,"ks":{"a":0,"k":{"c":false,"i":[[0,0],[0,0]],"o":[[0,0],[0,0]],"v":[[12.441,11.118],[12.441,16.735]]},"ix":2}}],"ind":4},{"ty":4,"nm":"cir 1","sr":1,"st":0,"op":300.00001221925,"ip":0,"hd":false,"ddd":0,"bm":0,"hasMask":false,"ao":0,"ks":{"a":{"a":0,"k":[0,0,0],"ix":1},"s":{"a":1,"k":[{"o":{"x":0.775,"y":0},"i":{"x":0.206,"y":1},"s":[111.8,111.8,100],"t":0},{"o":{"x":0.665,"y":0},"i":{"x":0.274,"y":1},"s":[120,120,100],"t":30},{"o":{"x":0.167,"y":0},"i":{"x":0.206,"y":1},"s":[111.8,111.8,100],"t":60},{"o":{"x":0.665,"y":0},"i":{"x":0.274,"y":1},"s":[120,120,100],"t":71.289},{"o":{"x":0.167,"y":0},"i":{"x":0.206,"y":1},"s":[111.8,111.8,100],"t":82.576},{"o":{"x":0.665,"y":0},"i":{"x":0.274,"y":1},"s":[120,120,100],"t":97.628},{"o":{"x":0.167,"y":0},"i":{"x":0.667,"y":1},"s":[111.8,111.8,100],"t":107.661},{"s":[111.8,111.8,100],"t":134.000005457932}],"ix":6},"sk":{"a":0,"k":0},"p":{"a":0,"k":[126.539,126.166,0],"ix":2},"r":{"a":0,"k":0,"ix":10},"sa":{"a":0,"k":0},"o":{"a":0,"k":100,"ix":11}},"ef":[],"shapes":[{"ty":"gr","bm":0,"hd":false,"mn":"ADBE Vector Group","nm":"Ellipse 1","ix":1,"cix":2,"np":3,"it":[{"ty":"el","bm":0,"hd":false,"mn":"ADBE Vector Shape - Ellipse","nm":"Ellipse Path 
1","d":1,"p":{"a":0,"k":[0,0],"ix":3},"s":{"a":0,"k":[99.367,99.367],"ix":2}},{"ty":"fl","bm":0,"hd":false,"mn":"ADBE Vector Graphic - Fill","nm":"Fill 1","c":{"a":0,"k":[0.1255,0.1529,0.1725,1],"ix":4},"r":1,"o":{"a":0,"k":100,"ix":5}},{"ty":"tr","a":{"a":0,"k":[0,0],"ix":1},"s":{"a":0,"k":[100,100],"ix":3},"sk":{"a":0,"k":0,"ix":4},"p":{"a":0,"k":[-1.539,-1.166],"ix":2},"r":{"a":0,"k":0,"ix":6},"sa":{"a":0,"k":0,"ix":5},"o":{"a":0,"k":100,"ix":7}}]}],"ind":5},{"ty":4,"nm":"cir 2","sr":1,"st":0,"op":300.00001221925,"ip":0,"hd":false,"ddd":0,"bm":0,"hasMask":false,"ao":0,"ks":{"a":{"a":0,"k":[0,0,0],"ix":1},"s":{"a":1,"k":[{"o":{"x":0.775,"y":0},"i":{"x":0.206,"y":1},"s":[110,110,100],"t":0},{"o":{"x":0.665,"y":0},"i":{"x":0.274,"y":1},"s":[194.775,194.775,100],"t":34},{"o":{"x":0.167,"y":0},"i":{"x":0.206,"y":1},"s":[150,150,100],"t":60},{"o":{"x":0.665,"y":0},"i":{"x":0.274,"y":1},"s":[194.775,194.775,100],"t":71.289},{"o":{"x":0.167,"y":0},"i":{"x":0.206,"y":1},"s":[150,150,100],"t":82.576},{"o":{"x":0.665,"y":0},"i":{"x":0.274,"y":1},"s":[194.775,194.775,100],"t":97.628},{"o":{"x":0.167,"y":0},"i":{"x":0.274,"y":1},"s":[150,150,100],"t":117},{"s":[110,110,100],"t":123.966255049249}],"ix":6},"sk":{"a":0,"k":0},"p":{"a":0,"k":[126.539,126.166,0],"ix":2},"r":{"a":0,"k":0,"ix":10},"sa":{"a":0,"k":0},"o":{"a":0,"k":100,"ix":11}},"ef":[],"shapes":[{"ty":"gr","bm":0,"hd":false,"mn":"ADBE Vector Group","nm":"Ellipse 1","ix":1,"cix":2,"np":3,"it":[{"ty":"el","bm":0,"hd":false,"mn":"ADBE Vector Shape - Ellipse","nm":"Ellipse Path 1","d":1,"p":{"a":0,"k":[0,0],"ix":3},"s":{"a":0,"k":[99.367,99.367],"ix":2}},{"ty":"fl","bm":0,"hd":false,"mn":"ADBE Vector Graphic - Fill","nm":"Fill 1","c":{"a":0,"k":[0.1255,0.1529,0.1725,1],"ix":4},"r":1,"o":{"a":0,"k":10,"ix":5}},{"ty":"tr","a":{"a":0,"k":[0,0],"ix":1},"s":{"a":0,"k":[100,100],"ix":3},"sk":{"a":0,"k":0,"ix":4},"p":{"a":0,"k":[-1.539,-1.166],"ix":2},"r":{"a":0,"k":0,"ix":6},"sa":{"a":0,"k":0,"ix":5},"o":{"a":0,"k":100,"ix":7}}]}],"ind":6},{"ty":4,"nm":"cir 3","sr":1,"st":0,"op":300.00001221925,"ip":0,"hd":false,"ddd":0,"bm":0,"hasMask":false,"ao":0,"ks":{"a":{"a":0,"k":[0,0,0],"ix":1},"s":{"a":1,"k":[{"o":{"x":0.775,"y":0},"i":{"x":0.206,"y":1},"s":[110,110,100],"t":0},{"o":{"x":0.665,"y":0},"i":{"x":0.274,"y":1},"s":[230,230,100],"t":34},{"o":{"x":0.167,"y":0},"i":{"x":0.206,"y":1},"s":[190,190,100],"t":60},{"o":{"x":0.665,"y":0},"i":{"x":0.274,"y":1},"s":[230,230,100],"t":71.289},{"o":{"x":0.167,"y":0},"i":{"x":0.206,"y":1},"s":[190,190,100],"t":82.576},{"o":{"x":0.665,"y":0},"i":{"x":0.274,"y":1},"s":[230,230,100],"t":97.628},{"o":{"x":0.167,"y":0},"i":{"x":0.274,"y":1},"s":[190,190,100],"t":118},{"s":[110,110,100],"t":134.000005457932}],"ix":6},"sk":{"a":0,"k":0},"p":{"a":0,"k":[126.539,126.166,0],"ix":2},"r":{"a":0,"k":0,"ix":10},"sa":{"a":0,"k":0},"o":{"a":0,"k":100,"ix":11}},"ef":[],"shapes":[{"ty":"gr","bm":0,"hd":false,"mn":"ADBE Vector Group","nm":"Ellipse 1","ix":1,"cix":2,"np":3,"it":[{"ty":"el","bm":0,"hd":false,"mn":"ADBE Vector Shape - Ellipse","nm":"Ellipse Path 1","d":1,"p":{"a":0,"k":[0,0],"ix":3},"s":{"a":0,"k":[99.367,99.367],"ix":2}},{"ty":"fl","bm":0,"hd":false,"mn":"ADBE Vector Graphic - Fill","nm":"Fill 
1","c":{"a":0,"k":[0.1255,0.1529,0.1725,1],"ix":4},"r":1,"o":{"a":0,"k":5,"ix":5}},{"ty":"tr","a":{"a":0,"k":[0,0],"ix":1},"s":{"a":0,"k":[100,100],"ix":3},"sk":{"a":0,"k":0,"ix":4},"p":{"a":0,"k":[-1.539,-1.166],"ix":2},"r":{"a":0,"k":0,"ix":6},"sa":{"a":0,"k":0,"ix":5},"o":{"a":0,"k":100,"ix":7}}]}],"ind":7}],"v":"4.8.0","fr":29.9700012207031,"op":135.000005498663,"ip":0,"assets":[]}

Step 3: Create another file in your constants folder (constants/index.ts) and paste this code

export const subjects = [
  "javascript",
  "python",
  "html",
  "css",
  "algorithms",
  "databases",
];

export const subjectsColors = {
  javascript: "#FFD166",
  python: "#9BE7FF",
  html: "#FF9AA2",
  css: "#B5EAD7",
  algorithms: "#CBAACB",
  databases: "#FFDAC1",
};

export const voices = {
  male: { casual: "2BJW5coyhAzSr8STdHbE", formal: "c6SfcYrb2t09NHXiT80T" },
  female: { casual: "ZIlrSGI4jZqobxRKprJz", formal: "sarah" },
};

export const recentSessions = [
  {
    id: "1",
    subject: "javascript",
    name: "Codey the JS Debugger",
    topic: "Understanding Closures",
    duration: 40,
    color: "#FFD166",
  },
  {
    id: "2",
    subject: "python",
    name: "Snakey the Python Guru",
    topic: "List Comprehensions & Lambdas",
    duration: 35,
    color: "#9BE7FF",
  },
  {
    id: "3",
    subject: "html",
    name: "Structo the Markup Architect",
    topic: "Semantic Tags & Accessibility",
    duration: 25,
    color: "#FF9AA2",
  },
  {
    id: "4",
    subject: "css",
    name: "Stylo the Flexbox Wizard",
    topic: "Flexbox vs Grid Layouts",
    duration: 30,
    color: "#B5EAD7",
  },
  {
    id: "5",
    subject: "algorithms",
    name: "Algo the Problem Solver",
    topic: "Binary Search Explained Visually",
    duration: 45,
    color: "#CBAACB",
  },
  {
    id: "6",
    subject: "databases",
    name: "Query the Data Whisperer",
    topic: "SQL Joins: Inner vs Outer",
    duration: 20,
    color: "#FFDAC1",
  },
];
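
These constants drive the voice and color lookups in lib/utils.ts later. Optionally — this is my addition, nothing below requires it — you can derive union types from them at the bottom of constants/index.ts for safer lookups:

// Optional: union types derived from the constants above.
export type Subject = (typeof subjects)[number]; // "javascript" | "python" | ...
export type VoiceName = keyof typeof voices; // "male" | "female"
export type VoiceStyle = keyof (typeof voices)[VoiceName]; // "casual" | "formal"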

Step 4: Install lottie-react package

npm install lottie-react
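
lottie-react is what renders the soundwaves.json animation from Step 2. As a quick sanity check, here’s a minimal hypothetical component (not part of the final app — page.tsx drives playback from speech events instead):

import Lottie from "lottie-react";
import soundwaves from "@/constants/soundwaves.json";

// Plays the soundwave animation on a loop.
export function SoundwavesPreview() {
  return <Lottie animationData={soundwaves} loop autoplay />;
}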

Step 5: Install the Vapi SDK

npm install @vapi-ai/web

You’ll need a free Vapi account to get your API key.

Step 6: Initialize the Vapi Client (lib/vapi.sdk.ts)

This sets up the Vapi SDK with your API key, allowing your app to connect to Vapi’s voice infrastructure:

import Vapi from "@vapi-ai/web";

export const vapi = new Vapi(process.env.NEXT_PUBLIC_VAPI_WEB_TOKEN!);

It initializes the core Vapi client that handles real-time audio, streaming, and connection to your AI assistant. Every voice interaction starts here.
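
Because the module exports a single shared instance, every component talks to the same client. For example, any file can subscribe to call events (Step 8 wires up the full set):

import { vapi } from "@/lib/vapi.sdk";

// Hypothetical sanity check: log when a call opens and closes.
vapi.on("call-start", () => console.log("voice stream opened"));
vapi.on("call-end", () => console.log("voice stream closed"));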

Step 7: Create a (lib/utils.ts) file and paste this code

import { clsx, type ClassValue } from "clsx";
import { twMerge } from "tailwind-merge";
import { subjectsColors, voices } from "@/constants";
import { CreateAssistantDTO } from "@vapi-ai/web/dist/api";

export function cn(...inputs: ClassValue[]) {
  return twMerge(clsx(inputs));
}

export const getSubjectColor = (subject: string) => {
  return subjectsColors[subject as keyof typeof subjectsColors];
};

export const configureAssistant = (voice: string, style: string) => {
  const voiceId =
    voices[voice as keyof typeof voices][
      style as keyof (typeof voices)[keyof typeof voices]
    ] || "sarah";

  const vapiAssistant: CreateAssistantDTO = {
    name: "Companion",
    firstMessage:
        "Hello, let's start the session. Today we'll be talking about {{topic}}.",
    transcriber: {
      provider: "deepgram",
      model: "nova-3",
      language: "en",
    },
    voice: {
      provider: "11labs",
      voiceId: voiceId,
      stability: 0.4,
      similarityBoost: 0.8,
      speed: 1,
      style: 0.5,
      useSpeakerBoost: true,
    },
    model: {
      provider: "openai",
      model: "gpt-4",
      messages: [
        {
          role: "system",
          content: `You are a highly knowledgeable tutor teaching a real-time voice session with a student. Your goal is to teach the student about the topic and subject.

                    Tutor Guidelines:
                    Stick to the given topic - {{ topic }} and subject - {{ subject }} and teach the student about it.
                    Keep the conversation flowing smoothly while maintaining control.
                    From time to time make sure that the student is following you and understands you.
                    Break down the topic into smaller parts and teach the student one part at a time.
                    Keep your style of conversation {{ style }}.
                    Keep your responses short, like in a real voice conversation.
                    Do not include any special characters in your responses - this is a voice conversation.
              `,
        },
      ],
    },
    clientMessages: [],
    serverMessages: [],
  };
  return vapiAssistant;
};
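
For example, with the voice IDs defined in constants/index.ts:

const assistant = configureAssistant("male", "casual");
// assistant.voice.voiceId === "2BJW5coyhAzSr8STdHbE" (voices.male.casual)
// An unknown style falls back to the "sarah" voice.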

Step 8: Create your assistant

Simply paste this code into your app/page.tsx file:

'use client';

import {useEffect, useRef, useState} from 'react'
import {cn, configureAssistant, getSubjectColor} from "@/lib/utils";
import {vapi} from "@/lib/vapi.sdk";
import Image from "next/image";
import Lottie, {LottieRefCurrentProps} from "lottie-react";
import soundwaves from '@/constants/soundwaves.json'
// Minimal message types for this page — the Vapi SDK doesn't export these,
// so we define just the shapes we actually use.
type SavedMessage = { role: 'user' | 'assistant' | 'system'; content: string };
type Message = {
    type: string;
    role: 'user' | 'assistant' | 'system';
    transcriptType?: 'partial' | 'final';
    transcript: string;
};

enum CallStatus {
    INACTIVE = 'INACTIVE',
    CONNECTING = 'CONNECTING',
    ACTIVE = 'ACTIVE',
    FINISHED = 'FINISHED',
}

const Page = () => {

    // Demo details (hard-coded for this MVP)
    const subject = "javascript"
    const topic = "React and Typescript"
    const name = "Better Call Saul"
    const style = "casual"
    const voice = "male"
    const userName = "Shola - student"
    const userImage = "/images/me.png"

    const [callStatus, setCallStatus] = useState<CallStatus>(CallStatus.INACTIVE);
    const [isSpeaking, setIsSpeaking] = useState(false);
    const [isMuted, setIsMuted] = useState(false);
    const [messages, setMessages] = useState<SavedMessage[]>([]);

    const lottieRef = useRef<LottieRefCurrentProps>(null);

    useEffect(() => {
        if(lottieRef) {
            if(isSpeaking) {
                lottieRef.current?.play()
            } else {
                lottieRef.current?.stop()
            }
        }
    }, [isSpeaking, lottieRef])

    useEffect(() => {
        const onCallStart = () => setCallStatus(CallStatus.ACTIVE);

        const onCallEnd = () => {
            setCallStatus(CallStatus.FINISHED);
        }

        const onMessage = (message: Message) => {
            if(message.type === 'transcript' && message.transcriptType === 'final') {
                const newMessage= { role: message.role, content: message.transcript}
                setMessages((prev) => [newMessage, ...prev])
            }
        }

        const onSpeechStart = () => setIsSpeaking(true);
        const onSpeechEnd = () => setIsSpeaking(false);

        const onError = (error: Error) => console.log('Error', error);

        vapi.on('call-start', onCallStart);
        vapi.on('call-end', onCallEnd);
        vapi.on('message', onMessage);
        vapi.on('error', onError);
        vapi.on('speech-start', onSpeechStart);
        vapi.on('speech-end', onSpeechEnd);

        return () => {
            vapi.off('call-start', onCallStart);
            vapi.off('call-end', onCallEnd);
            vapi.off('message', onMessage);
            vapi.off('error', onError);
            vapi.off('speech-start', onSpeechStart);
            vapi.off('speech-end', onSpeechEnd);
        }
    }, []);

    const toggleMicrophone = () => {
        const isMuted = vapi.isMuted();
        vapi.setMuted(!isMuted);
        setIsMuted(!isMuted)
    }

    const handleCall = async () => {
        setCallStatus(CallStatus.CONNECTING)

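        // variableValues fills the {{subject}}, {{topic}} and {{style}} placeholders
        // in the assistant's firstMessage and system prompt at call time.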
        const assistantOverrides = {
            variableValues: { subject, topic, style },
            clientMessages: ["transcript"],
            serverMessages: [],
        }

        // @ts-expect-error - The configureAssistant function's return type doesn't match the expected type, but it works at runtime
        vapi.start(configureAssistant(voice, style), assistantOverrides)
    }

    const handleDisconnect = () => {
        setCallStatus(CallStatus.FINISHED)
        vapi.stop()
    }

    return (
        <section className="flex flex-col h-[70vh]">
            <section className="flex gap-8 max-sm:flex-col">
                <div className="companion-section">
                    <div className="companion-avatar" style={{ backgroundColor: getSubjectColor(subject)}}>
                        <div
                            className={cn(
                                'absolute transition-opacity duration-1000',
                                callStatus === CallStatus.FINISHED || callStatus === CallStatus.INACTIVE ? 'opacity-100' : 'opacity-0',
                                callStatus === CallStatus.CONNECTING && 'opacity-100 animate-pulse'
                            )}
                        >
                            <Image src={`/icons/${subject}.svg`} alt={subject} width={150} height={150} className="max-sm:w-fit" />
                        </div>

                        <div className={cn('absolute transition-opacity duration-1000', callStatus === CallStatus.ACTIVE ? 'opacity-100': 'opacity-0')}>
                            <Lottie
                                lottieRef={lottieRef}
                                animationData={soundwaves}
                                autoplay={false}
                                className="companion-lottie"
                            />
                        </div>
                    </div>
                    <p className="font-bold text-2xl">{name}</p>
                </div>

                <div className="user-section">
                    <div className="user-avatar">
                        <Image src={userImage} alt={userName} width={130} height={130} className="rounded-lg" />
                        <p className="font-bold text-2xl">
                            {userName}
                        </p>
                    </div>
                    <button className="btn-mic" onClick={toggleMicrophone} disabled={callStatus !== CallStatus.ACTIVE}>
                        <Image src={isMuted ? '/icons/mic-off.svg' : '/icons/mic-on.svg'} alt="mic" width={36} height={36} />
                        <p className="max-sm:hidden">
                            {isMuted ? 'Turn on microphone' : 'Turn off microphone'}
                        </p>
                    </button>
                    <button className={cn('rounded-lg py-2 cursor-pointer transition-colors w-full text-white', callStatus ===CallStatus.ACTIVE ? 'bg-red-700' : 'bg-primary', callStatus === CallStatus.CONNECTING && 'animate-pulse')} onClick={callStatus === CallStatus.ACTIVE ? handleDisconnect : handleCall}>
                        {callStatus === CallStatus.ACTIVE
                        ? "End Session"
                        : callStatus === CallStatus.CONNECTING
                            ? 'Connecting'
                        : 'Start Session'
                        }
                    </button>
                </div>
            </section>

            <section className="transcript">
                <div className="transcript-message no-scrollbar">
                    {messages.map((message, index) => {
                        if(message.role === 'assistant') {
                            return (
                                <p key={index} className="max-sm:text-sm">
                                    {name.split(' ')[0].replace(/[.,]/g, '')}: {message.content}
                                </p>
                            )
                        } else {
                           return <p key={index} className="text-primary max-sm:text-sm">
                                {userName}: {message.content}
                            </p>
                        }
                    })}
                </div>

                <div className="transcript-fade" />
            </section>
        </section>
    )
}

export default Page

Why Vapi?

Without Vapi, you’d need to manage:

- Microphone access and audio capture in the browser
- Real-time audio streaming to and from a server
- Speech-to-text transcription
- Routing transcripts to the LLM and streaming responses back
- Text-to-speech synthesis and low-latency playback

Vapi handles all of this with just a few lines of code. It’s like magic for voice-first AI apps.
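
In this app, opening and closing that entire pipeline comes down to two calls (the same ones used in page.tsx):

// Start a live voice session with the configured assistant...
vapi.start(configureAssistant(voice, style), assistantOverrides);

// ...and end it when the user is done.
vapi.stop();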

How the Voice Assistant Works (Step-by-Step)

  1. User clicks the Call Button
  2. Vapi opens a live voice stream
  3. User speaks a question
  4. Vapi transcribes and sends it to OpenAI (GPT-4)
  5. OpenAI returns a response
  6. Vapi turns the response into speech
  7. Browser plays the response to the user
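
Steps 4-6 map one-to-one onto the three provider blocks already configured in lib/utils.ts (Step 7):

// Excerpt from configureAssistant in lib/utils.ts:
transcriber: { provider: "deepgram", model: "nova-3", language: "en" },          // step 4: speech-to-text
model: { provider: "openai", model: "gpt-4", messages: [ /* system prompt */ ] }, // steps 4-5: GPT-4 answers
voice: { provider: "11labs", voiceId: voiceId, stability: 0.4, /* ... */ },       // step 6: text-to-speech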

Local Setup

Prerequisites:

- Node.js 18+ and npm
- A free Vapi account and your web token

.env.local file:

NEXT_PUBLIC_VAPI_WEB_TOKEN=your_vapi_web_token
VAPI_SECRET_KEY=your_vapi_secret_key

Run It:

npm install
npm run dev

Open http://localhost:3000 and press the button to start talking to GPT-4.

Takeaways

Learnflow AI proves one thing:

Talking to an AI is way more fluid than typing to one.

The Vapi + GPT-4 combo lets you build powerful assistants with real-time voice, high-quality answers, and almost none of the usual audio plumbing.

And you can build the whole MVP in a weekend.

What’s Coming in Part 2

Next, we’ll go deeper and make it personal: auth, saved sessions, and a tutor that adapts to you — the layers we deliberately skipped in this part.

Try the MVP or Build Your Own

GitHub: github.com/sholajegede/learnflow_ai

If you want to set up Kinde Auth, check out this post.