Remember that moment you watched an AI generate text token-by-token? It felt like magic. You could see the thought process unfold, the sentences building in real-time. It was a huge leap from the static, wait-for-the-whole-message experience.
But let’s be honest — it was also a bit like watching someone else use a computer. The AI was a brilliant storyteller, but it was stuck in a box. It could tell you how to book a flight, but it couldn’t click the “Book Now” button. It could describe your recent orders, but it couldn’t fetch the list from your database.
This is the fundamental limitation of a pure Large Language Model: it has no hands.
The tool-call render pattern is the solution. It’s the architectural pattern that gives AI its hands, transforming it from a passive text generator into an active, collaborative partner that can execute real-world tasks.
This guide will break down this powerful pattern, from the core ReAct framework to a production-ready code example you can use today.
From Spectator to Collaborator: The Core Concept
In a streaming UI, the server sends a continuous flow of tokens and the client consumes them. It’s a one-way street: Server → Client. The client is a spectator.
The tool-call pattern flips this model into a collaborative loop. It’s no longer just about displaying text; it's about executing functions and rendering the results as structured, interactive components.
Let’s use an analogy:
- Streaming: You’re reading a recipe line-by-line as it’s being written. You’re a spectator.
- Tool-Call: You’re the head chef. You tell your sous-chef (the AI), “I need a list of nearby restaurants.” Instead of describing the process, the sous-chef goes to the kitchen (the server), executes the function to fetch the list, and brings the prepared ingredients (the data) back to you, ready for the next step.
The UI becomes a dynamic dashboard of the AI’s actions, not just a chat log.
The “Why”: Bridging Language and Code
The magic behind this pattern is Function Calling. This is the standardized protocol that allows an LLM to bridge the gap between its probabilistic world of language and the deterministic world of code.
When an LLM receives a request like “What’s the weather in London?”, it doesn’t just hallucinate an answer. It recognizes it needs an external tool. It then:
- Plans: Reasons about which tool to use (get_weather).
- Acts: Constructs a structured request (e.g., { "city": "London" }).
- Observes: Receives the result from your server-side function.
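Concretely, the Act step produces a structured payload your server can dispatch. Here is a minimal sketch in TypeScript — the payload shape and tool names are illustrative, since each provider wraps tool calls in its own envelope:

```typescript
// A tool-call payload roughly as a model emits it (exact shape varies by provider).
type ToolCall = { name: string; arguments: Record<string, unknown> };

const call: ToolCall = {
  name: "get_weather",
  arguments: { city: "London" },
};

// The "Act" step on your server: dispatch the name to a real function.
const tools: Record<string, (args: any) => string> = {
  get_weather: ({ city }: { city: string }) => `Sunny in ${city}`,
};

// The return value becomes the "Observation" fed back into the model's context.
const observation = tools[call.name](call.arguments);
console.log(observation); // "Sunny in London"
```

The key design point: the model never executes anything itself. It only names a function and supplies arguments; your server owns the actual execution.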
This transforms the LLM from a conversational partner into a planner and a coordinator. Think of it as a Client-Side Router for AI:
- The LLM is the router, analyzing the user’s intent.
- The Tools are your API endpoints, providing the data.
- The UI is the view, rendering the result.
This is how we build genuinely functional systems. The AI becomes the logic layer that connects user intent to server-side execution.
The Engine Under the Hood: ReAct and the T-A-O Loop
The tool-call pattern is a practical implementation of the ReAct (Reasoning and Acting) framework. It operates on a simple but powerful feedback loop:
- Thought: The AI’s internal monologue. It analyzes the prompt, its available tools, and the conversation history. It asks, “What needs to happen next?”
- Action: The AI decides to use a tool. It generates a structured output (a JSON object) specifying the function name and parameters. Control is handed off to your server.
- Observation: Your server executes the function and returns the result. This result is fed back into the AI’s context as an “Observation.”
This cycle repeats until the request is fulfilled. The AI thinks, acts, observes, and thinks again.
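The loop above can be sketched in a few lines of TypeScript, with a stubbed model function standing in for the LLM. All names here are illustrative, not SDK APIs:

```typescript
// A minimal ReAct loop sketch. The "model" is a stub: it requests a tool
// first, then produces a final answer once an observation is in its context.
type ModelStep =
  | { type: "action"; tool: string; args: Record<string, unknown> }
  | { type: "final"; text: string };

function model(history: string[]): ModelStep {
  if (!history.some((entry) => entry.startsWith("observation:"))) {
    return { type: "action", tool: "get_weather", args: { city: "London" } };
  }
  return { type: "final", text: "It is rainy in London." };
}

const tools: Record<string, (args: any) => string> = {
  get_weather: ({ city }: { city: string }) => `Rainy, 15°C in ${city}`,
};

const history: string[] = ["user: What's the weather in London?"];
let answer = "";
for (let i = 0; i < 5; i++) {                       // cap iterations: no infinite loops
  const step = model(history);                      // Thought: decide what happens next
  if (step.type === "final") { answer = step.text; break; }
  const observation = tools[step.tool](step.args);  // Action: execute the chosen tool
  history.push(`observation: ${observation}`);      // Observation: feed result back
}
console.log(answer); // "It is rainy in London."
```

Real SDKs run this loop for you, but the shape is the same: every observation is appended to the context before the model is asked to think again.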
The Blueprint: Your Tool’s API Contract
Before the AI can act, it needs to know what actions are possible. This is defined in a Function Calling Schema — a machine-readable contract that acts as your tool’s API documentation.
A typical schema includes:
- name: The unique identifier (e.g., get_weather).
- description: A clear explanation the AI uses to decide which tool is appropriate. Good descriptions are critical.
- parameters: A JSON Schema defining the expected inputs (types, required fields, descriptions).
Here’s a schema for a user profile lookup tool:
const userProfileTool = {
name: "get_user_profile",
description: "Retrieves the public profile information for a specific user.",
parameters: {
type: "object",
properties: {
username: {
type: "string",
description: "The unique username of the user to look up.",
},
},
required: ["username"],
},
};
When the AI sees a prompt like “What is the profile for user ‘alice123’?”, it consults this schema, identifies the correct tool, and generates the Action step.
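For that prompt, the generated Action might look like the following — the payload shape is illustrative. A few lines of validation against the schema's required fields show the contract at work:

```typescript
// Illustrative Action payload the model might generate from the schema above.
const action = {
  name: "get_user_profile",
  arguments: { username: "alice123" } as Record<string, unknown>,
};

// Enforce the schema's "required" list before executing anything.
const required = ["username"];
const missing = required.filter((key) => !(key in action.arguments));
if (missing.length > 0) {
  throw new Error(`Invalid tool call: missing ${missing.join(", ")}`);
}
console.log(`Looking up profile for ${action.arguments.username}`);
```

This is why the schema doubles as documentation: the description guides the model's choice, while the parameters block gives you something mechanical to validate against.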
The Render Pattern: Making Execution Visible
This is where the tool-call pattern truly shines. It’s not enough to execute the function and return the result at the end. The pattern dictates that the state of the tool's execution must be streamed and rendered in the UI in real-time.
This creates a transparent, trustworthy user experience. The UI breaks down into distinct states:
- Initiation: The AI decides to call a tool. The UI immediately renders a placeholder — a loading spinner or a message like “Checking the database…”.
- Execution: While the server-side function runs, the streaming connection keeps the UI element active.
- Completion (Success): The loading state is replaced by the actual content, rendered as a distinct component (e.g., a user profile card).
- Error Handling: If the function fails, the error is streamed back and rendered as an alert box.
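One way to model these states on the client is a discriminated union you can exhaustively switch on. This is a sketch — the Vercel AI SDK exposes its own state names, so treat these as illustrative:

```typescript
// The four render states as a discriminated union (names are illustrative).
type ToolUIState =
  | { state: "call"; toolName: string }                     // Initiation: show a spinner
  | { state: "executing"; toolName: string }                // Execution: keep element active
  | { state: "result"; toolName: string; result: unknown }  // Success: render a component
  | { state: "error"; toolName: string; message: string };  // Failure: render an alert

// Exhaustive switch: TypeScript flags any state we forget to render.
function label(s: ToolUIState): string {
  switch (s.state) {
    case "call": return `Checking ${s.toolName}…`;
    case "executing": return `Running ${s.toolName}…`;
    case "result": return `Done: ${s.toolName}`;
    case "error": return `Failed: ${s.message}`;
  }
}

console.log(label({ state: "call", toolName: "get_user_profile" }));
```

Modeling the states as a closed union means adding a fifth state later forces the compiler to point at every render site you need to update.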
The user doesn’t just see an answer; they see the process of how that answer was derived.
The Ultimate Power: Chaining Multi-Step Workflows
The true power emerges when you chain multiple tool calls. The Observation from one tool becomes the context for the next Thought, leading to a new Action.
Consider this complex request: “Find all open pull requests for ‘vercel/ai’ and summarize the latest comment on each.”
A single tool can’t do this. The tool-call pattern enables a sequence:
- Action 1: Call list_pull_requests for 'vercel/ai'.
- Observation 1: Get a list of PRs (e.g., #123, #124).
- Thought 2: The AI now knows the PRs exist and decides to fetch comments for each.
- Action 2: Call get_pr_comments for PR #123. The UI streams a loading state for this specific PR.
- Observation 2: Get the comments. The UI updates with the summary.
- Action 3: Call get_pr_comments for PR #124. The UI updates again.
- Final Thought: The AI synthesizes all observations into a final summary.
By chaining tools, we can build sophisticated, multi-step workflows that feel like a seamless conversation.
Code Example: A Weather SaaS Tool
Let’s build a practical example using Next.js and the Vercel AI SDK. We’ll create a simple SaaS feature where an AI assistant fetches the current weather. The UI will stream the tool’s execution state in real-time.
This example uses an Edge-First Deployment Strategy for low latency and Strict Type Discipline with Zod to prevent runtime errors.
The Code
Server-Side API Route (app/api/chat/route.ts)
import { streamText, tool } from 'ai';
import { z } from 'zod';
import { openai } from '@ai-sdk/openai';

// Opt this route into the Edge Runtime for low-latency streaming.
export const runtime = 'edge';

// Define the tool with the SDK's tool() helper for full type inference.
const fetchWeatherTool = tool({
  description: 'Get the current weather for a given city.',
  parameters: z.object({
    city: z.string().describe('The city name (e.g., "New York", "London")'),
  }),
  // The execute function runs ONLY when the LLM decides to call this tool.
  execute: async ({ city }) => {
    // Simulate a network delay to show loading states
    await new Promise((resolve) => setTimeout(resolve, 1500));

    // Mock data based on city
    const weatherMap: Record<string, string> = {
      'new york': 'Sunny, 22°C',
      'london': 'Rainy, 15°C',
      'tokyo': 'Cloudy, 18°C',
    };
    const weather = weatherMap[city.toLowerCase()] || 'Unknown weather conditions';

    return {
      city,
      weather,
      timestamp: new Date().toISOString(),
    };
  },
});

export async function POST(req: Request) {
  const { messages } = await req.json();

  const result = await streamText({
    model: openai('gpt-4-turbo-preview'),
    messages,
    tools: {
      getWeather: fetchWeatherTool,
    },
    system: 'You are a helpful assistant. Use the getWeather tool to answer questions about the weather.',
  });

  // Stream the response back to the client
  return result.toAIStreamResponse();
}
Client-Side UI (app/page.tsx)
'use client';
import { useChat } from 'ai/react';
export default function Chat() {
const { messages, input, handleInputChange, handleSubmit, isLoading } = useChat({
api: '/api/chat',
});
return (
<div className="flex flex-col w-full max-w-md mx-auto p-4 space-y-4">
<div className="border rounded-lg p-4 h-64 overflow-y-auto space-y-2">
{messages.map((message, index) => (
<div key={index} className="p-2 rounded bg-gray-100">
<strong>{message.role === 'user' ? 'You: ' : 'AI: '}</strong>
{/* CRITICAL RENDER LOGIC */}
{message.toolInvocations ? (
message.toolInvocations.map((tool, toolIndex) => (
<div key={toolIndex} className="mt-2 p-2 bg-blue-50 text-sm text-blue-800 rounded">
{tool.state === 'call' && (
<span>⚡ Executing tool: {tool.toolName} for {tool.args.city}...</span>
)}
{tool.state === 'result' && (
<span>
✅ Result: {tool.result.city} is {tool.result.weather}
</span>
)}
</div>
))
) : (
<span>{message.content}</span>
)}
</div>
))}
{isLoading && (
<div className="text-gray-500 italic">AI is thinking...</div>
)}
</div>
<form onSubmit={handleSubmit} className="flex gap-2">
<input
type="text"
value={input}
onChange={handleInputChange}
placeholder="Ask about weather in New York..."
className="flex-1 border p-2 rounded"
/>
<button type="submit" className="bg-black text-white px-4 py-2 rounded">
Send
</button>
</form>
</div>
);
}
Line-by-Line Breakdown
- parameters: z.object({ city: z.string() }): We use Zod for schema validation. If the LLM tries to call this tool with a number or a missing city field, the SDK rejects it before it ever reaches our execute function. This is your runtime safety net.
- execute: async ({ city }) => { ... }: This function runs only on the server when the LLM invokes the tool. It executes in the Edge Runtime, minimizing latency. The return value is automatically serialized and streamed back to the client.
- message.toolInvocations: This is the magic property injected by the Vercel AI SDK. It allows us to conditionally render tool-specific UI instead of just raw text.
- tool.state === 'call': This state appears the instant the LLM requests the tool. We render an "Executing..." indicator here for immediate feedback.
- tool.state === 'result': This state appears once the server-side execute function completes. We render the final data here.
Common Pitfalls to Avoid
- Vercel Edge Timeouts: The Edge Runtime has short timeouts (typically 10–30s). Keep your execute functions lightweight. For heavy tasks, offload them to background jobs and return a "job ID" instead of blocking the stream.
- Async/Await Loops in execute: Don't run multiple tool calls sequentially inside a single execute function. This blocks the stream. Let the LLM's ReAct loop handle the sequence naturally.
- Hallucinated JSON: Even with Zod, the LLM might generate malformed arguments. Zod's .refine() method can help validate logic before execution.
- Missing 'use client': In Next.js App Router, any component using useChat must be marked with 'use client'. The API route handler needs no directive — it always runs on the server.
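For the first pitfall, the "job ID" pattern can be sketched like this. It uses an in-memory map as a stand-in for a real queue or KV store, and all names are hypothetical:

```typescript
// In-memory job store standing in for a real queue/KV store (illustrative).
type Job = { status: "pending" | "done"; result?: string };
const jobs = new Map<string, Job>();
let nextId = 0;

// A tool's execute() would call this and return immediately with the ID,
// instead of blocking the stream until the heavy work finishes.
function startHeavyJob(input: string): string {
  const id = `job-${++nextId}`;
  jobs.set(id, { status: "pending" });
  // Fire-and-forget: complete the work later without blocking the response.
  setTimeout(() => jobs.set(id, { status: "done", result: `processed ${input}` }), 0);
  return id;
}

const jobId = startHeavyJob("quarterly-report.csv");
console.log(jobs.get(jobId)?.status); // "pending" immediately after starting
```

The tool's response to the LLM is just the ID, so the stream stays responsive; a second tool (or the client polling an endpoint) can check the job's status later.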
Conclusion
The tool-call render pattern is more than a technical implementation detail; it's a fundamental shift in how we design AI applications. By making the AI's reasoning and execution process visible and interactive, we move beyond simple Q&A and into the realm of true, collaborative problem-solving.
We’re no longer just building chatbots. We’re building AI agents that can think, plan, and act, with a UI that faithfully renders every step of their journey. This is the future of functional, AI-powered software.
The concepts and code demonstrated here are drawn directly from the comprehensive roadmap laid out in the book “The Modern Stack. Building Generative UI with Next.js, Vercel AI SDK, and React Server Components”
The ebook is available on Leanpub.com along with many others: https://leanpub.com/u/edgarmilvus