The Problem: Low Actionable Feedback Rates


Here's a common pattern in e-commerce feedback (and these numbers are typical for most websites, not edge cases):

Launch product → 10,000 customers buy → 10–20% leave reviews → 20–30% of those contain actionable insights.

For example, apparel tends to sit toward the higher end of review volume, consumer electronics toward the lower end, and SaaS higher on volume but uneven on quality.

Product decisions often rely on feedback from roughly 2–6% of the shopper base. The silent 94–98% remain largely unknown. Writing reviews requires effort, which naturally filters out customers without strong opinions.
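
To make the funnel concrete, here is the arithmetic behind that 2–6% range (a quick sketch using the endpoints of the percentages above):

# Feedback funnel: customers -> reviewers -> actionable reviews
customers = 10_000
review_rate = (0.10, 0.20)        # 10-20% leave reviews
actionable_rate = (0.20, 0.30)    # 20-30% of those are actionable

low = customers * review_rate[0] * actionable_rate[0]    # 200 reviews  (2% of buyers)
high = customers * review_rate[1] * actionable_rate[1]   # 600 reviews  (6% of buyers)
print(f"Actionable feedback from {low:.0f}-{high:.0f} of {customers} buyers")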

Why does traditional feedback struggle?

What businesses see:

What users experience:

The Solution: Intelligent MCQ Interviews


What if we could extract high-signal feedback effortlessly—driving product improvement, building platform trust, and making it easy for users to shape better products?

Instead of asking users to write reviews, we interview them with hyper-personalized MCQs. Not generic forms. Not hardcoded decision trees. Adaptive, context-aware conversations where each question is shaped by the product's known strengths and weaknesses, the customer's behavioral profile, and every answer given so far.

The core insight: MCQs collapse cognitive load while preserving signal. One-click responses can generate the same depth as written reviews—if the questions are smart enough.

The potential value:

Introducing Survey Sensei—a multi-agent system that implements this approach with four specialized agents:

  1. ProductContextAgent → Analyzes reviews/metadata to understand product strengths and weaknesses
  2. CustomerContextAgent → Builds behavioral profile from purchase history and review patterns
  3. SurveyAgent (adaptive MCQ engine) → Generates tailored questions, reshapes follow-ups based on each answer
  4. ReviewGenAgent → Synthesizes MCQ selections into natural language reviews in the user's voice
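
A minimal sketch of how these four agents could be wired together; the class names come from the list above, but the method names and async orchestration are illustrative assumptions rather than the repo's exact API:

import asyncio

async def run_survey_pipeline(user_id: str, item_id: str):
    # Agents 1 and 2 are independent, so their context building can run concurrently
    product_ctx, customer_ctx = await asyncio.gather(
        ProductContextAgent().build_context(item_id),      # what to ask about
        CustomerContextAgent().build_context(user_id),     # how to ask it
    )
    # Agent 3: adaptive MCQ interview driven by both contexts
    answers = await SurveyAgent(product_ctx, customer_ctx).run_interview()
    # Agent 4: synthesize MCQ selections into review drafts in the user's voice
    return await ReviewGenAgent().generate_review_options(answers, customer_ctx)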

The full workflow (end-to-end example):

Step 1 (Product Intelligence): Analyze 213 reviews of the purchased laptop
        → Battery: 67% complain "dies mid-afternoon"
        → Keyboard: 82% praise "excellent typing experience"
        → Performance: 45% mention "handles multitasking well"

Step 2 (User Intelligence): Pull purchase history for this customer
        → Bought 3 laptops in past 2 years (power user pattern)
        → Reviews 85% of purchases, critical but fair (3.6★ average)
        → Detail-oriented: past reviews averaged 120 words

Step 3 (Adaptive Question 1):
        "You've purchased 3 laptops recently. What drove this upgrade?"
        ○ Better performance
        ● Longer battery life        ← USER SELECTED
        ○ Lighter/more portable
        ○ Other: [text]

Step 4 (Adaptive Follow-up):
        "How long does the battery last on a typical workday?"
        ○ All day (12+ hours)
        ● 4-8 hours                   ← USER SELECTED
        ○ Less than 4 hours

Step 5 (Probing Deeper):
        "Does this meet your battery expectations?"
        ○ Exceeds expectations
        ● Falls slightly short        ← USER SELECTED
        ○ Major disappointment

[Agent continues for 10-12 total questions, probing keyboard quality,
 performance, portability based on this user's priorities...]

Step 6 (Review Synthesis): Convert MCQ selections → natural language
        "Upgraded hoping for better battery. Lasts 4-8 hours—falls short
         of all-day claims, but manageable for office work. Keyboard is
         outstanding for typing. Performance handles multitasking well."

Contrast with generic tools:

Why Now? What Changed to Make This Possible


The convergence of cheaper, more intelligent models with rapidly declining token costs has made AI-powered personalization economically viable at scale.

2020 (GPT-3 era):

2025 (GPT-4/GPT-5 era and evolving):

What This Enables

1. Per-user personalization:

2. Adaptive vs. static workflows:

3. Natural language synthesis:

4. Economic accessibility:

Two-Part Architecture


Before diving into the details, it's critical to understand how the project is structured. The diagram below shows the complete system architecture—from the UI layer through the orchestrator to the multi-agent framework, along with data pipelines and database schema:


Part 1: Simulation Infrastructure (Testing Layer)

Purpose: Development scaffolding—test the core system without production data.

MockDataOrchestrator creates semi-realistic e-commerce ecosystems:

  1. Products: RapidAPI fetch (real Amazon data) + 5 similar products (LLM) + 3 diverse products (LLM)
  2. Users: Main user (you) + N mock personas
  3. Reviews: RapidAPI reviews (real) + LLM-generated reviews
  4. Transactions: 40-60% have reviews (matches reality)
  5. Embeddings: Batch parallel processing

In production: Skip this entirely and integrate with real e-commerce databases and pipelines.

Part 2: Agentic Survey Framework (Core USP)

Purpose: The actual product—adaptive survey generation + authentic review synthesis.

This is the heart of Survey Sensei. The system decomposes into four specialized agents, each with a focused responsibility:

Agent 1: ProductContextAgent

Build a mental model of the product before generating questions.

Three-path adaptive logic:

  1. Direct Reviews Path (Confidence: 70-95%)
    • Condition: Product already has reviews in database
    • Ranking heuristic: Recency (50% weight, exponential decay with 180-day half-life) + Quality (40%, review length) + Diversity (10%, bonus for 3-4 star reviews)
    • Confidence formula: 0.70 + (num_reviews / 100), capped at 0.95
    • Extracts: Key features, pain points, use cases, pros/cons, sentiment patterns
  2. Similar Products Path (Confidence: 55-80%)
    • Condition: No reviews for this product, but vector-similar products exist
    • Process: Cosine similarity search via pgvector (threshold: 0.7)
    • Ranking heuristic: Similarity (40%) + Recency (35%) + Quality (20%) + Diversity (5%)
    • Extracts: Inferred experience from analogous products
  3. Generic/Description Only Path (Confidence: 40-50%)
    • Condition: New product, zero reviews anywhere
    • Process: Parse title, description, category metadata
    • Extracts: Educated guesses (e.g., "Wireless device → ask about battery")
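
The ranking weights and confidence formula for the Direct Reviews path translate directly into code. A small sketch (helper names are illustrative, not taken from the codebase):

from datetime import datetime, timezone

def recency_weight(review_date: datetime, half_life_days: float = 180.0) -> float:
    # Exponential decay: a review loses half its weight every 180 days
    age_days = (datetime.now(timezone.utc) - review_date).days
    return 0.5 ** (age_days / half_life_days)

def rank_direct_review(recency: float, quality: float, diversity: float) -> float:
    # Direct Reviews path weights: 50% recency, 40% quality, 10% diversity
    return 0.50 * recency + 0.40 * quality + 0.10 * diversity

def direct_reviews_confidence(num_reviews: int) -> float:
    # 0.70 + (num_reviews / 100), capped at 0.95
    return min(0.70 + num_reviews / 100, 0.95)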

Output schema:

from typing import List

class ProductContext:
    key_features: List[str]        # What reviewers consistently highlight
    major_concerns: List[str]      # Recurring complaints / pain points
    pros: List[str]
    cons: List[str]
    common_use_cases: List[str]
    context_type: str              # Which of the three paths produced this context
    confidence_score: float        # 0.40-0.95 depending on the path

Agent 2: CustomerContextAgent

Build behavioral profiles to personalize question depth and tone.

Three-path adaptive logic:

  1. Exact Interaction Path (Confidence: 85-95%)
    • User bought THIS exact product before
    • Ground truth on what they thought
  2. Similar Products Path (Confidence: 55-80%)
    • Ranking heuristic: Similarity (45%) + Recency (30%) + Engagement (25%)
    • Infer preferences from purchase patterns (e.g., "Bought 3 noise-canceling headphones → cares about ANC quality")
  3. Demographics Path (Confidence: 35-45%)
    • Brand new user, zero purchase history
    • Generic baseline persona
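
The path selection itself can be expressed as a simple dispatcher. A sketch that assumes the agent has already counted the user's relevant interactions (the signature is illustrative):

def select_customer_context_path(has_bought_this_product: bool,
                                 similar_interaction_count: int) -> tuple[str, float]:
    # Returns (path_name, rough confidence midpoint) per the three paths above
    if has_bought_this_product:
        return "exact_interaction", 0.90     # 85-95% band: ground truth available
    if similar_interaction_count > 0:
        return "similar_products", 0.65      # 55-80% band: inferred from related purchases
    return "demographics", 0.40              # 35-45% band: generic baseline persona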

Output schema:

from typing import List

class CustomerContext:
    purchase_patterns: List[str]
    review_behavior: List[str]
    product_preferences: List[str]
    primary_concerns: List[str]
    expectations: List[str]
    pain_points: List[str]
    engagement_level: str          # highly_engaged | moderately_engaged | passive_buyer | new_user
    sentiment_tendency: str        # positive | critical | balanced | polarized | neutral
    review_engagement_rate: float
    confidence_score: float        # 0.35-0.95 depending on the path

Personalization:

Agent 3: SurveyAgent (Stateful)

Conducts adaptive surveys where questions evolve based on answers. Uses LangGraph StateGraph for conversation state.

Performance optimization: Survey state is cached in memory during the survey (no database writes on every answer). State is persisted to the database at only two points:

  1. Survey start: Initial contexts frozen to product_context and customer_context JSONB columns
  2. Survey completion: Final Q&A written to questions_and_answers JSONB, complete state to session_context JSONB

All intermediate answers are logged asynchronously to the survey_details table for analytics (fire-and-forget, non-blocking).
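
A sketch of that persistence strategy, assuming an async database client that exposes the three write methods named below (the method names are hypothetical):

import asyncio

class SurveyStateCache:
    """In-memory survey state; the database is touched only at start, completion,
    and via fire-and-forget event logs."""

    def __init__(self, db):
        self.db = db
        self.sessions: dict[str, dict] = {}   # session_id -> LangGraph-style state

    async def start(self, session_id: str, product_ctx: dict, customer_ctx: dict):
        self.sessions[session_id] = {"answers": [], "product": product_ctx, "customer": customer_ctx}
        await self.db.persist_contexts(session_id, product_ctx, customer_ctx)      # write #1

    def record_answer(self, session_id: str, question: dict, answer: str):
        self.sessions[session_id]["answers"].append({"q": question, "a": answer})
        # Analytics log to survey_details; the survey loop never waits on this write
        asyncio.create_task(self.db.log_event(session_id, "answer_submitted", {"answer": answer}))

    async def complete(self, session_id: str):
        await self.db.persist_final_state(session_id, self.sessions.pop(session_id))  # write #2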

The interview flow:

┌─ Survey Start ─────────────────────────────────────────────┐
│                                                            │
│  1. Fetch contexts in parallel:                            │
│     ├─ ProductContextAgent → What to ask about             │
│     └─ CustomerContextAgent → How to ask it                │
│                                                            │
│  2. Generate initial MCQs (3 questions baseline)           │
│                                                            │
│  3. Stateful conversation loop:                            │
│     ┌───────────────────────────────────────────┐          │
│     │  Present MCQ                              │          │
│     │  ↓                                        │          │
│     │  Wait for user selection                  │          │
│     │  ↓                                        │          │
│     │  Process answer → Update internal state   │          │
│     │  ↓                                        │          │
│     │  Route decision:                          │          │
│     │    ├─ Need follow-up? → Generate adaptive │          │
│     │    ├─ Move to next topic? → Next question │          │
│     │    └─ Survey complete? → Save & exit      │          │
│     └───────────────────────────────────────────┘          │
│                                                            │
└────────────────────────────────────────────────────────────┘

Survey completion rules:

initial_questions_count: 3        # Start with 3 baseline MCQs
min_answered_questions: 10        # User must answer ≥10
max_answered_questions: 15        # Hard stop at 15
max_survey_questions: 20          # Total questions asked
max_consecutive_skips: 3          # 3 consecutive skips → must answer to continue
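
Applied literally, these limits make the routing decision after each answer straightforward. A sketch, with the "needs follow-up" signal reduced to a boolean that the LLM would provide in practice:

def route_after_answer(answered: int, asked: int, consecutive_skips: int,
                       needs_followup: bool) -> str:
    if answered >= 15 or asked >= 20:            # max_answered_questions / max_survey_questions
        return "complete"
    if consecutive_skips >= 3:                   # max_consecutive_skips: force an answer
        return "require_answer"
    if answered >= 10 and not needs_followup:    # min_answered_questions satisfied
        return "complete"
    return "generate_followup" if needs_followup else "next_topic"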

Adaptive questioning example:

Question 5: "How long does the battery last on a typical workday?"
  ● Less than 4 hours       ← USER SELECTED

[Agent's internal state update:
  - Battery performance: Below average
  - Action: Generate follow-up to quantify impact]

Follow-up Question 6: "When does the battery typically die?"
  ● Mid-afternoon (2-4pm)   ← USER SELECTED

[Agent's internal state update:
  - Specific pain point: Dies at 2-4pm (work hours)
  - Severity: High (impacts productivity)
  - Action: Probe importance for review weighting]

Follow-up Question 7: "How important is longer battery life to you?"
  ● Very important - major inconvenience   ← USER SELECTED

Why adaptive matters: Without adaptive AI, you'd build a rigid decision tree: "Battery life: Excellent | Good | Fair | Poor". That tells you what they think—but misses the critical detail: "Dies at 2pm during work hours, and it's a major inconvenience." The AI doesn't just branch—it regenerates the next question based on evolving context.

Agent 4: ReviewGenAgent

Convert MCQ selections into natural language reviews that match the user's writing style.

Three-stage synthesis:

  1. Sentiment Classification → Analyze MCQ answers → Classify as good | okay | bad
  2. Voice Matching → Fetch historical reviews → Extract tone, vocabulary, sentence structure
  3. Generate 3 variations with different star ratings within sentiment band:

from typing import List

def _get_star_ratings(sentiment_band: str) -> List[int]:
    """Map the sentiment band to candidate star ratings for the review variations."""
    if sentiment_band == "good":
        return [5, 4]
    elif sentiment_band == "okay":
        return [4, 3, 2]
    else:  # bad
        return [2, 1]
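
Stage 3 can then loop over those ratings, generating one draft per star value. A sketch using a LangChain chat model; the prompt wording and voice_profile parameter are illustrative, not the repo's actual prompt:

from langchain_openai import ChatOpenAI

def generate_review_variations(sentiment_band: str, qa_pairs: list[dict], voice_profile: str) -> list[dict]:
    llm = ChatOpenAI(model="gpt-4o-mini", temperature=0.7)
    variations = []
    for stars in _get_star_ratings(sentiment_band):
        prompt = (
            f"Write a {stars}-star product review in this voice: {voice_profile}\n"
            f"Base it strictly on these survey answers: {qa_pairs}"
        )
        variations.append({"stars": stars, "review_text": llm.invoke(prompt).content})
    return variations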

Example output (sentiment: "okay", user: concise + critical):

[4-star] "Solid build quality and excellent screen. Battery dies around 3pm—acceptable for office use where I have charging access. Keyboard is comfortable for long typing. Performance handles multitasking well. Worth it on sale."

[3-star] "Mixed feelings. Build quality and screen are great, but battery is the main letdown—dies at 3pm despite 'all-day' claims. Keyboard is excellent. If battery isn't a dealbreaker, it's decent."

[2-star] "Disappointed with battery life. Product page advertised all-day battery, but it dies by 3pm daily with moderate use. Screen and keyboard are good, but battery is a major problem for anyone working away from chargers."

User picks framing, edits if needed, submits. 2 minutes of MCQ clicks → rich, authentic review.

Technical Implementation


Tech Stack

Backend: FastAPI (Python 3.11), LangChain + LangGraph, OpenAI GPT-4o-mini ($0.15 per 1M input tokens), Pydantic, Supabase/PostgreSQL + pgvector

Frontend: Next.js 14, TypeScript, Tailwind CSS, Supabase Client

AI/ML: OpenAI embeddings (1536-dim), batch generation (100 texts in 2-3s), IVFFlat indexes (2-3% recall loss for 100x speed)
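
The batch embedding step maps to a single OpenAI API call per batch. A minimal sketch with the official Python client, assuming batches of up to 100 texts as described above:

from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

def embed_batch(texts: list[str]) -> list[list[float]]:
    # One request embeds the whole batch; text-embedding-3-small returns 1536-dim vectors
    resp = client.embeddings.create(model="text-embedding-3-small", input=texts)
    return [item.embedding for item in resp.data]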

Database Schema

-- 1. PRODUCTS: Catalog with semantic embeddings
products (
  item_id VARCHAR(20) PRIMARY KEY,
  title, brand, description,
  price, star_rating, num_ratings,
  review_count INTEGER,
  embeddings vector(1536),                -- Semantic search
  is_mock BOOLEAN
)

-- 2. USERS: Behavioral profiles
users (
  user_id UUID PRIMARY KEY,
  user_name, email_id, age, gender, base_location,
  embeddings vector(1536),
  total_purchases INTEGER,
  total_reviews INTEGER,
  review_engagement_rate DECIMAL(4,3),
  avg_review_rating DECIMAL(3,2),
  sentiment_tendency VARCHAR(20),
  engagement_level VARCHAR(30),
  is_main_user BOOLEAN
)

-- 3. TRANSACTIONS: Purchase history
transactions (
  transaction_id UUID PRIMARY KEY,
  item_id → products,
  user_id → users,
  order_date, delivery_date,
  original_price, retail_price,
  transaction_status
)

-- 4. REVIEWS: Multi-source feedback
reviews (
  review_id UUID PRIMARY KEY,
  item_id → products,
  user_id → users,
  transaction_id → transactions,
  review_title, review_text, review_stars,
  source VARCHAR(20),                     -- 'rapidapi' | 'agent_generated' | 'user_survey'
  embeddings vector(1536)
)

-- 5. SURVEY_SESSIONS: Stateful survey orchestration
survey_sessions (
  session_id UUID PRIMARY KEY,
  user_id, item_id, transaction_id,
  product_context JSONB,                  -- Agent 1 output
  customer_context JSONB,                 -- Agent 2 output
  session_context JSONB,                  -- LangGraph state
  questions_and_answers JSONB,
  review_options JSONB,
  status VARCHAR(20)
)

-- 6. SURVEY_DETAILS: Event log
survey_details (
  detail_id UUID PRIMARY KEY,
  session_id → survey_sessions,
  event_type VARCHAR(50),
  event_detail JSONB,
  created_at TIMESTAMP
)

Design decisions:

  1. JSONB for flexibility → Agent outputs evolve without migrations
  2. Vector indexes → IVFFlat gives 100x speed for 2-3% recall loss
  3. Source tracking → rapidapi (real) | agent_generated (mock) | user_survey (golden path)
  4. Event sourcing → survey_details logs every interaction for debugging

Vector Similarity

All text → 1536-dim embeddings via text-embedding-3-small.

Find similar products:

SELECT item_id, title,
       1 - (embeddings <=> query_embedding) AS similarity
FROM products
WHERE 1 - (embeddings <=> query_embedding) > 0.7
ORDER BY similarity DESC LIMIT 5;
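
The same query can be driven from Python by embedding the search text first and passing the vector as a parameter. A sketch using psycopg2 with a direct Postgres connection, assumed here only for brevity (the repo itself sits on Supabase/PostgreSQL):

import psycopg2
from openai import OpenAI

client = OpenAI()

def find_similar_products(conn, query_text: str, threshold: float = 0.7, limit: int = 5):
    emb = client.embeddings.create(model="text-embedding-3-small", input=query_text).data[0].embedding
    vec = "[" + ",".join(str(x) for x in emb) + "]"   # pgvector accepts a bracketed text literal
    with conn.cursor() as cur:
        cur.execute(
            """
            SELECT item_id, title, 1 - (embeddings <=> %s::vector) AS similarity
            FROM products
            WHERE 1 - (embeddings <=> %s::vector) > %s
            ORDER BY similarity DESC LIMIT %s
            """,
            (vec, vec, threshold, limit),
        )
        return cur.fetchall()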

Why vectors beat traditional categories:

Traditional hierarchies (Electronics → Audio → Headphones → Wireless) are rigid. Vector embeddings instead cluster products by intent and use case rather than superficial attributes: "Noise-canceling Bluetooth headphones" is closer to "wireless earbuds with ANC" than to "studio monitor headphones"—even though all three are technically "headphones."

Performance benchmarks:

API Design

The API provides six endpoints that cover the end-to-end survey workflow:

Error handling and edge cases:

1. Session expiration:

2. Idempotency:

3. Pydantic validation:
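
As one illustration of what this layer might look like, request bodies can be validated before they reach any agent. A hypothetical sketch; the model and field names are not taken from the repo:

from uuid import UUID
from pydantic import BaseModel, Field

class SubmitAnswerRequest(BaseModel):
    session_id: UUID
    question_id: str
    selected_option: str = Field(min_length=1, max_length=500)
    skipped: bool = False

# FastAPI rejects malformed payloads with a 422 before any agent logic runs, e.g.:
# @app.post("/surveys/{session_id}/answers")
# async def submit_answer(body: SubmitAnswerRequest): ...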

Running It Locally


Source: github.com/arnavvj/survey-sensei

Prerequisites: Python 3.11+, Node.js 18+, Supabase account (free), OpenAI API key (~$5)

Backend Setup

1. Clone the repo and create a Python environment:

git clone https://github.com/arnavvj/survey-sensei.git
cd survey-sensei/backend

conda env create -f environment.yml   # Installs all deps (FastAPI, LangChain, etc.)
conda activate survey-sensei

2. Configure environment variables:

cp .env.local.example .env.local

Edit .env.local with your credentials:

OPENAI_API_KEY=sk-proj-...                         # From platform.openai.com
SUPABASE_URL=https://xxxxx.supabase.co             # From Supabase dashboard
SUPABASE_SERVICE_ROLE_KEY=eyJhbGciOiJIUzI1NiIs...  # From Supabase Settings → API
RAPID_API_KEY=your_rapidapi_key                    # Optional: From rapidapi.com

3. Initialize database:

python database/init/apply_migrations.py  # Applies migrations
# Execute the SQL from backend/database/_combined_migrations.sql in your Supabase project

4. Start the backend:

uvicorn main:app --reload --port 8000

Frontend Setup

1. Navigate to frontend:

cd survey-sensei/frontend

2. Configure environment variables:

cp .env.local.example .env.local

Edit .env.local:

NEXT_PUBLIC_SUPABASE_URL=https://xxxxx.supabase.co
NEXT_PUBLIC_SUPABASE_ANON_KEY=eyJhbGciOiJIUzI1NiIs...
OPENAI_API_KEY=sk-proj-...

3. Install dependencies and start dev server:

npm install

npm run dev  # Open http://localhost:3000

Testing the Flow

Step 1: Submit product and user information

Enter an Amazon product URL (must include ASIN) and generate mock data (takes 3-4 minutes).

The MockDataOrchestrator builds a realistic e-commerce simulation in your Supabase project. For example:

  • Products (11): 1 real (RapidAPI) + 6 similar (LLM) + 4 diverse (LLM) → market context for ProductContextAgent
  • Users (13-25): 1 main user + 12-24 mock personas (varied ages, locations, purchase patterns) → behavioral diversity for CustomerContextAgent
  • Reviews (30-100+): 10-15 real (RapidAPI) + 20-85 LLM-generated (70% positive, 20% neutral, 10% negative) → signal for ProductContextAgent analysis
  • Transactions (80-170+): each review → 1 transaction, plus additional no-review purchases (40% sparsity) and 1 "current" delivery that triggers the survey → realistic purchase patterns
  • Embeddings (200-300): all entities → 1536-dim vectors (batch parallel via text-embedding-3-small) → semantic similarity search

Step 2: Launch the survey

Click "Start Survey" and wait 3-5 seconds.

What's happening behind the scenes:

Step 3: Answer questions (MCQ-based)

Answer 10-12 adaptive MCQs. Each response triggers follow-up questions that probe deeper into your concerns (e.g., "battery life" → "how long does it last?" → "does this meet expectations?").

Step 4: Generate and submit review

ReviewGenAgent synthesizes your MCQ responses into 3 natural language review variations (different star ratings, same sentiment). Pick one, optionally edit, and submit.

Real-World Impact Simulations


Note: These are projections based on industry benchmarks and reasonable assumptions. Actual results will vary significantly based on implementation, industry vertical, and user behavior. The scenarios below illustrate potential impact, not guaranteed outcomes.

Scenario 1: Mid-Market E-Commerce Business ($8M revenue)

Baseline (Traditional Reviews):

With Survey Sensei (projected):

Potential financial impact:

ROI (assuming full impact realization):

Scenario 2: Survey-as-a-Service for Small Businesses

Potential business model:

Projected unit economics:

Customer acquisition (estimated):

From Ideation to Market Adoption: A Potential 4-Month Journey


Month 0: Current MVP State

Month 1: Production Hardening + Initial Testing

Infrastructure improvements:

Batch data pipelines:

Early validation:

Month 2: Platform Integrations (If Early Metrics Look Promising)

Service layer architecture:

# Embedded API integration example (merchant-side webhook)
import requests
from flask import Flask, request

app = Flask(__name__)

@app.route('/webhooks/order_delivered', methods=['POST'])
def handle_order_delivered():
    order_data = request.get_json()
    response = requests.post('https://api.surveysensei.io/v1/surveys/generate', json={
        'transaction_id': order_data['id'],
        'user_id': order_data['customer_id'],
        'product_id': order_data['product_id'],
        'user_context': {...},
        'product_context': {...}
    })
    survey_url = response.json()['survey_url']
    send_email(to=order_data['customer']['email'], body=survey_url)  # send_email: merchant's own mailer
    return '', 204

Initial connectors:

Early data patterns (if scale permits):

Month 3: Analytics Layer + Scale Testing

Basic intelligence features:

Scale validation:

Market positioning refinement:

Month 4+: Iterative Improvement

Realistic expectations:

Competitive considerations:

What needs ongoing work:

Conclusion


Survey Sensei demonstrates a practical path toward better customer feedback by combining modern AI capabilities with structured data collection:

What we built:

Improvements over traditional reviews:

What needs validation:

The system works today. Clone the repo, run the setup, and test a survey in 10 minutes. The architecture shows how multi-agent patterns handle complex, context-dependent workflows—not just for surveys, but for any system requiring personalization at scale.

If you're building customer feedback systems, recommendation engines, or personalization tools, this architecture offers a concrete reference implementation. The pattern (context gathering → adaptive decision-making → personalized output) generalizes well:

The shift toward specialized agents collaborating on tasks represents a practical middle ground between monolithic models and over-engineered microservices. It's early, but the economics and technical patterns are sound enough to build on.

References and Further Reading


Questions? Ideas? Feedback?

Academic Papers

Industry Reports

Multi-Agent Systems Resources

Technical Documentation

All code examples and architecture diagrams are from the Survey Sensei codebase.