The Future of L&D Is Conversational: How AI Avatars in XR Deliver Personalized, Scalable Training

By Convai Team
September 21, 2025

TL;DR:
Convai’s latest platform turns Learning & Development into interactive roleplay with AI‑powered avatars that see, listen, and act inside 2D video-call and 3D XR simulations. L&D teams can build no‑code training, capture custom task animations with a Meta Quest headset, run everything in the browser via pixel streaming, evaluate performance with multimodal analytics, and deploy globally with 65+ languages (expanding). Real‑world showcases include a NASA lunar training simulation (via partner Buendea).

Why conversational learning now? (and why it works!)

  • 1:1 tutoring effect: Conversational learning increases retention and insight compared with one‑way, passive content.

  • Automated feedback: The avatar evaluates and responds in real time, reinforcing learning without the need to schedule human coaches.

  • Personalization at scale: Thousands of employees can roleplay scenarios with consistent quality and instant feedback.

  • Lower cost: Replace or augment live roleplay sessions with always‑available AI characters.

In the AWE XR 2025 talks, Purnendu highlights the performance benefits of one‑on‑one, conversational learning and the ability to automate feedback loops so learners can practice repeatedly until mastery.

From soft skills to hard skills—on one platform

Soft‑skills training (2D, video‑call style):
Customer service, sales, leadership, difficult conversations, expert assistance, and even recruitment roleplays. Learners interact via voice or text; avatars respond in natural speech across 65+ languages with many voice options.

Hard‑skills training (3D, XR/WebXR):
Hands‑on simulations inside digital twins—equipment operation, safety steps, and guided tours. Avatars can follow you, navigate the environment, and perform actions, not just talk.

What’s new for L&D teams (AWE XR 2025 announcements)

1) Avatar Studio (no‑code, browser‑based)

  • Craft the mind of the avatar: backstory, speaking style, narrative, connected knowledge (incl. enterprise data via MCP‑style connectors).

  • Choose from hundreds of avatars (incl. MetaHumans) or upload your own.

  • Control proactivity and engagement levels.

  • Publish with a unique URL and embed in your LMS, websites, or applications (see the embedding sketch after this list).

  • Voice + text interfaces; model‑agnostic and vendor‑agnostic (supports multiple LLM/STT/TTS providers).
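
Since publishing yields a unique URL, the simplest LMS or website embed is an iframe pointing at that URL. The TypeScript sketch below assumes exactly that approach; the URL, container id, and permissions policy are illustrative placeholders, not Convai's documented embed API.

```ts
// Minimal sketch: embed a published Avatar Studio trainer in an LMS or web page.
// The URL is a placeholder; use the unique URL generated when you publish.
const TRAINER_URL = "https://example.convai.com/your-trainer-id"; // hypothetical

function embedTrainer(containerId: string): void {
  const container = document.getElementById(containerId);
  if (!container) throw new Error(`Container #${containerId} not found`);

  const frame = document.createElement("iframe");
  frame.src = TRAINER_URL;
  frame.width = "100%";
  frame.height = "600";
  frame.allow = "microphone; camera"; // voice interaction needs mic permission
  container.appendChild(frame);
}

embedTrainer("trainer-slot"); // assumes a <div id="trainer-slot"> in your page
```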

2) Convai Sim (no‑code 3D simulation in the browser)

  • Drag and drop your avatar into a high‑fidelity 3D scene; it runs via cloud rendering + pixel streaming, so no high‑end GPU is needed on the learner's device.

  • Learners can ask the avatar to follow, demonstrate tasks, or guide tours across scene waypoints (a waypoint sketch follows this list).

  • Perfect for onboarding inside digital twins and for procedural or operations training.
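
To make the waypoint idea concrete, here is an illustrative TypeScript sketch of a guided tour as data plus a simple navigation loop. Every name in it (Waypoint, moveTo, say) is a hypothetical stand-in, not Convai Sim's actual API; in the product, waypoints are placed no‑code in the browser.

```ts
// Illustrative sketch only: a waypoint-based guided tour as one might script it.
interface Waypoint {
  id: string;
  position: [number, number, number]; // x, y, z in scene units
  narration: string;                  // what the avatar says on arrival
}

const onboardingTour: Waypoint[] = [
  { id: "lobby",    position: [0, 0, 0],   narration: "Welcome to the facility." },
  { id: "assembly", position: [12, 0, -4], narration: "This is the assembly line." },
  { id: "exits",    position: [20, 0, 3],  narration: "Emergency exits are here." },
];

// Walk the tour in order, pausing for narration at each stop.
// The avatar interface below is a hypothetical stand-in.
async function runTour(avatar: {
  moveTo(p: [number, number, number]): Promise<void>;
  say(text: string): Promise<void>;
}): Promise<void> {
  for (const stop of onboardingTour) {
    await avatar.moveTo(stop.position);
    await avatar.say(stop.narration);
  }
}
```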

3) XR Animation Capture App (Meta Quest)

  • Capture custom task animations (e.g., “pour a drink,” “turn this valve”) using hand/body tracking (OpenXR).

  • No mocap suit or studio required; record, upload, and apply to your avatar—typically in minutes, not weeks.

  • Cloud smoothing refines captured motion, improving realism and repeatability.

4) Multimodal evaluation, analytics, and “state of mind”

  • Rubric‑based scoring for soft skills: tone, brand adherence, accuracy, de‑escalation, etc. (a minimal rubric sketch follows this list).

  • Multimodal checks for hard skills: correct steps, order, positioning, and hand interactions.

  • “State of mind” visualization tracks the simulated customer’s emotion over time, so trainers see whether the learner moved the customer from angry → neutral → satisfied.
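
As a concrete illustration of rubric‑based scoring, here is a minimal TypeScript sketch that aggregates per‑criterion scores into one weighted result. The criteria mirror the soft‑skills examples above, but the weights and data shapes are illustrative assumptions; Convai's evaluation pipeline is configured in the platform, not via this code.

```ts
// Minimal sketch: weighted aggregation of rubric scores, assuming each
// criterion receives a 0-1 score from the evaluator. Weights are illustrative.
interface Criterion {
  name: string;
  weight: number; // relative importance; normalized during aggregation
}

const softSkillsRubric: Criterion[] = [
  { name: "tone",            weight: 0.3 },
  { name: "brand adherence", weight: 0.2 },
  { name: "accuracy",        weight: 0.3 },
  { name: "de-escalation",   weight: 0.2 },
];

// Combine per-criterion scores (0-1) into a single weighted result.
function overallScore(scores: Record<string, number>, rubric: Criterion[]): number {
  const totalWeight = rubric.reduce((sum, c) => sum + c.weight, 0);
  return rubric.reduce((sum, c) => sum + (scores[c.name] ?? 0) * c.weight, 0) / totalWeight;
}

// Example: one session as scored by the evaluator.
console.log(overallScore(
  { tone: 0.9, "brand adherence": 0.8, accuracy: 0.7, "de-escalation": 0.95 },
  softSkillsRubric,
));
```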

5) Agentic capabilities (for more human training partners)

  • Avatars see what the learner sees and listen continuously; they decide when to respond.

  • Proactive guidance (not just reactive Q&A).

  • Long‑term memory, inner monologues, and role‑aware animation selection (e.g., sit/stand/move based on context).

Several features are available now; some advanced agentic abilities are in closed beta with early access.

Real‑world showcase: NASA training with Buendea (hard skills, XR)

Convai (with partner Buendea) demonstrated a lunar digital twin where astronaut trainees practice tasks (e.g., power station repair, rock sampling). A disembodied “Jarvis‑like” AI provides objectives, step‑by‑step help, and can simulate multiplayer crew interactions. Everything runs in the browser (pixel streaming) and can go fully WebXR on headsets like Meta Quest.

Mixed Reality & deployment options (global L&D readiness)

  • Mixed Reality: Supports passthrough on devices like Meta Quest so avatars sit “in the room” with you (a WebXR sketch follows this list).

  • Multilingual: 65+ languages today, expanding toward 100+ with the latest real‑time multimodal models; supports diverse voices and accents.

  • Enterprise‑ready: Run in Convai’s cloud or deploy on your own cloud; ISO‑compliant; model‑agnostic; integrates with Unity, Unreal Engine, and Three.js.

  • No heavy installs: Pixel streaming brings high‑fidelity scenes to any modern browser and WebXR devices.
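
Passthrough sessions on headsets go through the standard WebXR Device API. The sketch below is browser‑standard TypeScript (assuming WebXR type definitions such as @types/webxr are in scope), not Convai‑specific code; where a Convai scene would attach is marked with a placeholder comment.

```ts
// Request a passthrough (mixed reality) session via the standard WebXR Device API.
async function startPassthroughSession(): Promise<void> {
  if (!navigator.xr) {
    console.warn("WebXR is not available in this browser.");
    return;
  }
  if (!(await navigator.xr.isSessionSupported("immersive-ar"))) {
    console.warn("Passthrough (immersive-ar) is not supported on this device.");
    return;
  }
  const session = await navigator.xr.requestSession("immersive-ar", {
    optionalFeatures: ["hand-tracking"], // same tracking class used for task capture
  });
  // Placeholder: attach the rendered avatar scene to this session here.
  session.addEventListener("end", () => console.log("XR session ended."));
}
```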

L&D use cases you can launch now

  • Customer service de‑escalation and hospitality roleplay

  • Sales discovery and objection handling

  • Leadership & difficult conversations coaching

  • Onboarding inside a digital twin (guided tours, checkpoints)

  • Manufacturing/Safety procedure drills with captured task animations

  • Language learning via bilingual practice partners (speak + listen)

Browse Convai experiences to see some of the user‑generated AI avatar scenarios.

How to build an L&D scenario in 1 day

  1. Craft the avatar’s mind (role, knowledge, tone, proactivity).

  2. Choose or upload your avatar inside Avatar Studio; publish a video‑call trainer for soft skills.

  3. For hard skills, open Convai Sim and place waypoints in your 3D scene (tour or procedure).

  4. Use XR Animation Capture (Meta Quest) to record any task‑specific motions you need.

  5. Publish to your LMS or share a URL; scale globally with multilingual voice.

Ready to pilot conversational L&D in a day? Book a live demo of Avatar Studio, Convai Sim, and the Quest‑based XR Animation Capture workflow.

FAQs

Q1. Can learners talk naturally to the avatar?
Yes. Interaction works with voice or text. The avatar replies with natural speech and supports many languages (65+ today).

Q2. Does this require high‑end GPUs on learner devices?
No. High‑fidelity scenes run via cloud rendering + pixel streaming in the browser (desktop, mobile, and WebXR headsets).

Q3. How do we teach the avatar domain‑specific motions?
Record them with the XR Animation Capture App on Meta Quest (OpenXR). Upload and apply to your avatar—no mocap suit needed.

Q4. Can we integrate our knowledge base and tools?
Yes. Connect internal data, narratives, and services (e.g., via MCP‑style connectors). The platform is model‑agnostic.

Q5. How are learners evaluated?
With multimodal analytics and custom rubrics for soft and hard skills. “State of mind” tracking shows whether the learner improved the simulated customer’s emotional state.

Q6. Does it support Mixed Reality?
Yes. Passthrough lets avatars appear in your real environment for more authentic practice.

Q7. Can we deploy on our own cloud?
Yes. Run in Convai’s cloud or deploy privately. The platform is ISO‑compliant and built for enterprise needs.