Hello AI Enthusiasts!


Welcome to the Twenty-Second edition of "This Week in AI Engineering"!


This week, Fathom R1 14B cracks one of the world’s toughest exams while outperforming OpenAI’s o3-mini, Google open-sources its entire DeepSearch stack, NVIDIA releases Nemotron Research Reasoning Qwen 1.5B, Microsoft introduces Sora-style text-to-video generation in Bing, OpenAI debuts Audio Endeavor and Audio Voyager, and the Agents SDK in TypeScript drops with real-time streaming capabilities.


We'll also explore some under-the-radar tools that can supercharge your development workflow.


Fathom R1 14B

Submitted as a proposal under India’s National AI Mission, Fathom R1 14B is a 14 billion-parameter reasoning model developed by Fractal AI. Despite its relatively modest parameter count, it has already made headlines by cracking the IIT JEE Advanced, arguably the most challenging college entrance exam globally, on its first attempt. To gauge its global reasoning prowess, the Fathom team benchmarked it on Olympiad-grade math contests: it scored 52.71 percent on AIME 25 and 35.26 percent on HMMT 25, surpassing both OpenAI’s o3-mini and Light R1 14B. Remarkably, all these results came without any retries or a massive inference stack.


Lean Context Window and Low Budget


Open-Source Commitment


Key Use Cases


Google’s Deep Research Stack Is Open Source


Google has open-sourced its entire DeepSearch stack, the same system it uses internally to perform ultra-fast multimodal document search. This stack comprises a modified ScaNN indexer, a 50,000-piece SentencePiece tokenizer, and T5-based dual encoders for result ranking.
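
For readers new to dual-encoder retrieval, here is a minimal sketch of the idea: queries and documents are mapped into the same vector space, then documents are ranked by similarity to the query. It is not Google's code. The `encode` function is a toy bag-of-words stand-in for a learned T5-style encoder, the exhaustive scoring loop stands in for an approximate index such as ScaNN, and the sample documents are made up for illustration.

```python
import math
from collections import Counter


def encode(text: str) -> dict[str, float]:
    # Toy encoder (assumption): L2-normalized bag-of-words instead of a learned model.
    counts = Counter(text.lower().split())
    norm = math.sqrt(sum(c * c for c in counts.values())) or 1.0
    return {token: c / norm for token, c in counts.items()}


def similarity(query_vec: dict[str, float], doc_vec: dict[str, float]) -> float:
    # Dot product of two unit vectors, i.e. cosine similarity.
    return sum(w * doc_vec.get(token, 0.0) for token, w in query_vec.items())


def search(query: str, docs: list[str], top_k: int = 2) -> list[tuple[float, str]]:
    query_vec = encode(query)
    # Exhaustive scoring; an approximate nearest-neighbor index would replace this step at scale.
    scored = [(similarity(query_vec, encode(doc)), doc) for doc in docs]
    return sorted(scored, key=lambda pair: pair[0], reverse=True)[:top_k]


# Hypothetical sample corpus.
docs = [
    "quarterly revenue report for the cloud division",
    "hiking trails near the mountain lake",
    "cloud infrastructure cost report",
]
results = search("cloud report", docs)
for score, doc in results:
    print(f"{score:.3f}  {doc}")
```

In a production system the document vectors would be computed once and stored in the index, so only the query is encoded at search time.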


Ultra-Low Latency at Scale


Modular & Customizable Architecture


Potential Impact


NVIDIA’s New Advanced Reasoning Model


NVIDIA’s new Nemotron Research Reasoning Qwen 1.5B is a 1.5 billion-parameter open-weight model specifically fine-tuned for advanced reasoning tasks, spanning math, coding, science, and logic puzzles. It adopts extended reinforcement learning schedules, entropy collapse prevention, DAPO optimization, and KL regularization to unlock deeper reasoning strategies.
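
To make two of those terms concrete, the sketch below computes the entropy of a policy's output distribution and a KL penalty against a reference policy. This is the generic textbook form, not NVIDIA's training recipe. The distributions, the reward value, and the `beta` coefficient are illustrative assumptions.

```python
import math


def entropy(p: list[float]) -> float:
    # Shannon entropy in nats; low values mean the policy has collapsed onto few choices.
    return -sum(x * math.log(x) for x in p if x > 0)


def kl_divergence(p: list[float], q: list[float]) -> float:
    # KL(p || q); assumes q is positive wherever p is positive.
    return sum(x * math.log(x / y) for x, y in zip(p, q) if x > 0)


def regularized_reward(reward: float, policy: list[float],
                       reference: list[float], beta: float = 0.1) -> float:
    # Generic KL-penalized objective: the further the policy drifts from the
    # reference, the more reward is subtracted. beta is a hypothetical weight.
    return reward - beta * kl_divergence(policy, reference)


reference = [0.25, 0.25, 0.25, 0.25]   # hypothetical reference policy
collapsed = [0.97, 0.01, 0.01, 0.01]   # hypothetical near-deterministic policy

print(f"entropy(reference) = {entropy(reference):.4f}")
print(f"entropy(collapsed) = {entropy(collapsed):.4f}")
print(f"KL(collapsed || reference) = {kl_divergence(collapsed, reference):.4f}")
print(f"penalized reward = {regularized_reward(1.0, collapsed, reference):.4f}")
```

The collapsed policy has much lower entropy and a large KL penalty, which is the kind of drift that entropy-collapse prevention and KL regularization are meant to discourage during long reinforcement learning runs.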


Prolonged Reinforcement Learning Innovations


Benchmark Gains Over DeepSeek R1 1.5B


Research-Only Release


Sora-Style Text-to-Video Generation in Bing


Microsoft has integrated Sora-style text-to-video generation directly into Bing, for free. Users type a prompt such as “futuristic skyline with flying cars,” and within 15 seconds they receive a 5-second, 1080p video clip. Under the hood, this service leverages a Variational Autoencoder (VAE) with temporal diffusion and frame-level tokenization to ensure coherent motion and visual fidelity.


Core Technical Highlights


Key Use Cases


OpenAI’s Newest Audio Models


OpenAI’s latest audio models, Audio Endeavor and Audio Voyager, push the boundaries of what’s possible in long-form audio understanding and real-time voice applications.


Audio Endeavor


Audio Voyager


Developer Implications


OpenAI Agents SDK in TypeScript


OpenAI’s new Agents SDK for TypeScript introduces a powerful framework for building real-time, multi-agent workflows and voice agents, complete with streaming insights, guardrails, and human-in-the-loop support.
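
To illustrate how guardrails, human-in-the-loop approval, and streamed events fit together, here is a language-agnostic sketch written in plain Python. It does not use the SDK and none of the names below come from it: `ToolCall`, `input_guardrail`, `plan`, and `run_agent` are all hypothetical, and the planner returns a fixed list instead of calling a model.

```python
from dataclasses import dataclass
from typing import Callable, Iterator


@dataclass
class ToolCall:
    name: str
    argument: str
    needs_approval: bool


def input_guardrail(user_input: str) -> bool:
    # Hypothetical rule: reject any request that mentions a password.
    return "password" not in user_input.lower()


def plan(user_input: str) -> list[ToolCall]:
    # Stand-in for a model deciding which tools to call (assumption).
    return [
        ToolCall("lookup_order", user_input, needs_approval=False),
        ToolCall("issue_refund", user_input, needs_approval=True),
    ]


def run_agent(user_input: str,
              planner: Callable[[str], list[ToolCall]],
              approve: Callable[[ToolCall], bool]) -> Iterator[str]:
    # Events are yielded one at a time so a caller can display them as they happen.
    if not input_guardrail(user_input):
        yield "blocked by guardrail"
        return
    for call in planner(user_input):
        # Sensitive tools run only after a human (here, a callback) approves them.
        if call.needs_approval and not approve(call):
            yield f"skipped {call.name}"
            continue
        yield f"ran {call.name}({call.argument})"


for event in run_agent("order 42", plan, approve=lambda call: False):
    print(event)
```

Swapping the `approve` callback for a real prompt to an operator is what turns this loop into a human-in-the-loop workflow.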


RealtimeAgent: Streaming Actions & Thoughts


Prebuilt Agents & Extensibility


Advanced Features


Key Use Cases


Tools & Releases YOU Should Know About



And that wraps up this issue of "This Week in AI Engineering."


Thank you for tuning in! Be sure to share this newsletter with your fellow AI enthusiasts and subscribe for more weekly updates.


Until next time, happy building!