This is a simplified guide to an AI model called Step-3.5-Flash maintained by stepfun-ai. If you like these kinds of analyses, join AIModels.fyi or follow us on Twitter.
Model overview
Step-3.5-Flash is an open-source foundation model from stepfun-ai engineered for frontier reasoning and agentic capabilities. Built on a sparse Mixture of Experts architecture, it activates only 11B of its 196B parameters per token, making it far more efficient than comparable models. This intelligent parameter allocation allows it to match the reasoning depth of top-tier proprietary systems while maintaining rapid inference speeds. Compared to similar models like step3 and deepseek-v3, this variant focuses on speed and accessibility rather than multimodal capabilities.
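The "activates only 11B of its 196B parameters" claim comes from Mixture-of-Experts routing: a small gating network picks a few experts per token, and only those experts run. Here is a minimal sketch of top-k expert routing; the expert count, dimensions, and k=2 are illustrative toys, not Step-3.5-Flash's actual configuration.

```python
import numpy as np

def moe_route(token_emb, gate_w, k=2):
    """Route one token to its top-k experts (softmax over the selected logits)."""
    logits = token_emb @ gate_w            # one score per expert
    topk = np.argsort(logits)[-k:]         # indices of the k highest-scoring experts
    weights = np.exp(logits[topk] - logits[topk].max())
    weights /= weights.sum()               # mixing weights sum to 1
    return topk, weights

rng = np.random.default_rng(0)
num_experts, d = 8, 16                     # toy sizes for illustration
gate_w = rng.normal(size=(d, num_experts))
token = rng.normal(size=d)
experts, weights = moe_route(token, gate_w, k=2)
# Only 2 of the 8 experts execute for this token; their outputs are
# combined using `weights`. The other 6 experts cost nothing.
```

Because each token touches only k experts, compute per token scales with the active parameters, not the total parameter count, which is why a 196B-parameter model can decode at the cost of an 11B one.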
Model inputs and outputs
Step-3.5-Flash accepts text prompts and can engage in complex reasoning tasks, code generation, and agentic workflows. The model processes queries through a 45-layer transformer with a 256K context window, allowing it to handle lengthy documents and codebases. It outputs coherent text responses with support for multi-step reasoning and technical problem-solving.
Inputs
- Text prompts up to 256K tokens
- Conversation history for maintaining context across multiple exchanges
- Code snippets or technical specifications for analysis and generation
- Complex multi-part questions requiring reasoning chains
Outputs
- Reasoned text responses with step-by-step explanations
- Generated code for software engineering tasks
- Agent-driven actions including web browsing and task execution
- Mathematical proofs and solutions with detailed working
Capabilities
The model generates text at 100-300 tokens per second in typical usage, with peaks reaching 350 tokens per second for single-stream coding tasks. This speed comes from its 3-way Multi-Token Prediction head, which drafts three additional tokens alongside the standard next-token prediction, yielding up to four tokens per forward pass. For code-related work, it achieves 74.4% on SWE-bench Verified and 51.0% on Terminal-Bench 2.0, demonstrating strong performance on long-horizon software engineering tasks. The hybrid attention mechanism, which combines Sliding Window Attention layers with full-attention layers in a 3:1 pattern, maintains consistent performance across the entire 256K context window without proportional increases in computational cost.
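The hybrid attention idea can be seen in the masks themselves: a full-attention query looks at every prior key, while a sliding-window query looks only at the most recent ones, so most layers do bounded work per token. The sketch below uses a toy window of 3 and a toy 3:1 layer pattern; the real model's window size and exact layer layout are not specified in this guide.

```python
import numpy as np

def attention_mask(seq_len, window=None):
    """Causal mask; if `window` is set, also restrict each query to the last `window` keys."""
    i = np.arange(seq_len)[:, None]   # query positions
    j = np.arange(seq_len)[None, :]   # key positions
    mask = j <= i                     # causal: no attending to the future
    if window is not None:
        mask &= j > i - window        # sliding window: only recent keys
    return mask

# Illustrative 3:1 pattern of sliding-window to full-attention layers (assumed):
layers = ["swa", "swa", "swa", "full"] * 2

full = attention_mask(8)              # full-attention layer mask
swa = attention_mask(8, window=3)     # sliding-window layer mask
# The last query attends to all 8 keys under full attention,
# but to at most 3 keys under the windowed mask.
```

With most layers windowed, attention cost grows roughly linearly with sequence length instead of quadratically, which is what keeps 256K-token contexts affordable.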
What can I use it for?
Step-3.5-Flash serves software developers building agentic systems that require rapid decision-making and code synthesis. Companies can deploy it locally for privacy-critical applications on high-end consumer hardware like a Mac Studio M4 Max or NVIDIA systems. Research teams benefit from the open architecture for extending or fine-tuning reasoning capabilities. The model handles complex information retrieval tasks through its agentic abilities, scoring 88.2 on Agency-Bench and 83.7 on xbench-DeepSearch. You can also access it through OpenRouter or the StepFun platform via cloud APIs, avoiding the need to manage infrastructure yourself.
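Cloud access through OpenRouter uses an OpenAI-compatible chat completions endpoint. Below is a minimal sketch of building such a request with only the standard library; the model slug `stepfun-ai/step-3.5-flash` and the placeholder API key are assumptions, so check the provider's model listing before use.

```python
import json
import urllib.request

API_URL = "https://openrouter.ai/api/v1/chat/completions"  # OpenAI-compatible endpoint

def build_request(prompt, model="stepfun-ai/step-3.5-flash", api_key="YOUR_KEY"):
    """Build an OpenAI-style chat request. The model slug here is an assumption."""
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }
    req = urllib.request.Request(
        API_URL,
        data=json.dumps(payload).encode(),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
    )
    return req, payload

req, payload = build_request("Refactor this function for readability: ...")
# resp = urllib.request.urlopen(req)  # uncomment once a real API key is set
```

The same payload shape works against any OpenAI-compatible gateway, so switching between OpenRouter and the StepFun platform is mostly a matter of changing the base URL and model slug.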
Things to try
Test the model's reasoning on competition-level mathematics—it scores 97.3% on AIME 2025 and 98.4% on HMMT 2025, making it suitable for tutoring systems or automated grading. Try deploying it as a local coding copilot for refactoring large legacy codebases, leveraging its efficient context handling to process entire files without token budget constraints. Experiment with agentic workflows where the model autonomously browses the web, executes terminal commands, and manages complex research tasks. The model's sparse activation pattern means you can run sophisticated reasoning on modest hardware, enabling personal AI projects that would otherwise require expensive cloud compute.