This is a simplified guide to an AI model called Step-3.5-Flash, maintained by stepfun-ai. If you like this kind of analysis, join AIModels.fyi or follow us on Twitter.

Model overview

Step-3.5-Flash is an open-source foundation model from stepfun-ai engineered for frontier reasoning and agentic capabilities. Built on a sparse Mixture-of-Experts (MoE) architecture, it activates only 11B of its 196B parameters per token, making it far more efficient at inference than a dense model of comparable size. This sparse parameter allocation allows it to match the reasoning depth of top-tier proprietary systems while maintaining rapid inference speeds. Compared to similar models like step3 and deepseek-v3, this variant focuses on speed and accessibility rather than multimodal capabilities.
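In a sparse MoE layer, a learned gate scores every expert for each token and routes the token to only the top-k experts, so most parameters stay idle on any given forward pass. The sketch below illustrates top-k routing in general, with toy sizes and a softmax over the selected gate scores; it is not Step-3.5-Flash's actual router.

```python
import numpy as np

def route_topk(logits, k=2):
    """Pick the top-k experts per token and renormalize their gate weights.

    logits: (num_tokens, num_experts) gate scores from the router.
    Returns the chosen expert indices and softmax-normalized gate weights.
    """
    idx = np.argsort(logits, axis=-1)[:, -k:]           # top-k expert ids per token
    gates = np.take_along_axis(logits, idx, axis=-1)    # their raw scores
    gates = np.exp(gates - gates.max(-1, keepdims=True))
    gates /= gates.sum(-1, keepdims=True)               # softmax over the k survivors
    return idx, gates

# One token, four toy experts: only the two highest-scoring experts run.
logits = np.array([[0.1, 2.0, -1.0, 0.5]])
idx, gates = route_topk(logits, k=2)
print(idx, gates)
```

Production routers add load-balancing losses and per-expert capacity limits, but the top-k selection and gate renormalization above are the core mechanism behind "11B active of 196B total."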

Model inputs and outputs

Step-3.5-Flash accepts text prompts and can engage in complex reasoning tasks, code generation, and agentic workflows. The model processes queries through a 45-layer transformer with a 256K context window, allowing it to handle lengthy documents and codebases. It outputs coherent text responses with support for multi-step reasoning and technical problem-solving.

Inputs

- Text prompts: natural-language questions, instructions, code, and lengthy documents of up to 256K tokens

Outputs

- Text responses: generated code and coherent multi-step answers to reasoning and technical problem-solving tasks

Capabilities

The model generates text at 100-300 tokens per second in typical usage, with peaks of up to 350 tokens per second for single-stream coding tasks. This speed comes from its 3-way Multi-Token Prediction (MTP) head, which drafts three additional tokens alongside the standard next-token prediction, yielding up to four tokens per forward pass. For code-related work, it achieves 74.4% on SWE-bench Verified and 51.0% on Terminal-Bench 2.0, demonstrating strong performance on long-horizon software engineering tasks. The hybrid attention mechanism, which interleaves three Sliding Window Attention layers with each full-attention layer, maintains consistent performance across the entire 256K context window without a proportional increase in computational cost.
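Those throughput numbers rest on speculative-style decoding: the MTP head drafts several future tokens cheaply, and a verification pass decides how many to keep. Below is a minimal sketch of the greedy acceptance step with toy token IDs; it illustrates the general technique, not StepFun's exact implementation.

```python
def accept_drafted_tokens(drafted, verified):
    """Greedy speculative acceptance: keep the longest prefix of the
    drafted tokens that the verifier agrees with, then append the
    verifier's own token at the first mismatch (or accept all of them).
    """
    accepted = []
    for d, v in zip(drafted, verified):
        if d == v:
            accepted.append(d)          # draft confirmed, keep it
        else:
            accepted.append(v)          # verifier's correction ends the run
            break
    return accepted

# The draft head proposed 4 tokens; the verifier agreed with the first 2
# and supplied its own third token.
print(accept_drafted_tokens([11, 27, 9, 42], [11, 27, 30, 5]))  # → [11, 27, 30]
```

Because every accepted draft token skips a full forward pass, decoding speed scales with how often the cheap head guesses correctly, which is why coding tasks with predictable syntax hit the highest peaks.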

What can I use it for?

Step-3.5-Flash serves software developers building agentic systems that require rapid decision-making and code synthesis. Companies can deploy it locally for privacy-critical applications on high-end consumer hardware such as a Mac Studio M4 Max or NVIDIA workstations. Research teams benefit from the open architecture when extending or fine-tuning its reasoning capabilities. The model handles complex information-retrieval tasks through its agentic abilities, scoring 88.2 on Agency-Bench and 83.7 on xbench-DeepSearch. You can also access it through cloud APIs on OpenRouter or the StepFun platform, avoiding the need to manage inference infrastructure yourself.
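As a hedged sketch of the cloud-API route, the snippet below targets OpenRouter's OpenAI-compatible chat-completions endpoint using only the Python standard library. The model slug `stepfun-ai/step-3.5-flash` is an assumption; check OpenRouter's model catalog for the exact ID.

```python
import json
import os
import urllib.request

# Assumed model slug -- verify against OpenRouter's catalog before use.
MODEL = "stepfun-ai/step-3.5-flash"
ENDPOINT = "https://openrouter.ai/api/v1/chat/completions"

def build_request(prompt: str) -> dict:
    """Build an OpenAI-compatible chat-completions payload."""
    return {
        "model": MODEL,
        "messages": [{"role": "user", "content": prompt}],
    }

def ask(prompt: str) -> str:
    """Send one prompt to OpenRouter and return the reply text."""
    req = urllib.request.Request(
        ENDPOINT,
        data=json.dumps(build_request(prompt)).encode(),
        headers={
            "Authorization": f"Bearer {os.environ['OPENROUTER_API_KEY']}",
            "Content-Type": "application/json",
        },
    )
    with urllib.request.urlopen(req) as resp:
        body = json.load(resp)
    return body["choices"][0]["message"]["content"]

# Usage (requires OPENROUTER_API_KEY in the environment):
#   ask("Explain mixture-of-experts routing in two sentences.")
```

Set `OPENROUTER_API_KEY` in your environment before calling `ask()`; the same payload shape works against any OpenAI-compatible gateway.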

Things to try

Test the model's reasoning on competition-level mathematics, where it scores 97.3% on AIME 2025 and 98.4% on HMMT 2025, making it a strong fit for tutoring systems or automated grading. Try deploying it as a local coding copilot for refactoring large legacy codebases, using its efficient context handling to process entire files without hitting context-length limits. Experiment with agentic workflows in which the model autonomously browses the web, executes terminal commands, and manages complex research tasks. Because only a fraction of the parameters activate per token, you can run sophisticated reasoning on relatively modest hardware, enabling personal AI projects that would otherwise require expensive cloud compute.