This is a simplified guide to an AI model called Qwen3.5-35B-A3B-Uncensored-HauhauCS-Aggressive maintained by HauhauCS.

Model overview

Qwen3.5-35B-A3B-Uncensored-HauhauCS-Aggressive is an uncensored version of the base Qwen3.5-35B-A3B model. It removes refusal mechanisms while retaining the full capabilities of the original architecture; according to the maintainer, the aggressive variant achieves zero refusals across extensive testing while preserving dataset integrity and model functionality. For users seeking a less aggressive approach, the 27B variant offers comparable performance at a smaller scale, and the 9B and 4B models provide options for resource-constrained environments.

Model inputs and outputs

The model accepts text prompts and optionally images or video frames, producing coherent text responses across 201 languages. It features a 262K native context window extendable to 1M tokens, supporting both single and multi-token predictions. The implementation offers a specialized thinking mode and a standard (non-thinking) mode, each with configurable generation parameters.

Inputs

- Text prompts: natural-language instructions or conversation turns, with a 262K native context window extendable to 1M tokens
- Images: optional visual inputs for image analysis tasks
- Video frames: optional frame sequences for video understanding

Outputs

- Text responses: coherent generations across 201 supported languages
- Multi-token predictions: optional parallel token outputs for faster inference

Capabilities

The model operates with 35B total parameters using a mixture-of-experts architecture where 3B parameters activate per forward pass. It combines gated DeltaNet linear attention with full softmax attention in a 3:1 ratio across 40 layers, with 256 experts of which 8 are routed per token plus one shared expert. This hybrid approach preserves thinking capabilities essential for reasoning tasks while maintaining efficiency. The model handles extended contexts through YaRN scaling and supports multi-token prediction for faster inference.
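The layer layout and per-token expert count above can be sketched in a few lines. Note that the exact interleaving pattern (three DeltaNet layers followed by one softmax layer) is an assumption for illustration; the guide only states the overall 3:1 ratio.

```python
# Sketch of the hybrid-attention layout and per-token expert count
# described above. The repeating 3-then-1 interleaving is an assumed
# pattern; only the 3:1 ratio across 40 layers is stated in the guide.

N_LAYERS = 40
DELTANET_PER_SOFTMAX = 3  # 3:1 ratio of linear to full attention

def layer_types(n_layers: int = N_LAYERS) -> list[str]:
    """Assign an attention type to each layer, repeating the assumed pattern."""
    pattern = ["deltanet"] * DELTANET_PER_SOFTMAX + ["softmax"]
    return [pattern[i % len(pattern)] for i in range(n_layers)]

def active_experts_per_token(routed: int = 8, shared: int = 1) -> int:
    """Experts that fire for each token: 8 routed plus 1 shared."""
    return routed + shared

layers = layer_types()
print(layers.count("deltanet"), layers.count("softmax"))  # 30 10
print(active_experts_per_token())  # 9
```

The ratio works out to 30 linear-attention layers and 10 full-attention layers, with 9 experts active per token at each MoE layer.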

What can I use it for?

Applications include creative writing, coding assistance, technical documentation, research exploration, and content generation without content-based restrictions. The multimodal capabilities enable image analysis and video understanding tasks. Users can deploy it locally using llama.cpp, LM Studio, Jan, or koboldcpp with quantized weights ranging from 11GB to 65GB depending on precision requirements. The model performs well for both general conversation and specialized tasks like code generation when configured with appropriate temperature and sampling parameters.
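For local deployment, llama.cpp's `llama-server` exposes an OpenAI-compatible chat endpoint, so a request can be assembled as a plain JSON payload. This is a minimal sketch: the GGUF filename and the localhost port are placeholders, not values from this guide.

```python
import json

# Minimal sketch of a chat request to a local llama.cpp `llama-server`
# instance via its OpenAI-compatible /v1/chat/completions endpoint.
# The model filename and endpoint port below are placeholders.
payload = {
    "model": "Qwen3.5-35B-A3B-Uncensored.gguf",  # hypothetical filename
    "messages": [
        {"role": "user", "content": "Explain gated DeltaNet attention briefly."}
    ],
    "temperature": 1.0,  # thinking-mode settings recommended in this guide
    "top_p": 0.95,
    "max_tokens": 512,
}
body = json.dumps(payload)
# POST `body` to http://localhost:8080/v1/chat/completions with any HTTP client.
print(json.loads(body)["messages"][0]["role"])  # user
```

LM Studio, Jan, and koboldcpp expose similar OpenAI-compatible servers, so the same payload shape generally works across all four runtimes.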

Things to try

Configure the model in thinking mode with temperature=1.0 and top_p=0.95 for open-ended exploration and creative tasks, which leverages the internal reasoning capabilities. For precise or technical outputs, switch to non-thinking mode with lower temperature settings and use the specialized coding parameters. The importance matrix quantization applied during conversion helps preserve model quality across all compression levels, so experimenting with different GGUF quantizations can balance performance and memory usage without significant capability loss. Test the vision features by providing images alongside text prompts to evaluate multimodal reasoning on specific domains.
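The mode-dependent settings above can be captured in a small helper. The thinking-mode values (temperature=1.0, top_p=0.95) come from this guide; the non-thinking values below are illustrative assumptions, since the guide only advises "lower temperature settings" for precise outputs.

```python
def sampling_params(thinking: bool) -> dict:
    """Return generation parameters for the two modes described above.

    Thinking-mode values are taken from this guide; the non-thinking
    values are illustrative assumptions, not documented settings.
    """
    if thinking:
        return {"temperature": 1.0, "top_p": 0.95}
    return {"temperature": 0.3, "top_p": 0.9}  # assumed conservative values

print(sampling_params(thinking=True))   # {'temperature': 1.0, 'top_p': 0.95}
```

Passing the returned dict straight into an inference call keeps the two modes from drifting apart as you experiment with quantizations and tasks.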