Google Launches Gemma 4: Advanced Open Models for Agentic Workflows

Key Takeaways

  • Delivers frontier-level reasoning and agentic capabilities in an open-source format, allowing developers to build autonomous workflows locally.
  • Offers high efficiency with models optimized for everything from mobile edge devices to high-end workstations and cloud clusters.
  • Provides a commercially permissive Apache 2.0 license, ensuring flexibility for enterprise deployment and digital sovereignty.

Google DeepMind has introduced Gemma 4, its most intelligent open model family to date. Purpose-built for advanced reasoning and agentic workflows, the new models deliver an unprecedented level of intelligence-per-parameter. Since the launch of the first generation, the Gemma ecosystem has grown to include over 100,000 variants, with developers downloading the models more than 400 million times. Gemma 4 is now available under a commercially permissive Apache 2.0 license.

Versatile Performance Across Four Sizes

Gemma 4 is released in four distinct sizes to accommodate a wide range of hardware, from mobile devices to high-end workstations. The family includes the Effective 2B (E2B) and Effective 4B (E4B) models, which are optimized for edge devices, alongside a 26B Mixture of Experts (MoE) model and a 31B Dense model. The 31B model currently ranks as the number three open model on the Arena AI text leaderboard, while the 26B model holds the number six spot. These models are designed to move beyond simple chat, offering native support for multi-step planning, deep logic, and agentic workflows through function-calling and structured JSON output.
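To make the function-calling idea concrete, here is a minimal sketch of how an application might route a structured JSON tool call to local code. The JSON shape (`name` plus `arguments`), the `get_weather` tool, and the `dispatch_tool_call` helper are all hypothetical illustrations, not Gemma 4's actual output format or API.

```python
import json

# Hypothetical tool; the name and signature are illustrative only.
def get_weather(city: str) -> str:
    return f"Sunny in {city}"

# Registry mapping tool names to the Python callables that implement them.
TOOLS = {"get_weather": get_weather}

def dispatch_tool_call(model_output: str) -> str:
    """Parse a tool call of the assumed form
    {"name": ..., "arguments": {...}} and run the matching function."""
    call = json.loads(model_output)
    name = call["name"]
    if name not in TOOLS:
        raise ValueError(f"Unknown tool: {name}")
    return TOOLS[name](**call.get("arguments", {}))

# Stand-in for text a model might emit; this is not real Gemma 4 output.
example_output = '{"name": "get_weather", "arguments": {"city": "Zurich"}}'
result = dispatch_tool_call(example_output)
print(result)
```

In an agentic loop, the returned string would typically be fed back to the model as the tool result so it can plan the next step.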

Multimodal Capabilities and Edge Integration

The entire Gemma 4 family features native support for video and image processing, including variable resolutions and tasks such as chart understanding and OCR. For mobile and IoT applications, the E2B and E4B models incorporate native audio input for speech recognition. These edge-focused models are engineered for memory and compute efficiency, allowing them to run offline with near-zero latency on hardware such as smartphones, Raspberry Pi, and the NVIDIA Jetson Orin Nano. Developers can access these capabilities through the AICore Developer Preview for Android, ensuring forward-compatibility with Gemini Nano 4.

Optimized for Research and Production

Gemma 4 is built to run efficiently on diverse hardware, including laptop GPUs and developer workstations. The 31B Dense model provides a foundation for high-quality fine-tuning, while the 26B MoE model is designed for speed, activating only 3.8 billion parameters during inference. To support a wide range of development environments, the models offer day-one integration with tools such as Hugging Face, vLLM, llama.cpp, Ollama, and NVIDIA NIM. For production-scale needs, developers can deploy Gemma 4 via Google Cloud services, including Vertex AI, Cloud Run, and GKE, or use hardware accelerators ranging from NVIDIA Blackwell GPUs to Trillium and Ironwood TPUs.

The models provide a context window of up to 128K tokens for edge versions and 256K tokens for larger models, enabling the processing of long-form content and code repositories. Trained on over 140 languages, Gemma 4 is designed to support the creation of inclusive, high-performance applications for a global audience.
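As a rough illustration of why the 26B MoE model is positioned for speed, the arithmetic below computes the share of parameters that are active during inference. The two figures come from the article; treating "26B" as exactly 26 billion total parameters is an assumption.

```python
# Figures from the article, in billions of parameters.
# Assumption: "26B" means exactly 26 billion total parameters.
TOTAL_PARAMS_B = 26.0
ACTIVE_PARAMS_B = 3.8

# Share of the model's parameters used during inference.
active_fraction = ACTIVE_PARAMS_B / TOTAL_PARAMS_B
print(f"Active during inference: {active_fraction:.1%}")  # prints "Active during inference: 14.6%"
```

Under that assumption, roughly one in seven parameters participates in each inference step, while the full set of weights still has to be held in memory.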
