We are excited to share that the first models in the Llama 4 herd are available today in Azure AI Foundry and Azure Databricks, enabling people to build more personalized multimodal experiences. These models from Meta are designed to seamlessly integrate text and vision tokens into a unified model backbone. This innovative approach allows developers to leverage Llama 4 models in applications that demand vast amounts of unlabeled text, image, and video data, setting a new precedent in AI development.
Today, we’re bringing Meta’s Llama 4 Scout and Maverick models into Azure AI Foundry as managed compute options:
- Llama 4 Scout Models
- Llama-4-Scout-17B-16E
- Llama-4-Scout-17B-16E-Instruct
- Llama 4 Maverick Models
- Llama-4-Maverick-17B-128E-Instruct-FP8
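Once deployed as managed compute, these models expose a chat completions endpoint. As a minimal sketch (the endpoint URL and key below are placeholders, and the request schema is the common OpenAI-compatible format; check your deployment's details page in Azure AI Foundry for the exact values and schema):

```python
import json

# Placeholder values for illustration only: copy the real endpoint URL and
# API key from your deployment in Azure AI Foundry.
ENDPOINT = "https://<your-endpoint>.inference.ml.azure.com/chat/completions"
API_KEY = "<your-api-key>"

def build_chat_request(model: str, user_message: str) -> dict:
    """Assemble a chat completions request body (OpenAI-compatible schema)."""
    return {
        "model": model,
        "messages": [
            {"role": "system", "content": "You are a helpful assistant."},
            {"role": "user", "content": user_message},
        ],
        "max_tokens": 512,
        "temperature": 0.7,
    }

body = build_chat_request(
    "Llama-4-Scout-17B-16E-Instruct",
    "Summarize the key risks in the attached report in three bullet points.",
)
# This body would then be POSTed to ENDPOINT with the API key in an
# authorization header; here we only print the payload.
print(json.dumps(body, indent=2))
```

The same request shape works for the Maverick deployment by swapping the model name.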
Azure AI Foundry is designed for multi-agent use cases, enabling seamless collaboration between different AI agents. This opens up new frontiers in AI applications, from complex problem-solving to dynamic task management. Imagine a team of AI agents working together to analyze huge datasets, generate creative content, and provide real-time insights across multiple domains. The possibilities are endless.

To accommodate a range of use cases and developer needs, Llama 4 models come in both smaller and larger options. These models integrate mitigations at every layer of development, from pre-training to post-training. Tunable system-level mitigations protect developers from adversarial users, empowering them to create helpful, safe, and adaptable experiences for their Llama-supported applications.
Llama 4 Scout models: Power and precision
We’re sharing the first models in the Llama 4 herd, which will enable people to build more personalized multimodal experiences. According to Meta, Llama 4 Scout is among the best multimodal models in its class and is more powerful than Meta’s Llama 3 models, while fitting on a single H100 GPU. Llama 4 Scout also increases the supported context length from 128K tokens in Llama 3 to an industry-leading 10 million tokens. This opens up a world of possibilities, including multi-document summarization, parsing extensive user activity for personalized tasks, and reasoning over vast codebases.
Targeted use cases include summarization, personalization, and reasoning. Thanks to its long context and efficient size, Llama 4 Scout shines in tasks that require condensing or analyzing extensive information. It can generate summaries or reports from extremely long inputs, personalize its responses using detailed user-specific data (without forgetting earlier details), and perform complex reasoning across large data sets.
For example, Scout could analyze all documents in an enterprise SharePoint library to answer a specific query, or read a multi-thousand-page technical manual to provide troubleshooting advice. It’s designed to be a diligent “scout” that traverses vast information and returns the highlights or answers you need.
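A long context window changes the basic pattern: instead of a retrieval step that selects a few passages, many documents can be packed into a single prompt. A minimal sketch of that packing step (the helper name and separator format are our own, not an Azure or Meta API):

```python
def build_summary_prompt(documents: list[str], question: str) -> str:
    """Concatenate many documents into one prompt, relying on Scout's
    large (up to 10M-token) context window rather than retrieval."""
    sections = [
        f"Document {i + 1}:\n{doc.strip()}" for i, doc in enumerate(documents)
    ]
    joined = "\n\n---\n\n".join(sections)
    return (
        f"{joined}\n\n"
        f"Using only the documents above, answer the question: {question}"
    )

docs = ["Q1 revenue rose 12%.", "Q2 revenue fell 3% on supply issues."]
prompt = build_summary_prompt(docs, "How did revenue change across Q1 and Q2?")
print(prompt)
```

The resulting prompt would be sent as the user message of a chat completions request to a Scout deployment.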
Llama 4 Maverick models: Innovation at scale
As a general-purpose LLM, Llama 4 Maverick contains 17 billion active parameters, 128 experts, and 400 billion total parameters, offering high quality at a lower price compared to Llama 3.3 70B. Maverick excels in image and text understanding with support for 12 languages, enabling the creation of sophisticated AI applications that bridge language barriers. Maverick is ideal for precise image understanding and creative writing, making it well-suited for general assistant and chat use cases. For developers, it offers state-of-the-art intelligence with high speed, optimized for the best response quality and tone.
Targeted use cases include optimized chat scenarios that require high-quality responses. Meta fine-tuned Llama 4 Maverick to be an excellent conversational agent. It is the flagship chat model of the Meta Llama 4 family: think of it as the multilingual, multimodal counterpart to a ChatGPT-like assistant.
It’s particularly well-suited for interactive applications:
- Customer support bots that need to understand images users upload.
- AI creative partners that can discuss and generate content in different languages.
- Internal business assistants that can support employees by answering questions and handling rich media input.
With Maverick, enterprises can build high-quality AI assistants that converse naturally (and politely) with a global user base and leverage visual context when needed.
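In the OpenAI-compatible chat format, an image is passed as one content part of a user turn alongside a text part. A minimal sketch of building such a multimodal message (the base64 data-URL convention shown is the common one; verify the accepted formats against your deployment's documentation):

```python
import base64

def build_image_message(image_bytes: bytes, question: str) -> list[dict]:
    """Build a multimodal user turn: one image content part (as a base64
    data URL) followed by one text content part."""
    b64 = base64.b64encode(image_bytes).decode("ascii")
    return [
        {
            "role": "user",
            "content": [
                {
                    "type": "image_url",
                    "image_url": {"url": f"data:image/png;base64,{b64}"},
                },
                {"type": "text", "text": question},
            ],
        }
    ]

# Toy bytes stand in for a real PNG file read from disk.
messages = build_image_message(b"\x89PNG...", "What does this chart show?")
```

These messages would go in the `messages` field of a chat completions request to a Maverick deployment.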

Architectural innovations in Llama 4: Multimodal early fusion and MoE
According to Meta, two key innovations set Llama 4 apart: native multimodal support with early fusion, and a sparse Mixture of Experts (MoE) design for efficiency and scale.
- Early-fusion multimodal transformer: Llama 4 uses an early fusion approach, treating text, images, and video frames as a single sequence of tokens from the start. This allows the model to understand and generate different media together. It excels at tasks involving multiple modalities, such as analyzing documents with diagrams or answering questions about a video’s transcript and visuals. For enterprises, this allows AI assistants to process full reports (text + graphics + video snippets) and provide integrated summaries or answers.
- Cutting-edge Mixture of Experts (MoE) architecture: To achieve strong performance without incurring prohibitive computing costs, Llama 4 uses a sparse Mixture of Experts (MoE) architecture. Essentially, this means the model comprises numerous expert sub-models, called “experts,” with only a small subset active for any given input token. This design not only enhances training efficiency but also improves inference scalability. Consequently, the model can handle more queries concurrently by distributing the computational load across the various experts, enabling deployment in production environments without requiring large single-instance GPUs. The MoE architecture allows Llama 4 to expand its capacity without escalating costs, offering a significant advantage for enterprise implementations.
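The routing idea behind sparse MoE can be sketched in a few lines: a gating function scores every expert for each token, but only the top-k experts actually run. This toy illustration is not Meta's implementation (Llama 4 also uses details such as a shared expert, omitted here); it only shows the selection-and-weighting step:

```python
import math

def top_k_route(gate_logits: list[float], k: int = 1) -> list[tuple[int, float]]:
    """Select the k highest-scoring experts for one token and return
    (expert_index, weight) pairs with softmax-normalized weights.
    Only these k experts' sub-networks run for this token, which is
    why compute stays low even with many total parameters."""
    top = sorted(range(len(gate_logits)),
                 key=lambda i: gate_logits[i], reverse=True)[:k]
    exps = [math.exp(gate_logits[i]) for i in top]
    total = sum(exps)
    return [(i, e / total) for i, e in zip(top, exps)]

# Toy gate scores for 4 experts; Maverick has 128 routed experts per layer.
routing = top_k_route([0.2, 1.5, -0.7, 0.9], k=2)
print(routing)  # highest-scoring expert first
```

The token's output is then the weighted sum of the selected experts' outputs, so capacity scales with the total expert count while per-token compute scales only with k.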
Commitment to safety and best practices
Meta built Llama 4 with the best practices outlined in their Developer Use Guide: AI Protections. This includes integrating mitigations at each layer of model development, from pre-training to post-training, and tunable system-level mitigations that protect developers from adversarial attacks. And, by making these models available in Azure AI Foundry, they come with the proven safety and security guardrails developers have come to expect from Azure.
We empower developers to create helpful, safe, and adaptable experiences for their Llama-supported applications. Explore the Llama 4 models now in the Azure AI Foundry Model Catalog and in Azure Databricks, and start building with the latest in multimodal, MoE-powered AI, backed by Meta’s research and Azure’s platform strength.
The availability of Meta Llama 4 in Azure AI Foundry and through Azure Databricks gives customers unparalleled flexibility in choosing the platform that best suits their needs. This seamless integration allows users to harness advanced AI capabilities, enhancing their applications with powerful, secure, and adaptable features. We are excited to see what you build next.