Announcing new fine-tuning models and techniques in Azure AI Foundry


Today, we’re excited to announce three major enhancements to model fine-tuning in Azure AI Foundry: Reinforcement Fine-Tuning (RFT) with o4-mini (coming soon) and Supervised Fine-Tuning (SFT) for the GPT-4.1-nano and Llama 4 Scout models (available now). These updates reflect our continued commitment to empowering organizations with tools to build highly customized, domain-adapted AI systems for real-world impact.

With these new models, we’re unblocking three major avenues of LLM customization: GPT-4.1-nano is a powerful small model, ideal for distillation; o4-mini is the first reasoning model you can fine-tune; and Llama 4 Scout is a best-in-class open-source model.

Reinforcement Fine-Tuning with o4-mini 

Reinforcement Fine-Tuning introduces a new level of control for aligning model behavior with complex business logic. By rewarding accurate reasoning and penalizing undesirable outputs, RFT improves model decision-making in dynamic or high-stakes environments.

Coming soon for the o4-mini model, RFT unlocks new possibilities for use cases requiring adaptive reasoning, contextual awareness, and domain-specific logic, all while maintaining fast inference performance.

Real-world impact: DraftWise

DraftWise, a legal tech startup, used reinforcement fine-tuning (RFT) in Azure AI Foundry Models to enhance the performance of reasoning models tailored for contract generation and review. Faced with the challenge of delivering highly contextual, legally sound suggestions to lawyers, DraftWise fine-tuned Azure OpenAI models using proprietary legal data to improve response accuracy and adapt to nuanced user prompts. This led to a 30% improvement in search result quality, enabling lawyers to draft contracts faster and focus on high-value advisory work.

Reinforcement fine-tuning on reasoning models is a potential game changer for us. It’s helping our models understand the nuance of legal language and respond more intelligently to complex drafting instructions, which promises to make our product significantly more useful to lawyers in real time.

—James Ding, founder and CEO of DraftWise.

When should you use Reinforcement Fine-Tuning?

Reinforcement Fine-Tuning is best suited for use cases where adaptability, iterative learning, and domain-specific behavior are essential. You should consider RFT if your scenario involves:

  1. Custom Rule Implementation: RFT thrives in environments where decision logic is highly specific to your organization and cannot easily be captured through static prompts or traditional training data. It enables models to learn flexible, evolving rules that reflect real-world complexity.
  2. Domain-Specific Operational Standards: Ideal for scenarios where internal procedures diverge from industry norms, and where success depends on adhering to those bespoke standards. RFT can effectively encode procedural variations, such as extended timelines or modified compliance thresholds, into the model’s behavior.
  3. High Decision-Making Complexity: RFT excels in domains with layered logic and variable-rich decision trees. When outcomes depend on navigating numerous subcases or dynamically weighing multiple inputs, RFT helps models generalize across complexity and deliver more consistent, accurate decisions.

Example: Wealth advisory at Contoso Wellness

To showcase the potential of RFT, consider Contoso Wellness, a fictitious wealth advisory firm. Using RFT, the o4-mini model learned to adapt to unique business rules, such as identifying optimal client interactions based on nuanced patterns like the ratio of a client’s net worth to available funds. This enabled Contoso to streamline their onboarding processes and make more informed decisions faster.
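RFT works by scoring each model output with a programmatic reward. As a rough illustration of what such a rule-based reward could look like for the Contoso scenario, here is a toy Python grader; the threshold, labels, and function name are invented for illustration and are not part of any Azure API.

```python
def contoso_grader(decision: str, net_worth: float, available_funds: float) -> float:
    """Toy reward function: return 1.0 if the model's recommended action
    matches the rule derived from the net-worth-to-available-funds ratio,
    else 0.0. The 5.0 threshold and the action labels are hypothetical."""
    ratio = net_worth / available_funds if available_funds else float("inf")
    expected = "priority_onboarding" if ratio >= 5.0 else "standard_onboarding"
    return 1.0 if decision == expected else 0.0
```

During reinforcement fine-tuning, a signal like this (rather than a fixed target answer) is what nudges the model toward decisions that satisfy the firm's bespoke rules.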

Supervised Fine-Tuning now available for GPT-4.1-nano

We’re also bringing Supervised Fine-Tuning (SFT) to GPT-4.1-nano, a small but powerful foundation model optimized for high-throughput, cost-sensitive workloads. With SFT, you can instill your model with company-specific tone, terminology, workflows, and structured outputs, all tailored to your domain. This model will be available for fine-tuning in the coming days.
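Supervised fine-tuning data for Azure OpenAI chat models is supplied as a JSON Lines file, one `{"messages": [...]}` record per line. A minimal sketch of a single record (the system persona, question, and answer below are invented placeholders):

```python
import json

# One supervised training example in the chat-completions format used for
# fine-tuning data; a .jsonl training file holds one such record per line.
record = {
    "messages": [
        {"role": "system", "content": "You are Contoso's support assistant. Answer concisely in Contoso's house style."},
        {"role": "user", "content": "How do I reset my device?"},
        {"role": "assistant", "content": "Hold the power button for 10 seconds; the device restarts automatically."},
    ]
}

jsonl_line = json.dumps(record)  # append one line like this per example
```

A few hundred to a few thousand examples in this shape are typically enough to teach a small model your tone and output structure.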

Why fine-tune GPT-4.1-nano?

  • Precision at Scale: Tailor the model’s responses while maintaining speed and efficiency.
  • Enterprise-Grade Output: Ensure alignment with business processes and tone of voice.
  • Lightweight and Deployable: Perfect for scenarios where latency and cost matter, such as customer service bots, on-device processing, or high-volume document parsing.

Compared to larger models, GPT-4.1-nano delivers faster inference and lower compute costs, making it well suited for large-scale workloads like:

  • Customer support automation, where models must handle thousands of tickets per hour with consistent tone and accuracy.
  • Internal knowledge assistants that follow company style and protocol in summarizing documentation or responding to FAQs.

As a small, fast, but highly capable model, GPT-4.1-nano also makes a great candidate for distillation. You can use models like GPT-4.1 or o4 to generate training data, or capture production traffic with stored completions, and teach GPT-4.1-nano to be just as good!
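The distillation idea can be sketched end to end: collect a larger model's answers to your prompts, then package them as supervised training examples for the smaller model. In this offline sketch the teacher call is stubbed with a canned reply so it runs anywhere; in practice it would be a chat-completions call to a model such as GPT-4.1, and the function names here are illustrative only.

```python
import json

def teacher(prompt: str) -> str:
    """Stand-in for a larger teacher model; replace with a real API call."""
    return f"(teacher answer for: {prompt})"

def distill(prompts, system_prompt):
    """Label each prompt with the teacher's output and package the results
    as chat-format records for supervised fine-tuning of a smaller model."""
    records = []
    for p in prompts:
        records.append({"messages": [
            {"role": "system", "content": system_prompt},
            {"role": "user", "content": p},
            {"role": "assistant", "content": teacher(p)},
        ]})
    return records

# Serialize as JSON Lines, ready to upload as a fine-tuning training file.
lines = [json.dumps(r) for r in distill(
    ["Summarize our refund policy."], "Answer in Contoso's house style.")]
```

Stored completions from production traffic slot into the same pipeline: they simply replace the `teacher` call as the source of assistant turns.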

Fine-tune gpt-4.1-nano demo in Azure AI Foundry.

Llama 4 Fine-Tuning now available

We’re also excited to announce support for fine-tuning Meta’s Llama 4 Scout, a cutting-edge 17-billion-active-parameter model that offers an industry-leading context window of 10M tokens while fitting on a single H100 GPU for inference. It’s a best-in-class model, more powerful than all previous-generation Llama models.

Llama 4 fine-tuning is available in our managed compute offering, allowing you to fine-tune and run inference using your own GPU quota. Available both in Azure AI Foundry and as Azure Machine Learning components, it gives you access to additional hyperparameters for deeper customization compared to our serverless experience.

Get started with Azure AI Foundry today

Azure AI Foundry is your foundation for enterprise-grade AI tuning. These fine-tuning enhancements unlock new frontiers in model customization, helping you build intelligent systems that think and respond in ways that reflect your business DNA.

  • Use Reinforcement Fine-Tuning with o4-mini to build reasoning engines that learn from experience and evolve over time. Coming soon in Azure AI Foundry, with regional availability in East US2 and Sweden Central.
  • Use Supervised Fine-Tuning with GPT-4.1-nano to scale reliable, cost-efficient, and highly customized model behaviors across your organization. Available now in Azure AI Foundry in North Central US and Sweden Central.
  • Try Llama 4 Scout fine-tuning to customize a best-in-class open-source model. Available now in the Azure AI Foundry model catalog and Azure Machine Learning.

With Azure AI Foundry, fine-tuning isn’t just about accuracy; it’s about trust, efficiency, and adaptability at every layer of your stack.

Explore further:

We’re just getting started. Stay tuned for more model support, advanced tuning techniques, and tools to help you build AI that’s smarter, safer, and uniquely yours.
