CCS Accelerates Generative AI With Reinforcement Learning On Adaptive Engine


May 14, 2026



There is an ongoing opportunity to improve patient service by reducing wait times and increasing overall service reliability, particularly during unpredictable surges in demand. Patient service teams often struggle with availability and consistency when volume spikes, and are looking for tools that augment their workforce while also enabling patients to resolve certain queries in a self-service fashion.

CCS is a chronic care management organization that provides direct-to-patient medical supplies and services to those managing diabetes in the United States. Supplies include Continuous Glucose Monitors (CGMs), insulin pumps, and Blood Glucose Monitors (BGMs), while CCS’s clinical education, monitoring, and coaching are critical services that help drive therapy adherence. In 2025, CCS started a collaboration with Deloitte and Adaptive ML to develop AI systems that transform operations across the enterprise, beginning with patient support.

A clear strategic commitment to intelligent automation helps reduce operational overhead and costs while ensuring a high level of always-on service that patients can rely on. In this case study, we present latency optimization results obtained on the function-calling component of the project. This setup is not necessarily representative of future production deployments.

Why is function calling important in patient support?

Enterprise patient support AI agents must go beyond natural language interaction to reliably execute actions across business systems. Function calling is the foundational capability enabling large language models (LLMs) to do more than simply talk. It allows these models to determine when to invoke an external API, and to generate the correct parameters for that invocation. Function calling allows patient support AI assistants to query external systems such as knowledge bases, authentication mechanisms, CRMs, ERPs and more.
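To make the mechanism concrete, here is a minimal Python sketch of the application side of function calling: the model emits a structured call, and the application validates it before invoking anything. The tool name `get_order_status`, its `order_id` parameter, the in-memory order table, and the JSON call format are all hypothetical and chosen for illustration; they do not describe CCS systems or the Adaptive Engine API.

```python
import json
from typing import Any, Callable, Dict

# Hypothetical tool for illustration only (not a real CCS API).
def get_order_status(order_id: str) -> Dict[str, str]:
    fake_orders = {"A123": "shipped"}  # stand-in for a real order system
    return {"order_id": order_id, "status": fake_orders.get(order_id, "unknown")}

# Registry mapping each callable name to its handler and required parameters.
TOOLS: Dict[str, Dict[str, Any]] = {
    "get_order_status": {
        "handler": get_order_status,
        "required": {"order_id": str},
    },
}

def dispatch(model_output: str) -> Dict[str, Any]:
    """Parse a model-emitted call, validate it, and run the matching tool."""
    try:
        call = json.loads(model_output)
    except json.JSONDecodeError:
        return {"error": "invalid_json"}
    if not isinstance(call, dict):
        return {"error": "invalid_call"}

    name = call.get("name")
    tool = TOOLS.get(name) if isinstance(name, str) else None
    if tool is None:
        return {"error": "unknown_function"}

    args = call.get("arguments", {})
    if not isinstance(args, dict):
        return {"error": "invalid_arguments"}
    # Reject parameters the tool does not declare.
    for param in args:
        if param not in tool["required"]:
            return {"error": f"unexpected_argument:{param}"}
    # Check that every required parameter is present with the right type.
    for param, expected_type in tool["required"].items():
        if not isinstance(args.get(param), expected_type):
            return {"error": f"bad_argument:{param}"}

    handler: Callable[..., Any] = tool["handler"]
    return {"result": handler(**args)}
```

Validating the function name and arguments before execution matters here because a wrong or malformed call from the model should surface as an error the assistant can recover from, not as an action taken against a business system.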

Challenges of function calling in production environments

Deploying function calling at scale introduces several challenges. Though proprietary models are good baselines for function calling, they are costly and introduce higher inference latency, which can be limiting for real-time patient interactions. In contrast, open-source base models offer improved latency and cost efficiency but frequently exhibit lower function-calling accuracy. These limitations can impede the reliability of patient support workflows and increase operational overhead when deployed in production environments.

How can Reinforcement Fine-Tuning and deployment on Adaptive Engine bridge the gap between automated efficiency and the accuracy required for healthcare?

To address these challenges, Adaptive ML applied reinforcement learning–based fine-tuning using Adaptive Engine to a compact Llama 3.2 3B model. This approach significantly improved function-calling accuracy, enabling the smaller model to match the accuracy of a proprietary model baseline.
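This post does not describe the reward signal used during training, so the following is only an illustrative sketch of how function-calling accuracy could be scored for reinforcement fine-tuning. It assumes the model emits a JSON call and that each training example carries a reference call; the function `function_call_reward`, the scoring weights, and the reference format are hypothetical, not Adaptive Engine's actual method.

```python
import json
from typing import Any, Dict

def function_call_reward(model_output: str, reference: Dict[str, Any]) -> float:
    """Score a model-emitted call against a reference call, in [0.0, 1.0].

    Assumed scheme: 0.0 for unparsable output or the wrong function,
    0.5 for the right function, plus up to 0.5 for matching arguments.
    """
    try:
        call = json.loads(model_output)
    except json.JSONDecodeError:
        return 0.0
    if not isinstance(call, dict) or call.get("name") != reference["name"]:
        return 0.0

    ref_args = reference.get("arguments", {})
    args = call.get("arguments", {})
    if not isinstance(args, dict):
        return 0.5
    if not ref_args:
        # No arguments expected: full credit only if none were produced.
        return 1.0 if not args else 0.5

    matched = sum(1 for key, value in ref_args.items()
                  if key in args and args[key] == value)
    extra = len(set(args) - set(ref_args))  # penalize invented parameters
    return 0.5 + 0.5 * matched / (len(ref_args) + extra)
```

A graded reward like this gives partial credit for choosing the right function even when a parameter is wrong, which is one common way to give the policy a denser learning signal than a pure exact-match score.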

The model was then deployed on a single-H100 Amazon EC2 p5.4xlarge instance, under ML capacity block procurement. This recently released form factor allowed optimized placement and costs, enabling an average client-side LLM inference latency of 230 milliseconds (160 ms server-side), over 90% lower than the proprietary model baseline.
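The baseline latency itself is not reported here, but the figures above imply some simple bounds. The short sketch below works through the arithmetic, assuming "over 90% lower" means a reduction of at least 90% relative to the baseline's average latency; the function names are illustrative.

```python
def client_overhead_ms(client_ms: float, server_ms: float) -> float:
    """Latency spent outside the server (network round trip, client handling)."""
    return client_ms - server_ms

def implied_baseline_floor_ms(latency_ms: float, reduction: float) -> float:
    """Smallest baseline latency consistent with a given fractional reduction.

    If latency = baseline * (1 - reduction), then
    baseline = latency / (1 - reduction).
    """
    if not 0.0 <= reduction < 1.0:
        raise ValueError("reduction must be in [0, 1)")
    return latency_ms / (1.0 - reduction)

# Figures reported in this post.
overhead = client_overhead_ms(client_ms=230.0, server_ms=160.0)
baseline_floor = implied_baseline_floor_ms(latency_ms=230.0, reduction=0.90)

print(f"Client-side overhead: {overhead:.0f} ms")
print(f"Implied baseline latency: more than about {baseline_floor:.0f} ms")
```

Under that assumption, roughly 70 ms of the 230 ms is spent outside the server, and the proprietary baseline would have averaged more than about 2.3 seconds per call.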

Conclusion

This unique collaboration between CCS, Deloitte and Adaptive ML demonstrates that reinforcement fine-tuning combined with targeted infrastructure choices can enable open-source LLMs to meet the performance and reliability requirements of enterprise patient support.

By combining RL-based fine-tuning on CCS-specific knowledge with function calling to trusted CCS resources, and by deploying the result via Adaptive Engine with Meta Llama on AWS, Adaptive ML, CCS and Deloitte have created a cost-efficient, portable, low-latency AI agent proof of concept. It maintains high function-calling quality while delivering a competitive and innovative solution for CCS patients.
