
Thinking small: How small language models could lessen the AI energy burden

Note to readers: This series of articles focuses on researchers whose work improves efficiency, addresses concerns, or offers alternative solutions to some of the pressing issues created by data centers.

With the ubiquity of ChatGPT, Claude, Gemini and xAI, it can be easy to think that the entire artificial intelligence (AI) industry is made up of these large language models (LLMs). But AI predates many of these systems and is far more expansive than just LLMs, including machine learning, deep learning, computer vision, robotics, and more.

Even when it comes to language models themselves, the colossal, energy- and resource-intensive ones aren’t the only options available. According to Virginia Tech researchers, for many industries, small language models (SLMs) may offer a host of advantages. They are open source, with no per-token cost. They can be hosted privately on a local network, removing a level of cybersecurity threat. They don’t rely on the cloud, making them more reliable. This locality also means they are low latency. And they can be fine-tuned to the particular needs of the user.

“If they are fine-tuned for specific domain tasks, they can actually perform better in terms of effectiveness, efficiency, reliability, and safety because they are optimized for that particular problem rather than trying to do everything,” said Xuan Wang, assistant professor of computer science at the Sanghani Center for Artificial Intelligence and Data Analytics at Virginia Tech’s Institute for Advanced Computing in Alexandria. She has published a pair of recent papers with teams working on advancing SLMs as alternative language models.

But the energy savings may be the starkest advantage: SLMs can reduce compute and energy use by one to two orders of magnitude, or 10 to 100 times. Small models use far fewer parameters and typically run on a single GPU or workstation, which also lowers memory requirements substantially.
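The memory side of that claim can be sketched with back-of-envelope arithmetic: the space needed just to hold a model's weights scales linearly with parameter count. The model sizes below are illustrative round numbers, not figures from the article.

```python
# Back-of-envelope memory estimate for holding model weights in fp16
# (2 bytes per parameter). Model sizes are illustrative assumptions.

def weight_memory_gb(num_params: float, bytes_per_param: int = 2) -> float:
    """Approximate GiB needed just to store the weights."""
    return num_params * bytes_per_param / 1024**3

small = weight_memory_gb(3e9)    # a hypothetical 3B-parameter SLM
large = weight_memory_gb(300e9)  # a hypothetical 300B-parameter LLM

print(f"3B model:   ~{small:.1f} GiB")   # fits on one workstation GPU
print(f"300B model: ~{large:.0f} GiB")   # needs a multi-GPU cluster
```

The two-orders-of-magnitude gap in weight storage is what lets the smaller model live on a single local GPU rather than data-center hardware.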

“The biggest savings come from reduced hardware infrastructure, lower energy consumption, and the ability to deploy models locally, such as on institutional servers, research instruments, or edge devices, without requiring large data-center-scale computing resources,” said Wang.

Wang is currently collaborating with Children’s National Hospital in Washington, D.C., and Seattle Children’s Hospital, where recent work has demonstrated that a carefully fine-tuned SLM can work much better than large GPT models for triage in emergency departments. As a next step, they are working on embedding an SLM agent, an approach that integrates much better with existing health care workflow systems than GPT models, which often raise safety and privacy concerns.

Fellow researcher Tu Vu, assistant professor in the Department of Computer Science and core faculty at the Sanghani Center, focuses more on efficient model development through capability transfer. This is a framework that treats model capabilities as reusable components, rather than something that must be retrained from scratch for every model.

“In traditional pipelines, each new model version requires its own post-training process, even when the target capabilities are similar,” said Vu. “In contrast, capability transfer allows us to extract those capabilities once and apply them to other models.”

That means avoiding repeated post-training, a compute-intensive stage of the model lifecycle that is traditionally rerun for every model version. Weight-based transfer reuses previously learned parameter updates, while activation-based transfer reuses steering directions that can be applied to a model’s internal representations. Instead of treating each model as an isolated system that must be trained independently, this approach allows capabilities to be extracted, represented compactly, and applied across models.
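A minimal sketch of the weight-based idea, assuming the "task vector" formulation in which a capability is represented as the weight delta between a fine-tuned model and its base. The models here are toy weight matrices, not Vu's actual framework:

```python
import numpy as np

# Toy illustration of weight-based capability transfer: the delta between
# a fine-tuned model and its base is extracted once as a reusable
# "task vector" and added to another model's weights. Purely illustrative.

rng = np.random.default_rng(0)

base_a = rng.normal(size=(4, 4))                # base model A's weights
finetuned_a = base_a + 0.1 * np.ones((4, 4))    # A after task fine-tuning

task_vector = finetuned_a - base_a              # extract the capability once

base_b = rng.normal(size=(4, 4))                # a different base model B
transferred_b = base_b + task_vector            # apply the capability to B

# B received exactly the update A learned, without retraining B.
print(np.allclose(transferred_b - base_b, task_vector))  # True
```

In practice the transfer is only useful when the two models share an architecture and sufficiently similar representations, which is part of what makes the research nontrivial.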

“When combined with approaches such as distillation and compression, they contribute to a more sustainable ecosystem where both compute usage and energy consumption are reduced over the full lifecycle of model development and deployment,” said Vu.
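Distillation, one of the complementary approaches Vu mentions, trains a small "student" model to match the softened output distribution of a large "teacher." A minimal sketch of the standard distillation objective, with hand-picked toy logits rather than real model outputs:

```python
import numpy as np

# Minimal sketch of knowledge distillation: the student is penalized by
# the KL divergence between its softened outputs and the teacher's.
# Logits below are toy values chosen for illustration.

def softmax(logits, temperature=1.0):
    z = np.asarray(logits, dtype=float) / temperature
    z = z - z.max()          # subtract max for numerical stability
    e = np.exp(z)
    return e / e.sum()

def distillation_loss(teacher_logits, student_logits, temperature=2.0):
    """KL divergence from the teacher's softened distribution to the student's."""
    p = softmax(teacher_logits, temperature)  # teacher targets
    q = softmax(student_logits, temperature)  # student predictions
    return float(np.sum(p * np.log(p / q)))

teacher = [3.0, 1.0, 0.2]
aligned = [2.9, 1.1, 0.3]     # student close to teacher -> small loss
misaligned = [0.2, 1.0, 3.0]  # student far from teacher -> large loss

print(distillation_loss(teacher, aligned) <
      distillation_loss(teacher, misaligned))  # True
```

Minimizing this loss pushes the small model's behavior toward the large model's on the same inputs, which is how capability is compressed into fewer parameters.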

While SLMs naturally have less general knowledge and weaker reasoning capabilities than LLMs, Wang and Vu hope that by making them more specialized and collaborative, they could offer a far more economical way for many industries to implement AI, perhaps avoiding the need for data centers altogether.

Other stories in this series

Supply, demand, and the future of data centers

