
Chemists create mosaic of AI synthesis knowledge
It’s a common enough scenario in many synthesis labs: you know what starting material you have, and you know what product you need but not quite what reaction conditions will get you there. Unless you’re fortunate enough to know someone with deep expertise in the best reaction for the job, you’re likely going to spend a lot of time combing through the literature.
In the interest of saving chemists time and effort, a growing contingent of researchers is working to train large language models (LLMs) to dispense synthesis advice. “When you analyze data with artificial intelligence techniques, you analyze it in a way that is different from human thinking,” which can yield unexpected and valuable new ideas, says organic chemist Timothy Newhouse of Yale University.
Newhouse and his team, in collaboration with the group of computational chemist Victor S. Batista, recently unveiled a new framework called MOSAIC, short for Multiple Optimized Specialists for AI-assisted Chemical Prediction (Nature 2026, DOI: 10.1038/s41586-026-10131-4).
Instead of one big LLM, MOSAIC is a collection of 2,498 smaller LLMs. Each model is trained on a specific subset of more than 1 million reaction procedures pulled from patent literature. For example, one expert model might specialize in Suzuki coupling. Another might be the go-to for olefin metathesis.
The framework, based on Meta’s Llama 3.1 platform, was developed primarily by Batista’s former graduate student Haote Li, now working on chemical AI for Merck. Sumon Sarkar, a postdoctoral fellow in Newhouse’s lab, led the experimental validation.
The models don’t actually know the reaction names, of course, Sarkar says—just what the transformation is and how it’s represented in patent literature. So reactions with distinct mechanisms that accomplish the same type of bond formation might be grouped together. “It doesn’t understand the mechanism of the transformation. But by looking at millions of procedures, it kind of mimics the understanding.”
Given the structures of starting materials and desired products in Simplified Molecular Input Line Entry System (SMILES) notation, MOSAIC will pass the query on to a few of its specialist models—sort of like a journal editor assigning a paper to reviewers whose expertise matches the manuscript’s contents.
The specialist models will then come up with written protocols for how one might accomplish the transformation: what solvents and reagents to use in what amounts, what temperature to run the reaction at and for how long, and even how to purify the product.
Because it’s only querying relevant areas of chemical space, MOSAIC gives more-accurate recommendations while requiring less computing power, Batista says. The output also includes a confidence score that reflects how far the prediction is from the “center” of the model’s expertise.
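The article does not say how MOSAIC decides which specialists receive a query or exactly how its confidence score is calculated. The short Python sketch below is only an illustration built on its own assumptions. Each specialist is summarized by a hypothetical "center" point in a made-up two-dimensional embedding space, a query is sent to the k nearest specialists, and a toy confidence of exp(-distance) stands in for "how far the prediction is from the center of the model's expertise." The specialist names, coordinates, and the `route` function are invented for this example and are not MOSAIC's code.

```python
import math

# Hypothetical specialists, each represented by one centroid in a toy 2-D
# embedding space. Real reaction embeddings would have far more dimensions.
SPECIALISTS = {
    "suzuki_coupling": (1.0, 0.0),
    "olefin_metathesis": (0.0, 1.0),
    "amide_coupling": (-1.0, 0.0),
}


def route(query, specialists, k=2):
    """Return the k specialists whose centroids lie closest to the query.

    Each result is (name, confidence), where the toy confidence is
    exp(-distance): 1.0 when the query sits exactly at a specialist's
    center, falling toward 0 as the query moves farther away.
    """
    scored = []
    for name, centroid in specialists.items():
        distance = math.dist(query, centroid)  # Euclidean distance to the center
        scored.append((name, distance, math.exp(-distance)))
    scored.sort(key=lambda item: item[1])  # nearest specialists first
    return [(name, round(conf, 3)) for name, _, conf in scored[:k]]


# A query near the hypothetical Suzuki center is routed there first,
# with olefin metathesis as the runner-up.
print(route((0.9, 0.1), SPECIALISTS))
```

Only the k selected specialists would then be asked to write a procedure, which is one way a design like this could save computing power compared with querying every model.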
The researchers tested MOSAIC on reactions from the literature that had not been in the training data. The AI models’ solvent and reagent predictions exactly matched the known procedure about a quarter of the time; counting partial matches as well raised the match rate to nearly 50%. Consulting more specialist models and considering more of their predictions increased the likelihood that at least one of them matched.
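To make the difference between exact matches, partial matches, and "more predictions" concrete, here is a small hypothetical scoring sketch in Python. The article does not give the study's precise criteria, so the definitions below are assumptions: a prediction is "exact" if its set of solvents and reagents equals the published set, "partial" if the two sets share at least one component, and a reaction counts as a top-k hit if any of its first k predictions earns an accepted label. The function names and reagent lists are illustrative only.

```python
def classify(predicted, reference):
    """Label one predicted set of solvents/reagents against the published one.

    Hypothetical criteria: 'exact' if the sets are identical, 'partial' if
    they share at least one component, otherwise 'miss'.
    """
    p, r = set(predicted), set(reference)
    if p == r:
        return "exact"
    if p & r:
        return "partial"
    return "miss"


def top_k_hit(predictions, reference, k, accept=("exact",)):
    """True if any of the first k predictions receives an accepted label."""
    return any(classify(p, reference) in accept for p in predictions[:k])


def hit_rate(cases, k, accept=("exact",)):
    """Fraction of (predictions, reference) cases that have a top-k hit."""
    hits = sum(top_k_hit(preds, ref, k, accept) for preds, ref in cases)
    return hits / len(cases)


# Toy cases: (ranked predictions, published solvents/reagents).
cases = [
    ([["toluene", "Pd(PPh3)4", "K2CO3"], ["dioxane", "Pd(PPh3)4", "K2CO3"]],
     ["dioxane", "Pd(PPh3)4", "K2CO3"]),
    ([["DCM", "Grubbs II"]], ["DCM", "Grubbs II"]),
    ([["DMF", "HATU", "DIPEA"]], ["THF", "LiAlH4"]),
]

print(hit_rate(cases, k=1))                                # exact, top-1
print(hit_rate(cases, k=2))                                # exact, top-2
print(hit_rate(cases, k=1, accept=("exact", "partial")))   # partial allowed
```

In this toy data, allowing partial matches or looking past the top-ranked prediction both raise the score, which mirrors the qualitative trend the researchers report. The numbers themselves are invented and say nothing about MOSAIC's actual accuracy.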
The researchers also applied MOSAIC to 37 reactions that didn’t have direct precedents in the literature. The AI tool’s top-ranked prediction worked in 35 of those cases.
“It’s a terrific paper, really well done” in both concept and execution, says Gabe Gomes of Carnegie Mellon University. Gomes also works on LLMs for chemistry; his lab developed an AI lab assistant called Coscientist in 2023. MOSAIC’s success rate probably doesn’t beat that of a super-seasoned chemist, but the technology is just going to keep getting better, he says. “This is the worst it’s ever going to be.”
The MOSAIC approach has built-in flexibility, enabling it to incorporate new models as new areas of chemistry develop, Li says. For example, the platform doesn’t yet have a lot of photochemistry knowledge. But as more patents involving photochemistry are filed, that could change.
Batista and Li say their next steps will include integrating MOSAIC with synthesis planning and ideally incorporating lab automation. “The future will definitely be more intelligent and more automated,” Li says.
Chemical & Engineering News ISSN 0009-2347 Copyright © 2026 American Chemical Society
