Protein–ligand data at scale to support machine learning | Nature Reviews Chemistry
Acknowledgements
The SGC is a registered charity (no. 1097737) that receives funds from Bayer AG, Boehringer Ingelheim, Bristol Myers Squibb, Genentech, Genome Canada through Ontario Genomics Institute (OGI-196), Janssen, Merck KGaA (aka EMD in Canada and USA), Pfizer, Takeda and the Innovative Medicines Initiative 2 Joint Undertaking (JU) under grant agreement no. 875510. This work was also funded by the Member States of the European Molecular Biology Laboratory. A.A.A. is an ISCIII–Miguel Servet Fellow supported by the Instituto de Salud Carlos III grant CP23/00115, by the Spanish Ministry of Science and Innovation (MCIN/AEI) (PID2022-136344OA-I00), by the CERCA Program/Generalitat de Catalunya, and by FEDER funds/European Regional Development Fund (ERDF), a way to Build Europe.
Ethics declarations
Competing interests
D.-A.C., K.S. and D.R.O. are shareholders in Pfizer Inc. The Cernak Lab’s research has been supported by MilliporeSigma, Johnson & Johnson, Relay Therapeutics, Merck & Co., Inc., SPT Labtech, National Defense Medical Center, Shanghai University of Traditional Chinese Medicine, Ministry of Education Taiwan, and Entos, Inc. T.C. has consulted for the University of Dundee Drug Discovery Unit, Scorpion Therapeutics, Relay Therapeutics, Amgen, Genentech, Janssen, Pfizer, Vertex, MilliporeSigma, the US Food & Drug Administration, Gilead, AbbVie, Corteva, Syngenta, Firmenich, Biogen, Bayer, UCB Biopharma, National Taiwan University, AstraZeneca, Grunenthal, and Iambic Therapeutics (previously known as Entos, Inc.). He holds equity in Scorpion Therapeutics and is a co-founder and equity holder at Iambic Therapeutics. B.H.-K. is a co-founder of the MAQC (Massive Analysis and Quality Control) Society and a member of the Scientific Advisory Boards of Consortium de recherche biopharmaceutique (CQDM), Quebec, Canada, Break Through Cancer, Commonwealth Cancer Consortium, United States, Canadian Institute of Health Research–Institute of Genetics, Canada, Cancer Grand Challenges, United Kingdom, Shriners Children, United States. He is a member of the Executive Committee of the Terry Fox Digital Health and Discovery Platform, Canada, and of the Board of Directors of AACR International–Canada, The American Association for Cancer Research, United States. D.W.Y. is co-founder and shareowner of Deliver Therapeutics. I.V.H. is a member of the Board of Directors of TenAces Biosciences. A.K. serves on the SAB of Cilcare, Sulfateq BV and Heartbeat.bio. J.C.M. may hold stock options in AstraZeneca. A.M.-F. is the Board Chair of SGC and Conscience. She is also a shareholder in Bayer AG and an external consultant for Nuvisan ICB GmbH. N.B.-B. is on the SAB for Oxford Vacmedix and holds shares of Exact Sciences. A.T. is co-founder of Predictive LLC. A.S.D. holds stock in Danaher. F.K. is a shareholder in Evotec SE. A.F.-M. possesses Bayer AG shares. A.A.A. is a consultant to Darwin Health and has received grant funding from Vivan Therapeutics and AtG Therapeutics. D.P. holds stock in Novartis.
Peer review
Peer review information
Nature Reviews Chemistry thanks Brian Shoichet, Brent Stockwell and the other, anonymous, reviewer(s) for their contribution to the peer review of this work.
Additional information
Publisher’s note: Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Edwards, A.M., Owen, D.R. & The Structural Genomics Consortium Target 2035 Working Group. Protein–ligand data at scale to support machine learning. Nat Rev Chem (2025). https://doi.org/10.1038/s41570-025-00737-z