Protein–ligand data at scale to support machine learning | Nature Reviews Chemistry

References

  1. Edwards, A. M. et al. Too many roads not taken. Nature 470, 163–165 (2011).

  2. Moustakim, M. et al. Target identification using chemical probes. Methods Enzymol. 610, 27–58 (2018).

  3. Bond, M. J. & Crews, C. M. Proteolysis targeting chimeras (PROTACs) come of age: entering the third decade of targeted protein degradation. RSC Chem. Biol. 2, 725–742 (2021).

  4. Kanev, G. K., de Graaf, C., Westerman, B. A., de Esch, I. J. P. & Kooistra, A. J. KLIFS: an overhaul after the first 5 years of supporting kinase research. Nucleic Acids Res. 49, D562–D569 (2021).

  5. Bender, B. J. et al. A practical guide to large-scale docking. Nat. Protoc. 16, 4799–4832 (2021).

  6. Petrović, D. et al. Virtual screening in the cloud identifies potent and selective ROS1 kinase inhibitors. J. Chem. Inf. Model. 62, 3832–3843 (2022).

  7. Alon, A. et al. Structures of the σ2 receptor enable docking for bioactive ligand discovery. Nature 600, 759–764 (2021).

  8. Stein, R. M. et al. Virtual discovery of melatonin receptor ligands to modulate circadian rhythms. Nature 579, 609–614 (2020).

  9. Ren, F. et al. A small-molecule TNIK inhibitor targets fibrosis in preclinical and clinical models. Nat. Biotechnol. 43, 63–75 (2025).

  10. Lyu, J. et al. Ultra-large library docking for discovering new chemotypes. Nature 566, 224–229 (2019).

  11. Schneider, P. et al. Rethinking drug design in the artificial intelligence era. Nat. Rev. Drug Discov. 19, 353–364 (2020).

  12. Zhu, T. et al. Hit identification and optimization in virtual screening: practical recommendations based on a critical literature analysis. J. Med. Chem. 56, 6560–6572 (2013).

  13. Schneider, G. Virtual screening: an endless staircase? Nat. Rev. Drug Discov. 9, 273–276 (2010).

  14. Carter, A. J. et al. Target 2035: probing the human proteome. Drug Discov. Today 24, 2111–2115 (2019).

  15. Ackloo, S. et al. CACHE (Critical assessment of computational hit-finding experiments): a public–private partnership benchmarking initiative to enable the development of computational methods for hit-finding. Nat. Rev. Chem. 6, 287–295 (2022).

  16. For chemists, the AI revolution has yet to happen. Nature 617, 438 (2023).

  17. Mock, M., Edavettal, S., Langmead, C. & Russell, A. AI can help to speed up drug discovery — but only if we give it the right data. Nature 621, 467–470 (2023).

  18. Martin, E. J. et al. All-assay-max2 pQSAR: activity predictions as accurate as four-concentration IC50s for 8558 Novartis assays. J. Chem. Inf. Model. 59, 4450–4459 (2019).

  19. Landrum, G. A. & Riniker, S. Combining IC50 or Ki values from different sources is a source of significant noise. J. Chem. Inf. Model. 64, 1560–1567 (2024).

  20. Martin, E. J. & Zhu, X. W. Collaborative profile-QSAR: a natural platform for building collaborative models among competing companies. J. Chem. Inf. Model. 61, 1603–1616 (2021).

  21. Zardecki, C., Dutta, S., Goodsell, D. S., Voigt, M. & Burley, S. K. RCSB Protein Data Bank: a resource for chemical, biochemical, and structural explorations of large and small biomolecules. J. Chem. Educ. 93, 569–575 (2016).

  22. Moult, J., Pedersen, J. T., Judson, R. & Fidelis, K. A large‐scale experiment to assess protein structure prediction methods. Proteins 23, ii–v (1995).

  23. Edfeldt, K. et al. A data science roadmap for open science organizations engaged in early-stage drug discovery. Nat. Commun. 15, 5640 (2024).

  24. Thorne, N., Auld, D. S. & Inglese, J. Apparent activity in high-throughput screening: origins of compound-dependent assay interference. Curr. Opin. Chem. Biol. 14, 315–324 (2010).

  25. Clark, M. A. et al. Design, synthesis and selection of DNA-encoded small-molecule libraries. Nat. Chem. Biol. 5, 647–654 (2009).

  26. McCloskey, K. et al. Machine learning on DNA-encoded libraries: a new paradigm for hit finding. J. Med. Chem. 63, 8857–8866 (2020).

  27. Li, A. S. M. et al. Discovery of nanomolar DCAF1 small molecule ligands. J. Med. Chem. 66, 5041–5060 (2023).

  28. Ahmad, S. et al. Discovery of a first-in-class small-molecule ligand for WDR91 using DNA-encoded chemical library selection followed by machine learning. J. Med. Chem. 66, 16051–16061 (2023).

  29. Kelly, M. A., McLellan, T. J. & Rosner, P. J. Strategic use of affinity-based mass spectrometry techniques in the drug discovery process. Anal. Chem. 74, 1–9 (2002).

  30. Prudent, R., Annis, D. A., Dandliker, P. J., Ortholand, J. Y. & Roche, D. Exploring new targets and chemical space with affinity selection-mass spectrometry. Nat. Rev. Chem. 5, 62–71 (2021).

  31. Gesmundo, N. J. et al. Nanoscale synthesis and affinity ranking. Nature 557, 228–232 (2018).

  32. L’Heureux, A., Grolinger, K., Elyamany, H. F. & Capretz, M. A. M. Machine learning with big data: challenges and approaches. IEEE Access. 5, 7776–7797 (2017).

  33. Najafabadi, M. M. et al. Deep learning applications and challenges in big data analytics. J. Big Data 2, 1 (2015).

  34. Lo, Y. C., Rensi, S. E., Torng, W. & Altman, R. B. Machine learning in chemoinformatics and drug discovery. Drug Discov. Today 23, 1538–1546 (2018).

  35. Brenner, S. & Lerner, R. A. Encoded combinatorial chemistry. Proc. Natl Acad. Sci. USA 89, 5381–5383 (1992).

  36. Melkko, S., Dumelin, C. E., Scheuermann, J. & Neri, D. Lead discovery by DNA-encoded chemical libraries. Drug Discov. Today 12, 456–471 (2007).

  37. Gironda-Martínez, A., Donckele, E. J., Samain, F. & Neri, D. DNA-encoded chemical libraries: a comprehensive review with succesful stories and future challenges. ACS Pharmacol. Transl. Sci. 4, 1265–1279 (2021).

  38. Peterson, A. A. & Liu, D. R. Small-molecule discovery through DNA-encoded libraries. Nat. Rev. Drug Discov. 22, 699–722 (2023).

  39. Lim, K. S. et al. Machine learning on DNA-encoded library count data using an uncertainty-aware probabilistic loss function. J. Chem. Inf. Model. 62, 2316–2331 (2022).

  40. Tingle, B. I. et al. ZINC-22 — a free multi-billion-scale database of tangible compounds for ligand discovery. J. Chem. Inf. Model. 63, 1166–1176 (2023).

  41. Ackloo, S. et al. A target class ligandability evaluation of WD40 repeat-containing proteins. J. Med. Chem. 68, 1092–1112 (2024).

  42. Han, S. et al. Highly selective novel heme oxygenase-1 hits found by DNA-encoded library machine learning beyond the DEL chemical space. ACS Med. Chem. Lett. 15, 1456–1466 (2024).

  43. SGC and HitGen announce research collaboration focused on DNA-encoded library based drug discovery. HitGen https://www.hitgen.com/en/news-details-319.html (2023).

  44. X-Chem and Structural Genomics Consortium enter into collaboration to unlock the human proteome and promote open science. X-Chem https://www.x-chemrx.com/about/news/x-chem-and-structural-genomics-consortium-enter-into-collaboration-to-unlock-the-human-proteome-and-promote-open-science/ (2023).

  45. Wellnitz, J. et al. Enabling open machine learning of DNA encoded library selections to accelerate the discovery of small molecule protein binders. Preprint at https://doi.org/10.26434/chemrxiv-2024-xd385 (2024).

  46. Prudent, R., Lemoine, H., Walsh, J. & Roche, D. Affinity selection mass spectrometry speeding drug discovery. Drug Discov. Today 28, 103760 (2023).

  47. Xin, Y. et al. Affinity selection of double-click triazole libraries for rapid discovery of allosteric modulators for GLP-1 receptor. Proc. Natl Acad. Sci. USA 120, e2220767120 (2023).

  48. Liu, J. et al. The omega-3 hydroxy fatty acid 7(S)-HDHA is a high-affinity PPARα ligand that regulates brain neuronal morphology. Sci. Signal. 15, eabo1857 (2022).

  49. Zhang, P. et al. Development of an α-klotho recognizing high-affinity peptide probe from in-solution enrichment. JACS Au 4, 1334–1344 (2024).

  50. Muchiri, R. N. & van Breemen, R. B. Affinity selection–mass spectrometry for the discovery of pharmacologically active compounds from combinatorial libraries and natural products. J. Mass Spectrom. 56, e4647 (2021).

  51. Wang, X. et al. Enantioselective protein affinity selection mass spectrometry (EAS-MS). Preprint at https://doi.org/10.1101/2025.01.17.633682 (2025).

  52. Paillard, G. et al. The ELF Honest Data Broker: informatics enabling public–private collaboration in a precompetitive arena. Drug Discov. Today 21, 97–102 (2016).

  53. Quancard, J. et al. The European Federation for Medicinal Chemistry and Chemical Biology (EFMC) best practice initiative: hit generation. ChemMedChem 18, e202300002 (2023).

  54. Giannetti, A. M., Koch, B. D. & Browner, M. F. Surface plasmon resonance based assay for the detection and characterization of promiscuous inhibitors. J. Med. Chem. 51, 574–580 (2008).

  55. Rich, R. L. & Myszka, D. G. Grading the commercial optical biosensor literature — class of 2008: ‘The Mighty Binders’. J. Mol. Recognit. 23, 1–64 (2010).

  56. Understanding SPR data. Critical Assessment of Computational Hit-Finding Experiments (CACHE) https://cache-challenge.org/sites/default/files/downloadable/forms/understanding_SPR_data.pdf (2024).

  57. Wood, R. W. XLII. On a remarkable case of uneven distribution of light in a diffraction grating spectrum. Lond. Edinb. Dubl. Phil. Mag. J. Sci. 4, 396–402 (1902).

  58. Kartal, Ö., Andres, F., Lai, M. P., Nehme, R. & Cottier, K. waveRAPID — a robust assay for high-throughput kinetic screens with the creoptix WAVEsystem. SLAS Discov. 26, 995–1003 (2021).

  59. Niesen, F. H., Berglund, H. & Vedadi, M. The use of differential scanning fluorimetry to detect ligand interactions that promote protein stability. Nat. Protoc. 2, 2212–2221 (2007).

  60. Sparks, R. P. & Fratti, R. in Methods in Molecular Biology (ed. Fratti, R.) 1860, 191–198 (2019).

  61. Langer, A. et al. A new spectral shift-based method to characterize molecular interactions. Assay Drug Dev. Technol. 20, 83–94 (2022).

  62. Meyer, P. & Saez-Rodriguez, J. Advances in systems biology modeling: 10 years of crowdsourcing DREAM challenges. Cell Syst. 12, 636–653 (2021).

  63. Tunyasuvunakool, K. et al. Highly accurate protein structure prediction for the human proteome. Nature 596, 590–596 (2021).

  64. Jumper, J. et al. Highly accurate protein structure prediction with AlphaFold. Nature 596, 583–589 (2021).

  65. Manoharan, F. Google cloud expands higher education credits to 8 countries in Africa. Google Cloud https://cloud.google.com/blog/topics/public-sector/google-cloud-expands-higher-education-credits-8-countries-africa/ (2022).

  66. MAchine learning Innovation Network For Research to Advance MEdicinal chemistry. MAINFRAME https://www.aircheck.ai/mainframe (2025).

  67. Bedart, C. et al. The pan-Canadian chemical library: a mechanism to open academic chemistry to high-throughput virtual screening. Sci. Data 11, 597 (2024).

  68. Burley, S. K. & Berman, H. M. Open-access data: a cornerstone for artificial intelligence approaches to protein structure prediction. Structure 29, 515–520 (2021).

  69. Edwards, A. Reproducibility: team up with industry. Nature 531, 299–301 (2016).

  70. Mammoliti, A. et al. Orchestrating and sharing large multimodal data for transparent and reproducible research. Nat. Commun. 12, 5797 (2021).

  71. Accessibility principles. Web Accessibility Initiative (WAI) https://www.w3.org/WAI/fundamentals/accessibility-principles/ (2024).

Acknowledgements

The SGC is a registered charity (no. 1097737) that receives funds from Bayer AG, Boehringer Ingelheim, Bristol Myers Squibb, Genentech, Genome Canada through Ontario Genomics Institute (OGI-196), Janssen, Merck KGaA (aka EMD in Canada and USA), Pfizer, Takeda and the Innovative Medicines Initiative 2 Joint Undertaking (JU) under grant agreement no. 875510. This work was also funded by the Member States of the European Molecular Biology Laboratory. A.A.A. is an ISCIII–Miguel Servet Fellow supported by the Instituto de Salud Carlos III grant CP23/00115 and by the Spanish Ministry of Science and Innovation (MCIN/AEI) (PID2022-136344OA-I00); CERCA Program/Generalitat de Catalunya, and FEDER funds/European Regional Development Fund (ERDF), "A way to build Europe".

Author information

Authors and Affiliations

  1. Structural Genomics Consortium, University of Toronto and University Health Network, Toronto, Ontario, Canada

    Aled M. Edwards, Matthieu Schapira, Santha Santhakumar, Hui Peng, Maxwell R. Morgan, Sofia Melliou, Rachel J. Harding, Levon Halabelian, Benjamin Haibe-Kains, Claudia Gordijo, Madison M. Edwards, Dalia Barsyte-Lovejoy, Cheryl Arrowsmith & Suzanne Ackloo

  2. Pfizer Research and Development, Cambridge, MA, USA

    Dafydd R. Owen

  3. IBM Accelerated Discovery Research, Yorktown Heights, NY, USA

    Leili Zhang & Wendy D. Cornell

  4. Collaborative Drug Discovery (CDD), Baylor College of Medicine, One Baylor Plaza, Houston, TX, USA

    Damian W. Young

  5. Structural Genomics Consortium, UNC Eshelman School of Pharmacy, University of North Carolina at Chapel Hill, Chapel Hill, NC, USA

    Timothy M. Willson, James Wellnitz, Alexander Tropsha, David Drewry, Rafael Counago, Peter J. Brown, Frances M. Bashore & Alison D. Axtman

  6. Division of Chemical Biology and Medicinal Chemistry, UNC Eshelman School of Pharmacy, University of North Carolina at Chapel Hill, Chapel Hill, NC, USA

    James Wellnitz & Alexander Tropsha

  7. National Institutes of Health, Bethesda, MD, USA

    Yanli Wang

  8. Hit Discovery, Discovery Sciences, R&D, AstraZeneca, Cambridge, UK

    Jarrod Walsh & Emma Rivers

  9. Research & Early Development, Novo Nordisk A/S, Måløv, Denmark

    Erik Vernet

  10. Institute of Pharmaceutical Chemistry, Johann Wolfgang Goethe University, Frankfurt, Germany

    Claudia Tredup, Amelia Tjaden, Susanne Müller-Knapp, Stefan Knapp & Thomas Hanke

  11. Buchmann Institute for Molecular Life Sciences and Structural Genomics Consortium (SGC), Frankfurt, Germany

    Claudia Tredup, Amelia Tjaden, Susanne Müller-Knapp, Stefan Knapp & Thomas Hanke

  12. Structural Genomics Consortium, School of Pharmacy, University College London, London, UK

    Matthew H. Todd, Snezana Djordjevic & Nicola A. Burgess-Brown

  13. Discovery Research, Boehringer Ingelheim Pharma GmbH & Co. KG, Biberach, Germany

    Sven Thamm, Florian Montel, Uta Lessel & Amaury Fernández-Montalván

  14. Structural Genomics Consortium, Department of Medicine, Karolinska University Hospital and Karolinska Institutet, Stockholm, Sweden

    Michael Sundström & Opher Gileadi

  15. Machine Learning and Computational Sciences, Pfizer Research and Development, Berlin, Germany

    Andreas Steffen & Djork-Arné Clevert

  16. Center for Therapeutics Discovery, Lerner Research Institute, Cleveland Clinic, Cleveland, OH, USA

    Shaun Stauffer & Jesse A. Coker

  17. Center for Molecular Biology and Genetic Engineering (CBMEG), Universidade Estadual de Campinas (UNICAMP), Campinas/SP. Center for Medicinal Chemistry (CQMED), Universidade Estadual de Campinas (UNICAMP), Campinas/SP, Brazil

    Lucas Rodrigo de Souza & Mario H. Bengtson

  18. National Center for Advancing Translational Sciences, National Institutes of Health, Rockville, MD, USA

    Min Shen

  19. Pfizer Research and Development, Machine Learning and Computational Sciences, Berlin, Germany

    Kristof Schütt

  20. Protein Science, Structural Biology and Biophysics, Discovery Sciences, Research and Development, AstraZeneca, Gothenburg, Sweden

    Lovisa Holmberg Schiavone

  21. Takeda Pharmaceuticals, San Diego, CA, USA

    Kumar Saikatendu

  22. Digital Life Sciences, Nuvisan ICB GmbH, Berlin, Germany

    Dušan Petrović

  23. Research & Development, Pharmaceuticals, Bayer AG, Monheim, Germany

    John P. O’Donnell

  24. Nuvisan Innovation Campus Berlin GmbH, Berlin, Germany

    Anke Mueller-Fahrnow

  25. Protein, Structure and Biophysics, Discovery Sciences, R&D, AstraZeneca, Cambridge, UK

    Juan Carlos Mobarec

  26. Science for Life Laboratory, Department of Oncology and Pathology, Karolinska Institute, Stockholm, Sweden

    Maurice Michel

  27. Center for Molecular Medicine, Karolinska Institute and Karolinska Hospital, Stockholm, Sweden

    Maurice Michel

  28. European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton, UK

    Andrew R. Leach

  29. Discovery Research, Boehringer Ingelheim International GmbH, Ingelheim, Germany

    Oliver Krämer

  30. Evotec SE, Hamburg, Germany

    Florian Krieger

  31. X-Chem Inc., Waltham, MA, USA

    Anthony Keefe, Marie-Aude Guié & Arrash J. Baghaie

  32. Fraunhofer Institute for Translational Medicine and Pharmacology ITMP, Frankfurt, Germany

    Aimo Kannt

  33. Bristol Myers Squibb, San Diego, CA, USA

    Scott A. Johnson

  34. Structural Genomics Consortium Frankfurt, Goethe University Frankfurt, Frankfurt, Germany

    Sandra Häberle

  35. Bristol Myers Squibb, Cambridge, MA, USA

    Emily Rose Holzinger

  36. Discovery and Development Technologies, Merck KGaA, Darmstadt, Germany

    Ingo V. Hartung

  37. Bayer AG, Drug Discovery Sciences, Berlin, Germany

    Judith Günther

  38. Sage Bionetworks, Seattle, WA, USA

    Luca Foschini

  39. Discovery Sciences, R&D, AstraZeneca, Gothenburg, Sweden

    Ola Engkvist

  40. Department of Computer Science and Engineering, Chalmers University of Technology and University of Gothenburg, Gothenburg, Sweden

    Ola Engkvist

  41. OMass Therapeutics Ltd, ARC, Oxford, UK

    Katharina Duerr

  42. HitGen Inc., Chengdu, China

    Dengfeng Dou

  43. Abcam, Biomedical Campus, Cambridge, UK

    Alejandra Solache Diaz

  44. Data Sciences and Quantitative Biology, Discovery Sciences, R&D BioPharmaceuticals, AstraZeneca, Cambridge, UK

    Sergio Martinez Cuesta

  45. Department of Medicinal Chemistry, University of Michigan, Ann Arbor, MI, USA

    Timothy Cernak

  46. proCURE Department, Oncobell Program, Catalan Institute of Oncology (ICO) and Bellvitge Biomedical Research Institute (IDIBELL), Hospital Duran y Reynals, L’Hospitalet del Llobregat, Barcelona, Spain

    Albert A. Antolin

Authors

  1. Aled M. Edwards
  2. Dafydd R. Owen

Consortia

The Structural Genomics Consortium Target 2035 Working Group

  • Aled M. Edwards
  • Dafydd R. Owen
  • Leili Zhang
  • Damian W. Young
  • Timothy M. Willson
  • James Wellnitz
  • Yanli Wang
  • Jarrod Walsh
  • Erik Vernet
  • Alexander Tropsha
  • Claudia Tredup
  • Matthew H. Todd
  • Amelia Tjaden
  • Sven Thamm
  • Michael Sundström
  • Andreas Steffen
  • Shaun Stauffer
  • Lucas Rodrigo de Souza
  • Min Shen
  • Kristof Schütt
  • Lovisa Holmberg Schiavone
  • Matthieu Schapira
  • Santha Santhakumar
  • Kumar Saikatendu
  • Emma Rivers
  • Dušan Petrović
  • Hui Peng
  • John P. O’Donnell
  • Susanne Müller-Knapp
  • Anke Mueller-Fahrnow
  • Maxwell R. Morgan
  • Florian Montel
  • Juan Carlos Mobarec
  • Maurice Michel
  • Sofia Melliou
  • Uta Lessel
  • Andrew R. Leach
  • Oliver Krämer
  • Florian Krieger
  • Stefan Knapp
  • Anthony Keefe
  • Aimo Kannt
  • Scott A. Johnson
  • Sandra Häberle
  • Emily Rose Holzinger
  • Ingo V. Hartung
  • Rachel J. Harding
  • Thomas Hanke
  • Levon Halabelian
  • Benjamin Haibe-Kains
  • Judith Günther
  • Marie-Aude Guié
  • Claudia Gordijo
  • Opher Gileadi
  • Luca Foschini
  • Amaury Fernández-Montalván
  • Ola Engkvist
  • Madison M. Edwards
  • Katharina Duerr
  • David Drewry
  • Dengfeng Dou
  • Snezana Djordjevic
  • Alejandra Solache Diaz
  • Sergio Martinez Cuesta
  • Rafael Counago
  • Wendy D. Cornell
  • Jesse A. Coker
  • Djork-Arné Clevert
  • Timothy Cernak
  • Nicola A. Burgess-Brown
  • Peter J. Brown
  • Mario H. Bengtson
  • Frances M. Bashore
  • Dalia Barsyte-Lovejoy
  • Arrash J. Baghaie
  • Alison D. Axtman
  • Cheryl Arrowsmith
  • Albert A. Antolin
  • Suzanne Ackloo

Corresponding authors

Correspondence to Aled M. Edwards or Dafydd R. Owen.

Ethics declarations

Competing interests

D.-A.C., K.S. and D.R.O. are shareholders in Pfizer Inc. The Cernak Lab’s research has been supported by MilliporeSigma, Johnson & Johnson, Relay Therapeutics, Merck & Co., Inc., SPT Labtech, National Defense Medical Center, Shanghai University of Traditional Chinese Medicine, Ministry of Education Taiwan, and Entos, Inc. T.C. has consulted for the University of Dundee Drug Discovery Unit, Scorpion Therapeutics, Relay Therapeutics, Amgen, Genentech, Janssen, Pfizer, Vertex, MilliporeSigma, the US Food & Drug Administration, Gilead, AbbVie, Corteva, Syngenta, Firmenich, Biogen, Bayer, UCB Biopharma, National Taiwan University, AstraZeneca, Grunenthal, and Iambic Therapeutics (previously known as Entos, Inc.). He holds equity in Scorpion Therapeutics and is a co-founder and equity holder at Iambic Therapeutics. B.H.-K. is a co-Founder of the MAQC (Massive Analysis and Quality Control) Society and part of the Scientific Advisory Board of: Consortium de recherche biopharmaceutique (CQDM), Quebec, Canada, Break Through Cancer, Commonwealth Cancer Consortium, United States, Canadian Institute of Health Research–Institute of Genetics, Canada, Cancer Grand Challenges, United Kingdom, Shriners Children, United States. He is part of the Executive Committee of the Terry Fox Digital Health and Discovery Platform, Canada and on the Board of Directors of AACR International–Canada, The American Association for Cancer Research, United States. D.W.Y. is co-founder and shareowner of Deliver Therapeutics. I.V.H. is part of the Board of Directors of TenAces Biosciences. A.K. serves on the SAB of Cilcare, Sulfateq BV and Heartbeat.bio. J.C.M. may hold stock options in AstraZeneca. A.M.-F. is the Board Chair of SGC and Conscience. She is also a shareholder for Bayer AG and an external consultant for Nuvisan ICB GmbH. N.B.-B. is on the SAB for Oxford Vacmedix and holds shares of Exact Sciences. A.T. is co-founder of Predictive LLC. A.S.D. holds stocks in DANAHER. F.K. is a shareholder in Evotec SE. A.F.-M. possesses Bayer AG shares. A.A.A. is a consultant to Darwin Health and has received grant funding from Vivan Therapeutics and AtG Therapeutics. D.P. holds stock in Novartis.

Peer review

Peer review information

Nature Reviews Chemistry thanks Brian Shoichet, Brent Stockwell and the other, anonymous, reviewer(s) for their contribution to the peer review of this work.

Additional information

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Edwards, A.M., Owen, D.R. & The Structural Genomics Consortium Target 2035 Working Group. Protein–ligand data at scale to support machine learning. Nat Rev Chem (2025). https://doi.org/10.1038/s41570-025-00737-z

  • DOI: https://doi.org/10.1038/s41570-025-00737-z
