DeepMind, Major Scientific Advance: DeepMind AI AlphaFold Solves 50-Year-Old Grand Challenge of Protein Structure Prediction, DECEMBER 1, 2020.
CASP uses the “Global Distance Test (GDT)” metric to assess accuracy, ranging from 0-100. The new AlphaFold system achieves a median score of 92.4 GDT overall across all targets. The system’s average error is approximately 1.6 Angstroms — about the width of an atom. According to Professor John Moult, Co-founder and Chair of CASP, a score of around 90 GDT is informally considered to be competitive with results obtained from experimental methods....
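To make the GDT score more concrete, here is a minimal sketch of the commonly reported GDT_TS variant: the average, over distance cutoffs of 1, 2, 4 and 8 Angstroms, of the percentage of residues whose predicted position lies within that cutoff of the experimental position. The function name `gdt_ts` and the example distances are hypothetical, and the sketch assumes the per-residue distances were already measured after one fixed superposition. The real GDT procedure searches over many superpositions to maximise the count at each cutoff, so this is an illustration rather than CASP's scoring code.

```python
def gdt_ts(distances, cutoffs=(1.0, 2.0, 4.0, 8.0)):
    """Simplified GDT_TS on a 0-100 scale.

    distances: per-residue deviations in Angstroms between predicted and
    experimental positions, assumed to come from a single superposition.
    """
    if not distances:
        raise ValueError("need at least one residue distance")
    # Count (residue, cutoff) pairs where the residue falls within the cutoff.
    hits = sum(1 for c in cutoffs for d in distances if d <= c)
    # Averaging the per-cutoff percentages equals hits / (residues * cutoffs).
    return 100.0 * hits / (len(distances) * len(cutoffs))


# Hypothetical deviations for a five-residue toy example (not real data).
example_distances = [0.5, 0.9, 1.5, 3.0, 9.0]
example_score = gdt_ts(example_distances)
print(example_score)
```

A prediction in which every residue lies within 1 Angstrom scores 100, while a single badly placed residue lowers the score only in proportion to its share of the chain.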
Why protein structure prediction matters
Proteins are essential to life and their shapes are closely linked with their functions. The ability to predict protein structures accurately enables a better understanding of what they do and how they work. There are currently over 200 million proteins in the main database and only a fraction of their 3D structures have been mapped out.
A major challenge is the astronomical number of ways a protein could theoretically fold before settling into its final 3D structure. Many of the greatest challenges facing society, like developing treatments for diseases or finding enzymes that break down industrial waste, are fundamentally tied to proteins and the role they play. Determining protein shapes and functions is a major field of scientific research, primarily using experimental techniques that can take years of painstaking and laborious work per structure, and require the use of multi-million dollar specialised equipment.
DeepMind’s approach to the protein folding problem
This breakthrough builds on DeepMind’s first entry at CASP13 in 2018, where the initial version of AlphaFold achieved the highest level of accuracy among all participants. Now, DeepMind has developed new deep learning architectures for CASP14, drawing inspiration from the fields of biology, physics, and machine learning, as well as the work of many scientists in the protein folding field over the past half-century.
A folded protein can be thought of as a “spatial graph”, where residues are the nodes and edges connect the residues in close proximity. This graph is important for understanding the physical interactions within proteins, as well as their evolutionary history. For the latest version of AlphaFold used at CASP14, DeepMind created an attention-based neural network system, trained end-to-end, that attempts to interpret the structure of this graph, while reasoning over the implicit graph that it’s building. It uses evolutionarily related sequences, multiple sequence alignment (MSA), and a representation of amino acid residue pairs to refine this graph.
By iterating this process, the system develops strong predictions of the underlying physical structure of the protein. Additionally, AlphaFold can predict which parts of each predicted protein structure are reliable using an internal confidence measure.
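The "spatial graph" picture above can be illustrated with a small sketch that builds such a graph from known 3D coordinates: each residue is a node, and an edge joins two residues whose representative atoms lie within a distance cutoff. The function name `contact_graph`, the 8 Angstrom cutoff, and the toy coordinates are assumptions chosen for illustration. This is not AlphaFold's method, which has to infer the graph from sequence rather than read it off a solved structure.

```python
import math


def contact_graph(coords, cutoff=8.0):
    """Map each residue index to the set of residue indices within `cutoff`.

    coords: list of (x, y, z) positions in Angstroms, one per residue.
    cutoff: assumed proximity threshold; 8.0 is an illustrative choice.
    """
    graph = {i: set() for i in range(len(coords))}
    for i in range(len(coords)):
        for j in range(i + 1, len(coords)):
            if math.dist(coords[i], coords[j]) <= cutoff:
                # Edges are undirected, so record both directions.
                graph[i].add(j)
                graph[j].add(i)
    return graph


# Hypothetical coordinates: three residues close together, one far away.
toy_coords = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0), (20.0, 0.0, 0.0)]
toy_graph = contact_graph(toy_coords)
print(toy_graph)
```

The pairwise loop is quadratic in chain length, which is fine for a toy example; a spatial index would be the usual choice for larger structures.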
The system was trained on publicly available data consisting of ~170,000 protein structures from the Protein Data Bank, using a relatively modest amount of compute by modern machine learning standards: approximately 128 TPUv3 cores (roughly equivalent to ~100-200 GPUs) run over a few weeks.