Meta AI researchers have released ESM-2, a protein language model with 15 billion parameters, and the ESM Metagenomic Atlas, a database of more than 600 million predicted structures of metagenomic proteins.
Proteins are complex molecules built from up to 20 types of amino acids, and they perform all kinds of biological functions in organisms. They fold into intricate three-dimensional structures whose shape directly affects how they function.
Determining a protein's structure allows scientists to understand how it works. Structural data also helps them find ways to imitate, change, or counter that behavior.
However, the final structure cannot be read directly off an amino acid sequence, and simulations and experiments are time-consuming.
Meta AI describes ESM-2 as a transformer-based large language model designed to learn evolutionary patterns and generate accurate structure predictions directly from a protein sequence.
The system processes protein sequences using a self-supervised learning method called “masked language modeling”.
According to the scientists, they trained the model on the sequences of millions of natural proteins.
“With this approach, the model must correctly fill in the blanks in a passage of text, for example, ‘To __ or not to __, that is the __’. We trained a language model to fill in the gaps in a protein sequence like ‘GL_KKE_AHY_G’ across millions of different proteins,” the researchers say.
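The masking step the researchers describe can be illustrated with a small sketch. This is not Meta AI's training code: the function names, the 15% masking rate, the `_` mask token, and the example sequence are all assumptions chosen for illustration. It only shows how residues are hidden and which answers a model would be asked to recover.

```python
import random

# The 20 standard amino acids in one-letter code.
AMINO_ACIDS = set("ACDEFGHIKLMNPQRSTVWY")


def mask_sequence(sequence, mask_rate=0.15, mask_token="_", seed=None):
    """Hide a random subset of residues.

    Returns the masked sequence and a dict {position: hidden residue},
    which is what a masked language model would be trained to predict.
    The mask rate and token are illustrative assumptions.
    """
    unknown = set(sequence) - AMINO_ACIDS
    if unknown:
        raise ValueError(f"unexpected residues: {sorted(unknown)}")
    rng = random.Random(seed)
    n_masked = max(1, round(len(sequence) * mask_rate))
    positions = rng.sample(range(len(sequence)), n_masked)
    targets = {pos: sequence[pos] for pos in sorted(positions)}
    masked = "".join(
        mask_token if i in targets else residue
        for i, residue in enumerate(sequence)
    )
    return masked, targets


def fill_in(masked, predictions):
    """Write predicted residues back into the masked positions."""
    chars = list(masked)
    for pos, residue in predictions.items():
        chars[pos] = residue
    return "".join(chars)


# Hypothetical sequence, used only to demonstrate the masking step.
sequence = "MKTAYIAKQRQISFVKSHFSRQ"
masked, targets = mask_sequence(sequence, seed=0)
restored = fill_in(masked, targets)  # a perfect "model" recovers the input
print(masked)
print(targets)
```

During training, a real model would see only `masked` and be scored on how well it predicts the residues stored in `targets`; here the true answers are simply written back to show the round trip.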
According to Meta AI, ESM-2 is the largest protein language model to date. The scientists say structure prediction with it is up to 60 times faster than other state-of-the-art systems such as DeepMind’s AlphaFold.
The model was used to build the ESM Metagenomic Atlas by predicting 617 million structures for proteins in the MGnify90 database in just two weeks on a cluster of 2,000 GPUs. Predicting the structure of a 384-residue protein takes 14.2 seconds on a single Nvidia V100 graphics card.
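A back-of-the-envelope calculation puts these figures in perspective. The sketch below uses only the numbers quoted above and assumes, for simplicity, that "two weeks" means exactly 14 days and that all 2,000 GPUs were busy the whole time; neither is confirmed by the source, and the variable names are illustrative.

```python
SECONDS_PER_DAY = 24 * 60 * 60

# Figures quoted in the article.
structures = 617_000_000          # predicted structures in the atlas
gpus = 2_000                      # size of the cluster
days = 14                         # "two weeks" (assumed to be exactly 14 days)
single_prediction_seconds = 14.2  # 384-residue protein on one Nvidia V100

# Total compute budget, assuming every GPU ran for the full period.
gpu_seconds_total = gpus * days * SECONDS_PER_DAY
gpu_seconds_per_structure = gpu_seconds_total / structures

# How long one GPU would need if every prediction cost 14.2 seconds.
single_gpu_years = structures * single_prediction_seconds / (365 * SECONDS_PER_DAY)

print(f"{gpu_seconds_per_structure:.2f} GPU-seconds per structure on average")
print(f"{single_gpu_years:.0f} years on a single GPU at 14.2 s per structure")
```

Under these assumptions the cluster averaged roughly 3.9 GPU-seconds per structure, while a single GPU spending 14.2 seconds on each protein would need about 278 years. The average sits below the 14.2-second figure, which would be consistent with many database entries being shorter than 384 residues, although the source does not give a length distribution or name the GPUs used in the cluster.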
“With today’s computational tools, predicting the structure of hundreds of millions of proteins can take years, even with the resources of a large research institution. To make predictions at the metagenomics scale, a breakthrough in prediction speed is critical,” the developers noted.
Meta AI hopes that ESM-2 and the ESM Metagenomic Atlas will advance science and help professionals who study the history of evolution or fight disease and climate change.
“We are also exploring ways to apply language models to design new proteins and help solve health and environmental problems,” the scientists added.
Recall that in July, DeepMind’s AlphaFold algorithm predicted the structures of almost all proteins known to science, including those found in plants, bacteria, and animals.
In the same month, MIT researchers developed EquiBind, a deep learning model for drug discovery that predicts how molecules bind to proteins 1,200 times faster than comparable methods.
In July 2021, DeepMind’s artificial intelligence predicted the structures of 20,000 human proteins.