In 1943, before "artificial intelligence" became a buzzword, two researchers proposed a way to teach computers how to learn based on a simple idea: Living organisms learn constantly, so why not take a cue from biology? That was the origin of the artificial neural network, a machine learning technique loosely inspired by the human brain. It would take seven decades for neural networks to reach maturity, but they now power state-of-the-art AI systems, including chatbots like OpenAI's ChatGPT. They're also making a big impact on the biological sciences, most notably in the study of protein folding, the subject of a recent, remarkable deep dive by my Quanta colleague Yasemin Saplakoglu.

From a distance, computer science and biology may seem like an odd match. Traditional computer science deals with abstract mathematical problems and the exact procedures for solving them, called algorithms. Biology, meanwhile, deals with the messy world of life, which often solves problems through seemingly suboptimal evolutionary processes. But messiness is where machine learning methods like artificial neural networks shine.

In these networks, data enters an enormous web of interconnected artificial neurons, and the connections between neurons determine the output. In one common application, the inputs represent the pixels of an image, and the output is a label indicating the contents of the image: cat or non-cat, to use a favorite example. How does the machine know how to correctly match inputs to outputs? During an initial training phase, it goes through a large set of labeled data. As it does so, it repeatedly tweaks the connections between neurons to make outputs more closely resemble the correct labels. The system can then identify hidden patterns in the training data that humans might miss, and it can correctly label data it's never seen. In other words, it learns.
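For readers curious what "tweaking the connections" looks like in practice, here is a minimal sketch of that training loop in Python with NumPy. It is a toy, not how any real system like ChatGPT or AlphaFold2 is built: a tiny network with one hidden layer learns the XOR pattern by repeatedly adjusting its connection weights to shrink the gap between its outputs and the correct labels. All names and settings here (layer sizes, learning rate, number of steps) are illustrative choices, not anything from the article.

```python
import numpy as np

# Toy labeled data: the XOR pattern, which no single neuron can learn alone.
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)  # inputs
y = np.array([[0], [1], [1], [0]], dtype=float)              # correct labels

rng = np.random.default_rng(0)
W1 = rng.normal(size=(2, 8))   # connections: input neurons -> hidden neurons
W2 = rng.normal(size=(8, 1))   # connections: hidden neurons -> output neuron

def sigmoid(z):
    """Squash a value into (0, 1), a classic artificial-neuron activation."""
    return 1.0 / (1.0 + np.exp(-z))

def forward(inputs):
    """Pass data through the web of connections to produce an output."""
    hidden = sigmoid(inputs @ W1)
    return hidden, sigmoid(hidden @ W2)

losses = []
for step in range(5000):
    hidden, out = forward(X)
    # How far are the outputs from the correct labels?
    losses.append(float(np.mean((out - y) ** 2)))
    # Backpropagation: compute how each connection contributed to the error...
    d_out = (out - y) * out * (1 - out)
    d_hidden = (d_out @ W2.T) * hidden * (1 - hidden)
    # ...then nudge every connection slightly in the direction that reduces it.
    W2 -= 1.0 * hidden.T @ d_out
    W1 -= 1.0 * X.T @ d_hidden

_, predictions = forward(X)
print("error before training:", round(losses[0], 3))
print("error after training: ", round(losses[-1], 3))
```

The key point is in the last two lines of the loop: no one tells the network the rule for XOR. It discovers the pattern on its own, purely by adjusting connections to better match the labeled examples, which is the same basic recipe, scaled up enormously, behind the biology applications discussed below.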
The details of how it learns are often murky, but with enough training data, the end result can be uncannily accurate. In many areas of biology, data is abundant, and simple explanations are in short supply. That's why neural networks and their ability to glean hidden patterns can help solve complex problems in biology.

What's New and Noteworthy

Labeling images is one of the most common applications for neural networks. That makes them a natural fit for fields of biology where the data is inherently visual, like cell microscopy. In a Q&A published in 2021, the computational biologist Anne Carpenter spoke to Quanta about how new techniques from machine learning can improve software for analyzing cell images. Neural networks can even help researchers visualize the insides of living cells, as the computer vision researcher Greg Johnson explained in a 2019 Q&A. These AI tools don't make more traditional biology techniques obsolete. "What we're really trying to do is predict the outcomes of experiments, so scientists can prioritize the experiments that they think are interesting," Johnson said.

One of the strengths of neural networks is their versatility. They can take in different kinds of input data beyond images: Networks trained on large data sets of well-understood chemical compounds have helped researchers understand our sense of smell and discover new antibiotics. When researchers search for new drugs, analyzing the vast number of possibly interesting chemicals is out of the question; they have to decide in advance which possibilities to explore. Machine learning tools can point researchers toward hypotheses they wouldn't otherwise have considered.

But it's the study of protein folding that best exemplifies the impact of machine learning in biology. Hundreds of millions of proteins perform countless vital tasks in organisms, and they fold into a dizzying variety of different shapes that determine how they function.
Researchers have long sought to predict how proteins will fold based on their amino acid sequences, but this prediction task is notoriously difficult. Many decades of research into the atomic-level interactions between amino acid molecules yielded little insight into how to predict protein structure, until scientists got machine learning involved. In 2020, DeepMind released a neural network named AlphaFold2, which could predict protein structures far more accurately than competing techniques.

It's important to remember that the AlphaFold2 breakthrough didn't come out of nowhere. It was possible only because scientists had amassed large, high-quality data sets through decades of painstaking experiments. The protein folding problem was "almost a perfect example for an AI solution," as the veteran structural biologist Janet Thornton put it.

Yasemin's new extended feature tells the story of the protein folding problem from its roots in the 1950s to the new age of AI ushered in by AlphaFold2. It's a fascinating story, and essential reading for understanding both the promise and the limitations of machine learning in biology.