Data science reveals universal rules shaping cells’ power stations

Mitochondria are compartments – so-called “organelles” – in our cells that provide the chemical energy supply we need to move, think, and live. Chloroplasts are organelles in plants and algae that capture sunlight and perform photosynthesis. At first glance, they might look worlds apart. But an international team of researchers, led by the University of Bergen, have used data science and computational biology to show that the same “rules” have shaped how both organelles – and more – have evolved throughout life’s history.

Both types of organelle were once independent organisms, with their own full genomes. Billions of years ago, those organisms were captured and imprisoned by other cells – the ancestors of modern species. Since then, the organelles have lost most of their genomes, with only a handful of genes remaining in modern-day mitochondrial and chloroplast DNA. These remaining genes are essential for life and important in many devastating diseases, but why they stay in organelle DNA – when so many others have been lost – has been debated for decades.

For a fresh perspective on this question, the scientists took a data-driven approach. They gathered data on all the organelle DNA that has been sequenced across life. They then used modelling, biochemistry, and structural biology to represent a wide range of different hypotheses about gene retention as a set of numbers associated with each gene. Using tools from data science and statistics, they asked which ideas could best explain the patterns of retained genes in the data they had compiled – testing the results with unseen data to check their power.

“Some clear patterns emerged from the modelling,” explains Kostas Giannakis, a postdoctoral researcher at Bergen and joint first author on the paper. “Lots of these genes encode subunits of larger cellular machines, which are assembled like a jigsaw. Genes for the pieces in the middle of the jigsaw are most likely to stay in organelle DNA.”

The team believe that this is because keeping local control over the production of such central subunits helps the organelle quickly respond to change – a version of the so-called “CoRR” model. They also found support for other existing, debated, and new ideas. For example, if a gene product is hydrophobic – and hard to import to the organelle from outside – the data shows that it is often retained there. Genes that are themselves encoded using stronger-binding chemical groups are also more often retained – perhaps because they are more robust in the harsh environment of the organelle.

“These different hypotheses have usually been thought of as competing in the past,” says Iain Johnston, a professor at Bergen and leader of the team. “But actually no single mechanism can explain all the observations – it takes a combination. A strength of this unbiased, data-driven approach is that it can show that lots of ideas are partly right, but none exclusively so – perhaps explaining the long debate on these topics.”

To their surprise, the team found that models trained to describe mitochondrial genes also predicted the retention of chloroplast genes, and vice versa. The same genetic features shaping mitochondrial and chloroplast DNA also appear to play a role in the evolution of other endosymbionts – organisms which have been more recently captured by other hosts, from algae to insects.

“That was a wow moment,” says Johnston. “We – and others – have had this idea that similar pressures might apply to the evolution of different organelles. But to see this universal, quantitative link – data from one organelle precisely predicting patterns in another, and in more recent endosymbionts – was really striking.”

The research is part of a broader project funded by the European Research Council, and the team are now working on a parallel question – how different organisms maintain the organelle genes that they do retain. Mutations in mitochondrial DNA can cause devastating inherited diseases; the team are using modelling, statistics, and experiments to explore how these mutations are dealt with in humans, plants, and more.