How to map 37 trillion cells. Founded and led by two pioneering female scientists, the Human Cell Atlas marries technical innovation with open science and collaborative spirit.
Google Maps has become so quotidian that it is easy to forget what an audacious idea it once was. This software allows you to view maps of the Earth at whatever scale you want. Wouldn’t it be fabulous if a similar app existed for the human body – and instead of moving between buildings, you could explore the 37 trillion cells that form a human being?
Creating such a map is the ambition of Aviv Regev and Sarah Teichmann. Regev, now at Genentech, and BIF alumna Teichmann of the Wellcome Sanger Institute in Cambridge are the two computational biologists leading the Human Cell Atlas project (HCA). The Google Maps analogy was so irresistible that they could not help using it in October 2016, when they first outlined their goals at a meeting in London.
That big thinking had been sparked by the new ways in which cells could be defined using single-cell genomics and transcriptomics.
Methods for analysing the DNA or RNA of individual cells – the basis of the “resolution revolution” in genomics – had been developing in the late 2000s and early 2010s. The first genomics paper comprehensively cataloguing messenger RNA (whole transcriptomes) from single cells was published in 2009. That paper, however, characterized only a few cells. Both Teichmann and Regev started using these techniques soon after they were first developed, and saw that if they could be scaled, they could be used to create “a unified atlas of the cells of the human body”.
In 2014, Regev, then at the Broad Institute of MIT and Harvard, presented the atlas idea to the National Human Genome Research Institute (NHGRI), part of the American National Institutes of Health (NIH). At the same time, in the UK, Teichmann was co-leading the Sanger-EBI Single-Cell Genomics Centre, laying foundations to scale up the technologies and enable the possibility of mapping the human body.
In early 2016, the two of them joined forces to rally the international community around the vision of a human cell atlas.
Teichmann says the ultimate goal is to create reference atlases of healthy tissues, including developmental and male and female reproductive tissues. Using them, she explains, will be like zooming from a whole body or tissues into individual cells – or like moving from a Google map to a street view.
To achieve this, Regev and Teichmann envisaged a global community that would openly share expertise, resources, and data. At the London meeting in 2016, they proposed this and the scientific community immediately bought into it.
Five years on, the HCA is a well-funded collaboration expanding in multiple directions, involving around 2,000 researchers from almost 80 countries (see box: “A Collaboration of Cartographers”). And with the transcriptomes of over 39 million cells from fifteen different organs sequenced, the consortium is well en route to version 1.0 of the atlas.
FROM CELLS TO ATLASES
In the current era of big science, there is something immediately appealing about a massive reference data set describing the human body’s every cell type. Cells are the basic unit of life – and understanding how individual cells together form the functioning body of a multicellular organism requires knowledge of the diverse cells the body contains. Yet Teichmann says such an atlas has not traditionally occupied the imaginations of cell biologists.
She attributes this to the dominance of microscopy in cell biology. Ever since 1665 – when Robert Hooke first saw cells in a sliver of cork using a new type of microscope – biologists have primarily described cell types according to their appearance. But microscopy-based biology has not lent itself to high-throughput science. “The HCA”, Teichmann says, “is the marriage of the bioinformatics and genomics high-throughput community with cell biology studies, clinical research, and that imaging element.”
This marriage is described in the consortium’s white paper, which details two main complementary branches of investigation. The first entails using single-cell omics to catalogue all the body’s cell types, tissue by tissue. To date, these efforts have been dominated by transcriptomics, but increasingly these characterizations will be supplemented by data from single-cell analyses of epigenomics, chromatin structure, and protein levels. The second branch involves spatial technologies to describe where the various cell types reside.
Highlighting the departure from traditional microscopy-based cell classification, mass single-cell transcriptomic profiling operates initially without reference to cellular morphology. Tissue samples are not looked at intact, but lysed to yield a cell suspension. The individual cells’ mRNA content is then sequenced, and each cell is represented by its expression levels of the roughly 20,000 human coding genes.
A cell is then plotted as a position in 20,000-dimensional space. Such space exists only mathematically. To help imagine it, picture having three cells and mRNA levels for three genes. Cell A expresses the genes at 4, 5, and 12. Cell B has values of 1, 1, and 10. And Cell C: 4, 5, 1. You could then plot these cells as x, y, and z coordinates on a 3D graph. Note that cells A and C would have been indistinguishable with data only from genes one and two – but the third gene makes them clearly distinct points in 3D space.
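The three-cell example can be checked in a few lines of Python. This is only a minimal sketch: the expression values come from the paragraph above, while the distance function and its name are illustrative choices, not part of any HCA pipeline.

```python
import math

# Expression levels of genes one, two, and three, taken from the example in the text.
cells = {
    "A": (4, 5, 12),
    "B": (1, 1, 10),
    "C": (4, 5, 1),
}

def distance(p, q, n_genes):
    """Euclidean distance between two cells using only the first n_genes axes."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p[:n_genes], q[:n_genes])))

# With genes one and two only, A and C sit on the same point.
d_ac_2 = distance(cells["A"], cells["C"], 2)
# Adding gene three pulls them apart.
d_ac_3 = distance(cells["A"], cells["C"], 3)
d_ab_3 = distance(cells["A"], cells["B"], 3)

print(d_ac_2, d_ac_3, round(d_ab_3, 2))
```

The same calculation extends unchanged to 20,000 genes; only the length of each tuple grows.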
Allowing for natural variability in expression levels, plotting data from many cells creates clusters in multidimensional space. Each cluster represents a cell type – with the size of each cluster corresponding to the abundance of that cell type. Plotting all the cells sampled from an organ or a body – i.e. millions to trillions of cells, represented on 20,000 axes – should reveal all the various cell types present.
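How clusters emerge from such data can be illustrated with a toy example. The sketch below uses plain k-means with hand-picked starting points, chosen here purely for simplicity; the article does not say which clustering method the consortium uses, and the nine cells are invented variations on the three example cells.

```python
import math
from collections import Counter

def kmeans(points, centroids, n_iter=10):
    """Plain k-means: assign each point to its nearest centroid, then recompute centroids."""
    labels = []
    for _ in range(n_iter):
        labels = [
            min(range(len(centroids)), key=lambda k: math.dist(p, centroids[k]))
            for p in points
        ]
        new_centroids = []
        for k in range(len(centroids)):
            members = [p for p, lab in zip(points, labels) if lab == k]
            if members:
                new_centroids.append(tuple(sum(c) / len(members) for c in zip(*members)))
            else:
                new_centroids.append(centroids[k])  # leave an empty cluster's centroid in place
        centroids = new_centroids
    return labels, centroids

# Hypothetical cells: noisy copies of cells A, B, and C (three genes each).
points = [
    (4, 5, 12), (5, 5, 11), (4, 6, 12), (3, 5, 13),  # four cells near A
    (1, 1, 10), (1, 2, 10),                          # two cells near B
    (4, 5, 1), (5, 4, 1), (4, 5, 2),                 # three cells near C
]
start = [(4, 5, 12), (1, 1, 10), (4, 5, 1)]  # assumed starting centroids

labels, centroids = kmeans(points, start)
sizes = dict(Counter(labels))  # cluster size stands in for cell type abundance
print(labels, sizes)
```

Each label marks a putative cell type, and the counts in sizes mirror how abundant each type was in the sample.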
When complete, this cellular catalogue – incorporating data from all the body’s tissues – will be the endpoint of branch one. Then, branch two’s spatial technologies will yield the final reference bodies, indicating the distributions of the characterized cell types across the body.
Emma Damm, a graduate student working with Teichmann, says these spatial methods, which are rapidly evolving, can currently be divided into two main categories. One uses fluorescent probes that bind to RNA within cells in slices of tissue. Essentially an advanced form of in situ hybridization, the approach has been honed so that multiple probes, labelled with different fluorophores, can distinguish hundreds of mRNAs: a tissue section glowing with different combinations of colours showing those RNAs’ distribution.
To identify as many cell types as possible in a given tissue section, researchers analyse the expression profiles of the cell types present in the corresponding organ and design probes accordingly. “An area of development”, Damm says, “is trying to identify the best set of marker genes to allow you to identify all the cell types in the tissue.”
The second spatial technology uses sequencing and computation. Here, tissue slices are placed on grids of wells 50–100 μm in diameter, and the cells situated above each well are then pulled down into it. Depending on cell size, each well in the grid will contain roughly 5–20 cells. Each well’s RNA is barcoded to indicate where in the tissue it came from, then sequenced. Finally, machine learning algorithms analyse the wells’ RNA content and compare each one to the profiles of the cell types defined in branch one. The algorithms then estimate the cell types present in every 50–100 μm spot of tissue, thereby creating a map of cell types.
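The comparison step can be pictured as an unmixing problem: given reference profiles for each cell type, find the blend that best explains a well’s RNA. The sketch below is a deliberately crude stand-in for the machine learning algorithms mentioned above. It tries every blend in 10% steps and keeps the closest fit; the cell type names, gene counts, and expression values are all invented for illustration.

```python
# Hypothetical reference profiles: mean expression of four genes per cell type.
reference = {
    "T cell": (10.0, 0.0, 2.0, 1.0),
    "B cell": (0.0, 8.0, 1.0, 3.0),
    "stromal": (1.0, 1.0, 9.0, 0.0),
}

def mix(fractions):
    """Expected expression of a well holding the cell types in the given fractions."""
    profiles = list(reference.values())
    n_genes = len(profiles[0])
    return tuple(
        sum(f * prof[g] for f, prof in zip(fractions, profiles))
        for g in range(n_genes)
    )

def estimate_fractions(spot, steps=10):
    """Grid search over fractions that sum to 1; returns the best-fitting blend."""
    best, best_err = None, float("inf")
    for i in range(steps + 1):
        for j in range(steps + 1 - i):
            k = steps - i - j
            fractions = (i / steps, j / steps, k / steps)
            # Squared difference between the measured well and this candidate blend.
            err = sum((s - m) ** 2 for s, m in zip(spot, mix(fractions)))
            if err < best_err:
                best, best_err = fractions, err
    return best

# A made-up well whose RNA is 50% T cell, 30% B cell, and 20% stromal.
spot = (5.2, 2.6, 3.1, 1.4)
estimated = estimate_fractions(spot)
print(dict(zip(reference, estimated)))
```

Repeating the estimate for every barcoded well, and placing each result back at its grid position, would give a coarse map of cell types across the tissue slice.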
These techniques, Regev says, “are essential to understand how cells work together in their environments, and to build comprehensive maps of the cells in the human body.”
The HCA’s first draft of the atlas will not be a whole-body map but rather cell type catalogues of twelve or more major organs containing some spatial characterization. It is expected within a year or two.
BIOLOGICAL INSIGHTS
So far, members of the HCA community have published more than 65 papers, which have included examinations of the lungs, colon, brain, liver, retina, pancreas, and heart. As was widely anticipated, single-cell transcriptomics has revealed a number of previously unknown cell types. Sometimes these were entirely novel; other times, well-known cell types were shown to actually comprise two or more subtypes.
The discovery of completely novel cell types is an exciting part of HCA research. As an example, Regev cites ionocytes in the lungs – these newly discovered, very rare cells may be important for cystic fibrosis biology, as they express the CFTR gene, mutations in which cause the condition.
New subclasses of neurons and dendritic cells were discovered in the retina and immune system, respectively. In both cases, these were unsuspected subdivisions of very well-studied cell types, which morphological studies had never previously indicated.
Among Teichmann’s favourite examples of novel functional insights are those arising from her group’s study of the first trimester of the maternal–foetal interface. “It wasn’t really understood how maternal immune tolerance towards paternal antigens occurred,” she says. “Our work really turned a light onto that topic.” They showed, among other things, that the maternal side of the placenta, the decidua, contains layers of stromal cells which make unexpected contributions to immune system regulation.
“We also generated a general statistical framework to map the intercellular conversations that are happening in any tissue,” she says of this placenta study. The group developed a computational tool – publicly available at www.cellphonedb.org – that predicts how ligands from one cell type activate receptors from other cell types and thus change their transcriptome. That is, cell-specific gene expression data can decipher important cell-to-cell communications.
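The underlying intuition can be shown with a toy score: a conversation between two cell types is plausible only if the sender expresses the ligand and the receiver expresses the matching receptor. The sketch below is not the CellPhoneDB interface or its statistical framework; the scoring rule, gene names, and expression values are assumptions made purely for illustration.

```python
from statistics import mean

# Hypothetical per-cell expression values, grouped by cell type.
expression = {
    "stromal": {"LIGAND_X": [5.0, 6.0, 7.0], "RECEPTOR_Y": [0.0, 0.0, 1.0]},
    "immune": {"LIGAND_X": [0.0, 1.0, 0.0], "RECEPTOR_Y": [4.0, 3.0, 5.0]},
}

def interaction_score(sender, receiver, ligand, receptor):
    """Average of the sender's mean ligand level and the receiver's mean receptor level."""
    lig = mean(expression[sender][ligand])
    rec = mean(expression[receiver][receptor])
    if lig == 0 or rec == 0:
        return 0.0  # no conversation if either partner is silent
    return (lig + rec) / 2

forward = interaction_score("stromal", "immune", "LIGAND_X", "RECEPTOR_Y")
reverse = interaction_score("immune", "stromal", "LIGAND_X", "RECEPTOR_Y")
print(forward, round(reverse, 2))
```

In this toy data the stromal-to-immune direction scores far higher than the reverse, which is the kind of asymmetry such tools look for across thousands of ligand–receptor pairs.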
“Computation is really at the heart of this project,” Teichmann says. Innovative statistical analyses of big data impact every aspect of it. And this includes the challenge of distinguishing different cell types from distinct cell states.
Despite a colloquial understanding of the concept, precisely defining what “cell type” means is incredibly difficult. In general terms, it is considered a relatively stable and restricted phenotype that a cell assumes at a specified stage of its development. Conversely, cell states are considered different phenotypes that individual cells can dynamically move between.
Both states and types will, however, be characterized by different gene expression profiles, and thus correspond to different points on the multidimensional graph. Therefore, distinguishing between the two requires establishing how cells can move around that space.
Because cells of many types can assume multiple states, each state needs characterizing to avoid confusing different cell states for different cell types. Complicating matters, it is now known that at any stage in life, certain cells can differentiate into what has conventionally been considered a different type, further blurring the lines between what is defined as a state and a type. The more rigorously researchers interrogate these definitions, the more slippery they get.
Finally, numerous HCA studies have compared cell types in healthy and disease-affected tissue, observing how novel cell types or states occur in pathological conditions. In addition, the consortium notably responded quickly to the Covid-19 pandemic, establishing which cell types are infected by the novel coronavirus and how infection affects them.
FROM DESCRIPTION TO EXPERIMENTATION
To enable the construction of an entire human atlas, 18 HCA Biological Networks have been established for specific tissues. For example, the Developmental and Pediatric Networks are creating atlases that chart the cellular lineages that generate human cellular diversity. Also, there is the Organoid Cell Atlas, led by Professor Christoph Bock at the CeMM Research Center for Molecular Medicine of the Austrian Academy of Sciences in Vienna.
Organoids are 3D cultures of cells that recapitulate – at micro- or mesoscopic levels – structural features of in vivo tissues. For some time, Bock used single-cell omics to investigate how human organoids respond when perturbed by genetic changes introduced by CRISPR technologies. As he explains, he felt that “the HCA, to achieve its goals, would have to go beyond descriptive profiling and take function into account.”
Because organoids offer ways of doing functional assays on human tissues, Bock contacted Teichmann and Regev to discuss producing an Organoid Cell Atlas. By cataloguing the cell types present in organoids, researchers will see which are present and how they differ from those in intact systems. “The data we’re generating will help make better organoids by seeing what is missing in the organoids, then adapting the conditions,” Bock says.
More importantly for the overall HCA project, Bock envisages that hypotheses emerging from in vivo observations – including disease-related changes – will be testable using organoids. “The idea”, he says, “is to build a computational platform: an organoid cell atlas portal that allows you to computationally go back and forth.”
So far, the projects are merging well. “There’s a certain feeling of community, that we’re in this together and want to succeed at this as a global group,” Bock says. He praises in particular how the HCA collaboration “has set a tone of how science is done in this field, which is something I very much appreciate, that it’s about reproducibility, openness, and helping each other out.”
This sentiment is echoed by Regev. When asked if she is surprised by how quickly the project has progressed, she says, “I am not, actually. I expected no less from these scientists – and I deeply believe in the power of scientific networks.”
A COLLABORATION OF CARTOGRAPHERS
Organization
• Around 2,000 contributing investigators from nearly 80 countries.
• Organized by a committee of 29 leading scientists, chaired by Sarah Teichmann and Aviv Regev.
• Eighteen biological networks for coordination of work on systems/organs/tissues, including development and organoids.
• Central data coordination platform.
Process
• Prospective contributors can join and contribute at any stage – from planning to post-publication (registration is simple at www.humancellatlas.org/join-HCA).
• Data collection through existing databases and the HCA data coordination platform with metadata and ethics standards (data.humancellatlas.org).
• Tissue samples include biopsies from healthy volunteers, resection tissue, post mortem samples from deceased organ donors, and human developmental samples.
• Over 65 papers published as part of the initiative since 2016.
Values
• Ethics and equity working groups to provide support for ethical tissue access and to ensure diverse ethnic and geographic groups are represented, participate, and benefit from the project.
• Support for outreach, data sharing, discussion, collaboration.
• Open science as cornerstone: data and methods freely accessible to everyone as early as possible.
Funding
• Several hundred million dollars.
• Significant supporters: Chan Zuckerberg Initiative, Wellcome, NIH, UKRI/MRC, British Heart Foundation, the European Commission, the Helmsley Charitable Trust, and others.