Enriching Insights with Single-cell RNA-seq: Integrating Spatial Data Strategically

Platform / Tool	Integration Approach	Use Case / Strengths
Seurat Label Transfer	Anchor-based label mapping (Canonical Correlation Analysis under the hood)	Maps cell identities from scRNA-seq onto spatial spots. Ideal for annotating spatial data: e.g., predict the cell type composition of each Visium spot by transferring labels from a well-annotated scRNA-seq reference. Fast and user-friendly; part of the popular Seurat pipeline.
SpaGE	Machine learning imputation of gene expression	Predicts whole-transcriptome expression at spatial locations by integrating a whole-transcriptome scRNA-seq dataset with a spatial dataset that measures only a limited gene panel. In other words, SpaGE “fills in” the genes that weren’t measured in the spatial experiment. Great for increasing gene coverage: it even uncovered new spatial gene patterns in the mouse brain that were later confirmed by independent in situ hybridization (Abdelaal et al., 2020).
Tangram	Deep learning-based cell mapping (probabilistic alignment)	Aligns single-cell data to spatial data by finding the best match of cell transcriptomes to spatial gene expression patterns. Tangram can map individual cells or cell types into a tissue, optimizing so that the density of mapped cells matches the observed spatial expression. It’s versatile: it supports mapping to various spatial data types (even MERFISH or histology) and can handle differences in resolution or throughput between datasets.
pciSeq	Probabilistic cell typing of spatial data using scRNA-seq reference	“pciSeq” stands for probabilistic cell typing by in situ sequencing (Qian et al., 2019). It uses a Bayesian model to assign cell types to cells in spatial data based on a reference scRNA-seq atlas. Useful when your spatial method captures limited genes per spot/cell: e.g., if a spatial experiment only has 100 genes, pciSeq leverages the scRNA-seq reference to tell you which cell type those gene combinations most likely represent. It was demonstrated on in situ sequencing data to finely map closely related neuron types in situ.
BayesSpace	Bayesian spatial clustering and resolution enhancement	Not an integration per se (it doesn’t require scRNA-seq input), but often used alongside integration. BayesSpace improves the analysis of Visium-style data by modeling neighborhood information: it can increase effective resolution to “subspots” and detect finer spatial domains (Zhao et al., 2021). For instance, in a breast cancer Visium dataset, BayesSpace could delineate tumor substructures that were blurred at spot-level resolution. If you have scRNA-seq too, you might cluster with BayesSpace and then label those clusters with scRNA-seq-derived identities.

These are just a selection of tools. Others include Cell2Location, STdeconvolve, Stereoscope, novoSpaRc, Harmony (for multi-modal integration), and more, each with its own algorithmic twist (regression, optimal transport, topic modeling, etc.). In fact, a 2023 review identified 19 different methods for integrating scRNA-seq with spatial data! The five above have become quite popular for their performance and usability, covering the common needs of single-cell researchers venturing into spatial analysis.

So, how do we put integration into practice? Let’s walk through a practical mindset with an example scenario:

Scenario: You have a large scRNA-seq dataset of a tissue (say 50,000 cells from a tumor biopsy). You suspect the spatial arrangement of cells (tumor, immune, stromal) is important – perhaps certain immune cells cluster near tumor cells indicating interactions.
Spatial Data Available: Suppose that, due to budget constraints, you have access to only a single 10x Visium slide of that tumor (or you managed a smaller-scale spatial experiment with, say, 5,000 spots covering a section). This spatial data gives you whole-transcriptome profiles, but each spot is an averaged mixture of multiple cells.
Integration Game Plan: First, you’d analyze your scRNA-seq and identify major cell types (tumor cells, T cells, macrophages, etc.), using Nygen’s pipeline to cluster and auto-annotate cell types (with marker gene reference) for reliability. Then, using a label transfer or deconvolution method (such as Seurat or Cell2Location), you project those labels onto each Visium spot. The result might be a probabilistic estimate of what cells are in each spot, e.g., Spot #123 is 50% tumor, 30% T cell, 20% macrophage. Now you can visualize a tissue map of cell types: perhaps you see a gradient where the tumor core spots are mostly tumor cells with few T cells, but the edge spots are rich in T cells (implying immune infiltration at the margins). This aligns with biological expectations for tumor-immune interactions.
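
To make the "Spot #123" idea concrete, here is a minimal sketch of how a spot's composition could be estimated from cell-type signatures. It is an illustration only: it uses plain least squares with clipping, which is a simple stand-in and not what Seurat or Cell2Location actually do. The function name `estimate_spot_composition` and all expression values are made up for this example.

```python
import numpy as np

def estimate_spot_composition(spot_expr, signatures):
    """Estimate cell-type fractions for one spot (toy stand-in for deconvolution).

    spot_expr:  (n_genes,) expression vector measured in the spot.
    signatures: (n_genes, n_types) mean expression per cell type from scRNA-seq.
    """
    # Ordinary least squares: weights w minimizing ||signatures @ w - spot_expr||.
    weights, *_ = np.linalg.lstsq(signatures, spot_expr, rcond=None)
    # Negative weights have no biological meaning, so clip them to zero.
    weights = np.clip(weights, 0.0, None)
    total = weights.sum()
    if total == 0:
        return np.zeros_like(weights)
    # Normalize so the fractions sum to 1.
    return weights / total

cell_types = ["tumor", "T cell", "macrophage"]

# Hypothetical signature matrix: rows are genes, columns are cell types.
signatures = np.array([
    [10.0, 0.5, 1.0],  # tumor marker
    [0.5, 8.0, 0.5],   # T-cell marker
    [1.0, 0.5, 9.0],   # macrophage marker
    [2.0, 2.0, 2.0],   # housekeeping gene
])

# Simulate a noise-free spot that is 50% tumor, 30% T cell, 20% macrophage.
true_fractions = np.array([0.5, 0.3, 0.2])
spot = signatures @ true_fractions

fractions = estimate_spot_composition(spot, signatures)
for name, frac in zip(cell_types, fractions):
    print(f"{name}: {frac:.2f}")
```

Real spots are noisy and sparse, which is why dedicated tools model count noise and technology differences explicitly; treat this only as a way to build intuition for what "composition per spot" means.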

From there, you could refine further. For example, use BayesSpace to subdivide spots in high-T-cell regions, or examine gene expression of interaction molecules (checkpoint ligands, etc.) in spatial context. Integration also works in the other direction: if your spatial data had unique expression patterns, you might adjust your scRNA-seq clustering to align better with spatial domains (some researchers iterate between datasets).

Best Practices: When integrating, always ensure that the datasets truly correspond (ideally same tissue or condition). If you’re mapping a scRNA-seq atlas to a spatial section, any batch differences should be corrected or accounted for. Methods like Harmony or Seurat’s CCA can handle batch effects to some degree during integration. Also, validation is key – for instance, Lohoff et al. validated their imputed gene expressions by comparing to actual in situ staining.

You might not always have ground truth, but sanity checks (do known markers localize correctly? do integrated clusters match histology if available?) increase confidence.
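
One such sanity check can be sketched in a few lines: compare a known marker gene with the predicted fraction of the matching cell type across spots. The snippet below is a toy example under stated assumptions: the values are invented, the helper `marker_agreement` is hypothetical, and the 0.5 cutoff is arbitrary rather than a recommended threshold.

```python
import numpy as np

def marker_agreement(marker_expr, predicted_fraction):
    """Pearson correlation between a marker gene and a predicted
    cell-type fraction, computed across spots."""
    return float(np.corrcoef(marker_expr, predicted_fraction)[0, 1])

# Invented values for six spots (illustration only).
cd3e_expr = np.array([0.1, 0.3, 2.5, 3.0, 0.2, 2.8])             # T-cell marker CD3E
tcell_fraction = np.array([0.02, 0.05, 0.40, 0.55, 0.03, 0.45])  # predicted T-cell share

r = marker_agreement(cd3e_expr, tcell_fraction)
print(f"CD3E vs predicted T-cell fraction: r = {r:.2f}")

# Arbitrary cutoff: a weak correlation is a prompt to inspect the mapping,
# not proof that it is wrong.
if r < 0.5:
    print("Marker and predicted fraction disagree; inspect the integration.")
```

A high correlation does not prove the mapping is right, but a low one for a well-established marker is a cheap early warning.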

Real-World Examples: Spatial + scRNA-seq in Action

Let’s look at a few practical biological scenarios where integrating spatial data with single-cell transcriptomics provided powerful insights:

Brain Cortex – Layered Architecture: The brain’s cortex is organized into layers, but scRNA-seq alone might not assign cells to layers. In a frontal cortex study, researchers had a large Drop-seq scRNA-seq dataset (~71k cells) and a smaller spatial dataset (~2.5k cells from STARmap imaging). By integrating them (using the LIGER algorithm), they could assign a layer location to each single-cell cluster. For example, they identified interneuron subtypes in scRNA-seq and then saw those types were located in layer 1 at the cortex surface versus deep layers for others (Welch et al., 2019). The spatial integration also increased the resolution of the spatial data – even though STARmap had fewer cells and targeted genes, combining it with the rich scRNA-seq filled in details and corrected one dataset’s bias with the other. Essentially, large-scale scRNA-seq gave breadth, and spatial gave an anatomical map, together reconstructing the cortical tissue architecture in silico.

**Figure 1: Integrating scRNA-seq with spatial data reveals cortical organization.**

In this example from the mouse brain cortex, a massive dissociated single-cell dataset (red points) was integrated with a much smaller spatially-resolved dataset (blue points). Panel (A) shows a t-SNE of 71k scRNA-seq cells and 2.5k spatial cells (STARmap), colored by dataset (Welch et al., 2019).

After integration, the cells were jointly clustered; panel (B) shows the same t-SNE colored by the identified cell clusters, which include excitatory neurons of layers 2/3, 5, and 6 (L2/3, L5, L6), interneurons, oligodendrocytes, etc. Critically, panel (D) (bottom) plots the spatial coordinates of the STARmap cells colored by those cluster identities, effectively mapping the scRNA-seq clusters onto the tissue. This recapitulated known cortex anatomy: for instance, “Astrocyte_Gfap” cells (purple) localized to the meninges (outer surface) and white matter, matching patterns from the Allen Brain Atlas Gfap staining. Such integration shows where each cell type resides and validates that the scRNA-seq clusters correspond to real, spatially segregated populations (Welch et al., 2019).

Tumor-Immune Microenvironment: In cancer research, combining spatial and single-cell data is becoming a gold standard for decoding the tumor microenvironment. A 2023 glioblastoma study integrated scRNA-seq with spatial transcriptomics to map cell types such as malignant cells, T cells, and myeloid cells in the tumor tissue (Liu et al., 2023). They found, for instance, that exhausted CD8 T cells tended to colocalize with certain macrophage populations in spatial hotspots, suggesting immunosuppressive niches. By projecting single-cell-defined malignant cell states onto the spatial map, they observed regions of the tumor dominated by an aggressive mesenchymal-like cancer cell subtype, often adjacent to those immune niches. These integrated insights led to hypotheses about how macrophages might be fostering a local environment that drives tumor cells into a more malignant state (via signaling pathways like EGFR and CXCL interactions that they identified). In practical terms, spatial mapping in the tumor allowed the researchers to see cellular neighborhoods, something that purely dissociated single-cell data could not reveal. For translational science, this means potential targets (e.g., interrupting a signaling loop in a specific niche) can be identified by knowing which cells actually sit next to each other in the tumor. Tools used in such analyses include Seurat label transfer for initial cell mapping and CellPhoneDB or similar for ligand-receptor analysis once colocalized cells are identified.

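
As a rough illustration of what "colocalize" can mean computationally, the sketch below compares how often two cell types share a spot with how often they would if they were distributed independently. This is a generic co-occurrence ratio, not the method used in the glioblastoma study; the function name, the 0.1 presence threshold, and the per-spot fractions are all assumptions made for the example.

```python
import numpy as np

def cooccurrence_ratio(frac_a, frac_b, threshold=0.1):
    """Observed / expected rate of spots that contain both cell types.

    A ratio above 1 hints at colocalization; below 1 hints at avoidance.
    A cell type counts as present when its fraction is >= threshold.
    """
    present_a = frac_a >= threshold
    present_b = frac_b >= threshold
    observed = np.mean(present_a & present_b)
    # Expected co-occurrence if the two types were placed independently.
    expected = np.mean(present_a) * np.mean(present_b)
    return float(observed / expected) if expected > 0 else float("nan")

# Invented per-spot fractions for eight spots (illustration only).
exhausted_cd8 = np.array([0.30, 0.25, 0.02, 0.01, 0.20, 0.00, 0.03, 0.00])
macrophage = np.array([0.35, 0.20, 0.05, 0.30, 0.25, 0.02, 0.01, 0.04])

ratio = cooccurrence_ratio(exhausted_cd8, macrophage)
print(f"Co-occurrence ratio: {ratio:.2f}")
```

With real data you would also want a significance estimate (for example by permuting spot labels) and, ideally, a neighborhood-aware statistic that looks at adjacent spots rather than only within-spot mixtures.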
Developmental Gradients in an Embryo: We touched on the example of Lohoff et al. (2022) above, but let’s detail it because it’s a beautiful use of minimal spatial data to complement scRNA-seq. The researchers had access to a comprehensive single-cell atlas of mouse embryogenesis (many thousands of cells with full gene profiles). They performed seqFISH on actual embryo sections at a specific stage, but only for 387 selected genes (marker genes chosen from the atlas). Now, 387 genes is far from the whole transcriptome, but by integrating with the atlas, they could impute the other genes. Essentially, for each cell in the spatial data, they found the best matching cell in the atlas (based on the 387 genes) and borrowed the rest of that cell’s gene profile. The result was a genome-wide expression map of the embryo in space (Lohoff et al., 2022). With this, they discovered an early dorsal-ventral division of cell fates in the gut tube that the atlas alone hadn’t revealed. In practice, this demonstrates that a relatively small spatial experiment can amplify the value of a big single-cell dataset. By strategically choosing a subset of genes, they anchored the single-cell data onto the tissue. This strategy can be applied in other contexts: for example, imagine you have a single-cell atlas of a disease tissue. You could design a spatial experiment with top marker genes and then map the atlas in, achieving near whole-transcriptome insight in situ at a fraction of the experimental cost of a full spatial transcriptome. SpaGE and Tangram are well-suited for this kind of task (imputing missing genes or aligning cells to space, respectively).

**Figure 2: Spatial mapping of a mouse embryo using limited gene data integrated with an scRNA-seq atlas.**

Lohoff et al. (2022) profiled a mouse embryo with seqFISH for 387 genes and integrated it with single-cell RNA-seq atlases to achieve a high-resolution spatial transcriptome.

In the figure above, panel (b) shows an E8.5 embryo section with each dot representing a cell, colored by its predicted cell type (focusing on gut tube and nearby cells). By integrating, they could assign each cell a type and even infer the expression of genes not measured by seqFISH. This uncovered subtle spatial patterning: for instance, progenitor cells of the future trachea (ventral lung, teal) and esophagus (dorsal lung, orange) were found segregated to the ventral vs. dorsal sides of the gut tube, respectively, even at this early stage. Such dorsal-ventral separation was confirmed by follow-up in situ hybridization (see panels (h) and (j), where markers like Tbx1 and Shh are expressed in complementary patterns) and was not evident from the dissociated data alone. This example highlights how minimal spatial data (a few hundred genes) can be leveraged with a rich scRNA-seq reference to strategically fill in the blanks, providing insights into tissue patterning and developmental biology.
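
The "find the best matching atlas cell and borrow its profile" idea from the embryo example can be sketched as a nearest-neighbor lookup. This is a deliberately simplified illustration, not the authors' pipeline: it assumes both datasets are already on a comparable scale, uses a single nearest neighbor with Euclidean distance, and the function name `impute_from_atlas` and the toy matrices are invented for the example.

```python
import numpy as np

def impute_from_atlas(spatial_panel, atlas_panel, atlas_full):
    """Copy the full expression profile of the closest atlas cell.

    spatial_panel: (n_spatial, n_panel) panel-gene expression in spatial cells.
    atlas_panel:   (n_atlas, n_panel) the same panel genes in atlas cells.
    atlas_full:    (n_atlas, n_all) all genes in atlas cells.
    Returns the imputed (n_spatial, n_all) matrix and the matched atlas indices.
    """
    # Pairwise squared Euclidean distances between spatial and atlas cells.
    diffs = spatial_panel[:, None, :] - atlas_panel[None, :, :]
    dist2 = np.sum(diffs ** 2, axis=2)
    # Index of the closest atlas cell for every spatial cell.
    nearest = np.argmin(dist2, axis=1)
    return atlas_full[nearest], nearest

# Toy atlas: 3 cells, 2 panel genes plus 1 gene the spatial assay did not measure.
atlas_full = np.array([
    [5.0, 0.0, 9.0],
    [0.0, 5.0, 1.0],
    [3.0, 3.0, 4.0],
])
atlas_panel = atlas_full[:, :2]

# Two spatial cells measured only on the 2 panel genes.
spatial_panel = np.array([
    [4.8, 0.2],
    [0.1, 4.9],
])

imputed, nearest = impute_from_atlas(spatial_panel, atlas_panel, atlas_full)
print("matched atlas cells:", nearest.tolist())
print("imputed unmeasured gene:", imputed[:, 2].tolist())
```

In practice, averaging over several neighbors and correcting for technology differences between seqFISH and scRNA-seq before matching are usually needed; tools such as SpaGE and Tangram address those steps in their own ways.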

Adding Spatial Context through Nygen Workflows

For those eager to implement these integrations, it’s worth noting that many analysis platforms (including Nygen Analytics) support such multi-modal data layering. You don’t have to build everything from scratch in R or Python if that’s not your preference. For example, Nygen’s cloud platform enables no-code analysis of scRNA-seq data (from quality control and clustering to differential expression), and it also provides ways to incorporate spatial information. Users can import spatial coordinates of cells or spots into Nygen and visualize gene expression or clusters on tissue layouts. This means after you identify clusters in your single-cell data, you could map them onto an actual tissue image or coordinate system if you have it. Nygen’s knowledge base has an article titled “Importing Spatial and Clonotype Data” that guides researchers to upload spatial metadata (like x,y coordinates for each cell/spot) post-analysis, effectively marrying the expression data with spatial organization. The result can be interactive plots where you see your clustered cells scattered according to their original tissue positions, revealing spatial patterns without writing custom code.

Internally, such platforms may leverage the aforementioned algorithms. For instance, behind the scenes, label transfer in Seurat or cell2location’s Bayesian mapping could be part of a workflow – the user might simply see an option to “annotate spatial dataset with single-cell references” and get results with a few clicks. Nygen also emphasizes reproducibility and ease of use, so an academic researcher can focus on interpreting the spatial biology rather than wrangling complex pipelines. (For more on making single-cell analysis accessible, see our earlier post on addressing bioinformatics skill gaps and intuitive tools.)

Tip: If you’re curating a single-cell atlas with Nygen, consider also publishing any spatial data you have in the Nygen Database or linking to it in your project. It creates a more comprehensive resource (think of it like adding a map layer to your atlas). And if you lack spatial data, Nygen’s network might help you find sequencing core facilities that offer spatial transcriptomics services – a useful pointer if you decide to generate spatial data to complement your single-cell study.

Conclusion and Outlook (What to Expect in Part III)

By integrating spatial transcriptomics strategically with single-cell RNA-seq, researchers can maximize insights into how cells function together in tissues. We’ve seen that even limited spatial information, when combined with robust single-cell data, can elucidate tissue architecture, pinpoint cell interactions, and highlight patterns (like developmental axes) invisible to dissociated-cell analysis. The key is choosing the right integration approach for your question – whether it’s simple label transfer to annotate tissue regions or advanced probabilistic mapping to predict unseen genes. Tools like Seurat, SpaGE, Tangram, pciSeq, and BayesSpace have become invaluable in this endeavor, each addressing different aspects of the integration challenge.

As single-cell and spatial techniques continue to evolve, the line between “scRNA-seq vs spatial” is blurring. In Part III of this series, we will directly compare and contrast single-cell RNA-seq and spatial transcriptomics as complementary technologies. We’ll discuss their respective strengths and limitations and how, together, they form a more complete toolkit for understanding biology – much like how combining a microscope with a cell sorter gives you a fuller picture than either alone.

References

Liu, Y., Wu, Z., Feng, Y., Gao, J., Wang, B., Lian, C., & Diao, B. (2023). Integration analysis of single-cell and spatial transcriptomics reveal the cellular heterogeneity landscape in glioblastoma and establish a polygenic risk model. Frontiers in Oncology, 13, 1109037. https://doi.org/10.3389/fonc.2023.1109037

Lohoff, T., Ghazanfar, S., Missarova, A., Koulena, N., Pierson, N., Griffiths, J. A., Bardot, E. S., Eng, C.-H. L., Tyser, R. C. V., Argelaguet, R., Guibentif, C., Srinivas, S., Briscoe, J., Simons, B. D., Hadjantonakis, A.-K., Göttgens, B., Reik, W., Nichols, J., Cai, L., & Marioni, J. C. (2022). Integration of spatial and single-cell transcriptomic data elucidates mouse organogenesis. Nature Biotechnology, 40(1), 74-85. https://doi.org/10.1038/s41587-021-01006-2

Abdelaal, T., Mourragui, S., Mahfouz, A., & Reinders, M. J. T. (2020). SpaGE: Spatial Gene Enhancement using scRNA-seq. Nucleic Acids Research, 48(18), e107. https://doi.org/10.1093/nar/gkaa740
