
Enriching Insights with Single-cell RNA-seq: Integrating Spatial Data Strategically

Discover how integrating spatial transcriptomics with scRNA-seq data enhances biological insights by mapping gene expression to tissue architecture. Learn key integration methods and real-world applications.

Integrating Spatial Data with Single-Cell RNA-Seq: Enhancing Biological Insights Through Tissue Context

In Part 1 of this three-part series, we explored how to maximize biological insights from single-cell RNA sequencing (scRNA-seq) data alone. We discussed the power of scRNA-seq in uncovering cellular heterogeneity and touched on advanced analysis strategies. However, one critical element was left in the wings: the spatial context of cells. Spatial transcriptomics adds this missing piece, mapping gene expression onto tissue architecture. In Part 2, we dive deeper into spatial transcriptomics methodologies and how integrating spatial data can add a new dimension to your single-cell analysis. We’ll cover when and how spatial context can enhance scRNA-seq, discuss best-practice integration methods (like Seurat label transfer, SpaGE, Tangram, pciSeq, and BayesSpace), and provide real-world examples where even minimal spatial data yields rich insights. Along the way, we highlight practical tips, including how platforms like Nygen.io support these workflows. Let’s unlock spatial context for your single-cell data!

Why Spatial Context Matters for Single-Cell Data

Single-cell RNA-seq excels at profiling gene expression in individual cells, but it inherently loses information about where those cells were in the tissue. Tissues aren’t random collections of cells; they have structure: layers in a brain cortex, niches in bone marrow, tumor-immune cell neighborhoods, developmental gradients in an embryo, and so on. Cellular identity and function are often intimately linked to position and neighbors. Without spatial context, we might miss patterns of organization or interaction that are biologically important.

Consider an analogy: scRNA-seq gives us a list of characters (cell types) and their lines (gene expressions) in a play, but spatial transcriptomics is the stage direction – it tells us where each character is standing and whom they’re interacting with. By integrating the two, we start to see the full performance. Specifically, adding spatial data can:

  • Reveal Tissue Architecture: Identify how cell types are arranged in anatomical structures (e.g., layered neurons in cortex, zonation in liver lobules, immune cells at infection sites).

  • Uncover Cell Interactions: Show which cell types co-localize or form neighborhoods, shedding light on cell-cell communication (for example, T cells infiltrating a tumor nestle next to cancer cells).

  • Map Developmental Trajectories: In developing tissues, spatial context can highlight gradients and positional cues (e.g., morphogen gradients across an embryo) that correspond to developmental gene expression changes.

  • Enhance Interpretation of Cell States: A cell’s state (activated, quiescent, etc.) might make more sense if we see it in situ. For example, an “activated” immune cell within a tumor has different implications than one in a lymph node.

In short, spatial context can turn a list of cell types into a map of cellular geography. For researchers, this means we can ask not only “what cell types are present and what are they expressing?” but also “where are these cells and how might location influence their gene expression?” Let’s briefly recap what spatial transcriptomics technologies entail, then explore how to integrate their data with scRNA-seq.

A Primer on Spatial Transcriptomics Methods

Spatial transcriptomics (ST) refers to techniques that measure gene expression while preserving information about the spatial location of those measurements in the tissue. There are two broad classes of ST methods:

  • Imaging-based methods: These use microscopy to detect RNAs in situ via labeled probes. Examples include seqFISH and MERFISH (multiplexed FISH methods), which can image hundreds to thousands of genes. They achieve single-cell (even subcellular) resolution, pinpointing exactly which cell (or part of a cell) an RNA came from. The trade-off is that they measure a targeted gene panel chosen in advance (in practice often tens to a few hundred genes) rather than the whole transcriptome, due to optical and probe limitations.

  • Sequencing-based methods: These use spatially barcoded arrays or tissue slices to capture transcripts for sequencing. A notable example is 10x Genomics Visium (preceded by the original “Spatial Transcriptomics” array technology). These methods capture the whole transcriptome in an unbiased way, but at spots that are larger than single cells: each 55 μm spot on Visium captures ~1–10 cells, mixing their mRNAs (Zhao et al., 2021). Resolution is therefore limited: you know a set of transcripts came from somewhere in that spot, but not exactly which cell if multiple cells were present. Other techniques, such as Slide-seq (bead-based) and laser capture microdissection followed by RNA-seq, each strike their own balance of resolution vs. throughput.

There are also emerging platforms combining approaches, but the key point is that no single widely used technology gives you both the full transcriptome and exact single-cell resolution across a whole tissue. This is why integration with scRNA-seq is so powerful: each dataset compensates for the other’s weaknesses. Single-cell sequencing gives the high gene count and distinct cell identities; spatial data gives the location but may have fewer genes or mixed-cell signals. Integrative computational methods are the bridge between them.

When Does Spatial Data Enhance scRNA-seq?

Not every single-cell study will require spatial data, but certain scenarios gain enormous value from spatial integration:

  • Complex Tissue Architecture: If you’re studying a structured tissue (brain, gut, kidney, tumor with microenvironment), spatial data helps map discovered cell types to anatomical structures. For example, a scRNA-seq study of the brain might find various neuron subtypes; spatial mapping shows that those subtypes form distinct layers or regions (e.g., cortex layer 1 interneurons on the surface, oligodendrocytes in white matter). This can confirm known architecture or reveal novel organization.

  • Cell-Cell Interactions in Disease: In cancer or inflammatory diseases, who is next to whom can illuminate biology. A tumor scRNA-seq might identify T cells and cancer cells, but adding spatial data could show that T cells cluster at the invasive margin or around blood vessels, indicating immune infiltration patterns. Spatial proximity can suggest communication. For example, a study integrating scRNA-seq and spatial data in glioblastoma mapped malignant cell subtypes and immune cells in the tumor, uncovering how certain macrophages and T cells co-localize with specific tumor cells and interact via ligand-receptor signals (Liu et al., 2023).

  • Developmental Gradients and Niches: During development, position is everything. If you have scRNA-seq of an embryo, integrating spatial data (even for a subset of genes) can reveal gradients (head-tail, inside-outside, dorsal-ventral). A great example is a recent mouse organogenesis study: researchers profiled an 8–12 somite stage mouse embryo with seqFISH (measuring 387 genes) and integrated it with scRNA-seq atlases (Lohoff et al., 2022). The spatial map enabled them to see a dorsal-ventral patterning of progenitor cells in the developing gut tube that was not apparent from scRNA-seq alone. This kind of insight, a developmental axis of gene expression, only emerges when you have spatial coordinates.

In summary, if your biological question involves how cells are arranged, who they neighbor, or how location might influence cell behavior, that’s when layering on spatial data is most rewarding. The good news is you don’t always need a massive spatial experiment to get these benefits; even minimal spatial data (like a single Visium slide or a targeted gene panel imaging) can complement a large scRNA-seq dataset. The key is smart integration, which we’ll discuss next.
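The question of who neighbors whom can be made concrete with very little code. The sketch below is a minimal illustration and is not taken from any of the cited studies: it assumes you already have x,y coordinates and a cell-type label for every cell, and the function name `neighbor_fraction`, the toy coordinates, and the radius are all hypothetical choices made for the example.

```python
import numpy as np

def neighbor_fraction(coords, labels, source, target, radius):
    """Fraction of the neighbors of `source` cells that are `target` cells.

    A neighbor is any other cell within `radius` (same units as coords).
    Cells at exactly zero distance are treated as the cell itself and skipped.
    """
    coords = np.asarray(coords, dtype=float)
    labels = np.asarray(labels)
    source_xy = coords[labels == source]
    # Pairwise distances: rows are source cells, columns are all cells.
    dist = np.linalg.norm(source_xy[:, None, :] - coords[None, :, :], axis=2)
    in_range = (dist > 0) & (dist <= radius)
    n_neighbors = in_range.sum()
    if n_neighbors == 0:
        return float("nan")
    is_target = labels == target
    return (in_range & is_target[None, :]).sum() / n_neighbors

# Toy layout (made-up positions): a tumor block, T cells at its edge, distant stroma.
coords = [(0, 0), (1, 0), (0, 1), (1, 1), (2, 0), (2, 1), (10, 10), (11, 10)]
labels = ["tumor"] * 4 + ["T cell"] * 2 + ["stroma"] * 2

frac_tumor = neighbor_fraction(coords, labels, "T cell", "tumor", radius=1.5)
frac_stroma = neighbor_fraction(coords, labels, "T cell", "stroma", radius=1.5)
background_tumor = labels.count("tumor") / len(labels)

print(f"T-cell neighbors that are tumor:  {frac_tumor:.2f}")
print(f"T-cell neighbors that are stroma: {frac_stroma:.2f}")
print(f"Tumor fraction overall:           {background_tumor:.2f}")
```

Comparing the neighbor fraction with the overall fraction of that cell type gives a rough sense of enrichment. A real analysis would typically add a permutation test and use a spatial index for large datasets, since the all-pairs distance matrix here only suits small examples.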

Strategies to Integrate Single-Cell and Spatial Transcriptomics Data

There has been an explosion of computational methods to integrate scRNA-seq with spatial data. These methods fall into a few categories: label transfer/mapping, imputation of missing genes, deconvolution of mixed signals, and spatial domain detection. Below is a summary of some best-practice tools and what they are used for:

| Platform / Tool | Integration Approach | Use Case / Strengths |
| --- | --- | --- |
| Seurat (label transfer) | Anchor-based label mapping (anchors found via PCA projection or Canonical Correlation Analysis) | Maps cell identities from scRNA-seq onto spatial spots. Ideal for annotating spatial data: e.g., predict the cell type composition of each Visium spot by transferring labels from a well-annotated scRNA-seq reference. Fast and user-friendly; part of the popular Seurat pipeline. |
| SpaGE | Machine learning imputation of gene expression | Predicts whole-transcriptome expression at spatial locations by integrating a scRNA-seq dataset with high gene coverage and a spatial dataset with limited gene coverage. In other words, SpaGE “fills in” the genes that weren’t measured in the spatial experiment. Great for increasing gene coverage: it even uncovered new spatial gene patterns in the mouse brain that were later confirmed by independent in situ hybridization (Abdelaal et al., 2020). |
| Tangram | Deep learning-based cell mapping (probabilistic alignment) | Aligns single-cell data to spatial data by finding the best match of cell transcriptomes to spatial gene expression patterns. Tangram can map individual cells or cell types into a tissue, optimizing so that the density of mapped cells matches the observed spatial expression. It’s versatile: it supports mapping to various spatial data types (even MERFISH or histology) and can handle differences in resolution or throughput between datasets. |
| pciSeq | Probabilistic cell typing of spatial data using a scRNA-seq reference | “pciSeq” stands for probabilistic cell typing by in situ sequencing (from Qian et al. 2019). It uses a Bayesian model to assign cell types to cells in targeted spatial data based on a reference scRNA-seq atlas. Useful when your spatial method captures limited genes per cell: e.g., if a spatial experiment only has 100 genes, pciSeq leverages the scRNA-seq reference to tell you which cell type those gene combinations most likely represent. It was demonstrated on in situ sequencing data to finely map closely related neuron types in situ. |
| BayesSpace | Bayesian spatial clustering and resolution enhancement | Not an integration method per se (it doesn’t require scRNA-seq input), but often used alongside integration. BayesSpace improves the analysis of Visium-style data by modeling neighborhood information: it can increase effective resolution to “subspots” and detect finer spatial domains (Zhao et al., 2021). For instance, in a breast cancer Visium dataset, BayesSpace could delineate tumor substructures that were blurred at spot-level resolution. If you have scRNA-seq too, you might cluster with BayesSpace and then label those clusters with scRNA-seq-derived identities. |

These are just a selection of tools. Others include cell2location, STdeconvolve, Stereoscope, novoSpaRc, Harmony (a general batch-integration method), and more, each with its own algorithmic twist (regression, optimal transport, topic modeling, etc.). In fact, a 2023 review identified 19 different methods for integrating scRNA-seq with spatial data! The five above have become quite popular for their performance and usability, covering the common needs of single-cell researchers venturing into spatial analysis.

So, how do we put integration into practice? Let’s walk through a practical mindset with an example scenario:

  • Scenario: You have a large scRNA-seq dataset of a tissue (say 50,000 cells from a tumor biopsy). You suspect the spatial arrangement of cells (tumor, immune, stromal) is important – perhaps certain immune cells cluster near tumor cells indicating interactions.

  • Spatial Data Available: Suppose that, due to budget constraints, only a single 10x Visium slide of that tumor is accessible to you (or you managed a smaller-scale spatial experiment with, say, 5,000 spots covering a section). This spatial data gives you whole-transcriptome profiles, but each is an averaged mixture of multiple cells.

  • Integration Game Plan: First, you’d analyze your scRNA-seq and identify major cell types (tumor cells, T cells, macrophages, etc.), for example using Nygen’s pipeline to cluster and auto-annotate cell types against marker gene references. Then, using a label transfer or deconvolution method (like Seurat or cell2location), you project those labels onto each Visium spot. The result is a probabilistic estimate of what cells are in each spot, e.g., Spot #123 is 50% tumor, 30% T cell, 20% macrophage. Now you can visualize a tissue map of cell types: perhaps you see a gradient where the tumor core spots are mostly tumor cells with few T cells, but the edge spots are rich in T cells (implying immune infiltration at the margins). A pattern like this would be consistent with what is often described for tumor-immune interactions.
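To make the per-spot estimate in the game plan less abstract, here is a minimal sketch of the underlying idea: treat a spot’s expression as a non-negative mixture of cell-type signatures learned from scRNA-seq. This is not how Seurat or cell2location work internally (both use more sophisticated models); it is plain non-negative least squares solved with multiplicative updates, and the function name `estimate_spot_proportions`, the six-gene signature matrix, and the simulated spot are all hypothetical.

```python
import numpy as np

def estimate_spot_proportions(signatures, spot_expr, n_iter=500, eps=1e-12):
    """Estimate cell-type proportions for one spot.

    signatures: genes x cell types, mean expression per type (non-negative).
    spot_expr:  expression vector for the spot over the same genes.
    Solves min ||signatures @ w - spot_expr||^2 with w >= 0 using
    multiplicative updates, then rescales w to sum to 1.
    """
    S = np.asarray(signatures, dtype=float)
    x = np.asarray(spot_expr, dtype=float)
    w = np.full(S.shape[1], 1.0 / S.shape[1])  # start from equal weights
    StS = S.T @ S
    Stx = S.T @ x
    for _ in range(n_iter):
        # Multiplicative step keeps every weight non-negative.
        w *= Stx / (StS @ w + eps)
    return w / w.sum()

cell_types = ["tumor", "T cell", "macrophage"]
# Made-up signatures: two marker-like genes per cell type.
signatures = np.array([
    [10.0, 1.0, 1.0],
    [8.0, 0.5, 2.0],
    [1.0, 9.0, 1.0],
    [0.5, 7.0, 2.0],
    [1.0, 1.0, 12.0],
    [2.0, 0.5, 6.0],
])

# Simulate a noise-free spot that is 50% tumor, 30% T cell, 20% macrophage.
true_mix = np.array([0.5, 0.3, 0.2])
spot = signatures @ true_mix

estimated = estimate_spot_proportions(signatures, spot)
for name, p in zip(cell_types, estimated):
    print(f"{name}: {p:.2f}")
```

On real Visium data the spot is noisy, signatures are imperfect, and count-based models are usually a better fit, which is the main reason to prefer the dedicated tools over a sketch like this.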

From there, you could refine further. For example, use BayesSpace to subdivide spots in high-T-cell regions, or examine gene expression of interaction molecules (checkpoint ligands, etc.) in spatial context. Integration also works in the other direction: if your spatial data had unique expression patterns, you might adjust your scRNA-seq clustering to align better with spatial domains (some researchers iterate between datasets).

Best Practices: When integrating, always ensure that the datasets truly correspond (ideally same tissue or condition). If you’re mapping a scRNA-seq atlas to a spatial section, any batch differences should be corrected or accounted for. Methods like Harmony or Seurat’s CCA can handle batch effects to some degree during integration. Also, validation is key: for instance, Lohoff et al. validated their imputed gene expression by comparing it to actual in situ staining.

You might not always have ground truth, but sanity checks (do known markers localize correctly? do integrated clusters match histology if available?) increase confidence.
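One way to turn the “do known markers localize correctly?” check into a number is a spatial autocorrelation statistic such as Moran’s I, which is positive when nearby locations have similar values and negative when neighbors tend to differ. The sketch below is a simplified illustration under stated assumptions: it uses binary weights (1 for any pair of spots within a chosen radius), and the function name `morans_i` and the six-spot toy layout are hypothetical.

```python
import numpy as np

def morans_i(coords, values, radius):
    """Moran's I with binary distance-threshold weights.

    coords: (n, 2) spot or cell positions.
    values: length-n expression (or predicted proportion) per position.
    radius: pairs closer than this (and not identical) count as neighbors.
    """
    coords = np.asarray(coords, dtype=float)
    x = np.asarray(values, dtype=float)
    dist = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=2)
    weights = ((dist > 0) & (dist <= radius)).astype(float)
    dev = x - x.mean()
    total_weight = weights.sum()
    variance_term = (dev ** 2).sum()
    if total_weight == 0 or variance_term == 0:
        return float("nan")
    return (len(x) / total_weight) * (dev @ weights @ dev) / variance_term

# Six spots in a row, one unit apart (toy layout).
coords = [(i, 0) for i in range(6)]
clustered_marker = [1, 1, 1, 0, 0, 0]    # expressed on one side only
alternating_marker = [1, 0, 1, 0, 1, 0]  # no spatial coherence

i_clustered = morans_i(coords, clustered_marker, radius=1.0)
i_alternating = morans_i(coords, alternating_marker, radius=1.0)
print(f"clustered marker:   I = {i_clustered:.2f}")
print(f"alternating marker: I = {i_alternating:.2f}")
```

A marker that is known to be regionally restricted should score clearly positive after integration; a value near zero or below would be a prompt to revisit the mapping rather than proof that it failed.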

Real-World Examples: Spatial + scRNA-seq in Action

Let’s look at a few practical biological scenarios where integrating spatial data with single-cell transcriptomics provided powerful insights:

  • Brain Cortex – Layered Architecture: The brain’s cortex is organized into layers, but scRNA-seq alone might not assign cells to layers. In a frontal cortex study, researchers had a large Drop-seq scRNA-seq dataset (~71k cells) and a smaller spatial dataset (~2.5k cells from STARmap imaging). By integrating them (using the LIGER algorithm), they could assign a layer location to each single-cell cluster. For example, they identified interneuron subtypes in scRNA-seq and then saw that some of those types were located in layer 1 at the cortex surface while others sat in deep layers (Welch et al., 2019). The integration also increased the resolution of the spatial data: even though STARmap had fewer cells and targeted genes, combining it with the rich scRNA-seq filled in details and corrected one dataset’s bias with the other. Essentially, large-scale scRNA-seq gave breadth and spatial data gave an anatomical map, together reconstructing the cortical tissue architecture in silico.

Figure 1: Integrating scRNA-seq with spatial data reveals cortical organization.

In this example from the mouse brain cortex, a massive dissociated single-cell dataset (red points) was integrated with a much smaller spatially resolved dataset (blue points). Panel (A) shows a t-SNE of 71k scRNA-seq cells and 2.5k spatial cells (STARmap), colored by dataset (Welch et al., 2019).

After integration, the cells were jointly clustered; panel (B) shows the same t-SNE colored by the identified cell clusters, which include excitatory neurons of layers 2/3, 5, 6 (L2/3, L5, L6), interneurons, oligodendrocytes, etc. Critically, panel (D), at the bottom, plots the spatial coordinates of the STARmap cells colored by those cluster identities, effectively mapping the scRNA-seq clusters onto the tissue. This recapitulated known cortex anatomy: for instance, “Astrocyte_Gfap” cells (purple) localized to the meninges (outer surface) and white matter, matching patterns from the Allen Brain Atlas Gfap staining. Such integration shows where each cell type resides and validates that the scRNA-seq clusters correspond to real spatially segregated populations (Welch et al., 2019).

  • Tumor-Immune Microenvironment: In cancer research, combining spatial and single-cell data is increasingly used to decode the tumor microenvironment. A 2023 glioblastoma study integrated scRNA-seq with spatial transcriptomics to map cell types like malignant cells, T cells, and myeloid cells in the tumor tissue (Liu et al., 2023). They found, for instance, that exhausted CD8 T cells tended to colocalize with certain macrophage populations in spatial hotspots, suggesting immunosuppressive niches. By projecting single-cell-defined malignant cell states onto the spatial map, they observed regions of the tumor dominated by an aggressive mesenchymal-like cancer cell subtype, often adjacent to those immune niches. These integrated insights led to hypotheses about how macrophages might be fostering a local environment that drives tumor cells into a more malignant state (via signaling pathways like EGFR and CXCL interactions that they identified). In practical terms, spatial mapping in the tumor allowed the researchers to see cellular neighborhoods, something that purely dissociated single-cell data could not reveal. For translational science, this means potential targets (e.g., interrupting a signaling loop in a specific niche) can be identified by knowing which cells actually sit next to each other in the tumor. Tools used in such analyses include Seurat label transfer for initial cell mapping and CellPhoneDB or similar for ligand-receptor analysis once colocalized cells are identified.

  • Developmental Gradients in an Embryo: We touched on the example of Lohoff et al. 2022 above, but let’s detail it because it’s a beautiful use of minimal spatial data to complement scRNA-seq. The researchers had access to a comprehensive single-cell atlas of mouse embryogenesis (many thousands of cells with full gene profiles). They performed seqFISH on actual embryo sections at a specific stage, but only for 387 selected genes (marker genes chosen from the atlas). Now, 387 genes is far from the whole transcriptome, but by integrating with the atlas, they could impute the other genes. Essentially, for each cell in the spatial data, they found the best matching cells in the atlas (based on the 387 genes) and borrowed the rest of the gene profile from them. The result was a genome-wide expression map of the embryo in space (Lohoff et al., 2022). With this, they discovered an early dorsal-ventral division of cell fates in the gut tube that the atlas alone hadn’t revealed. In practice, this demonstrates that a relatively small spatial experiment can amplify the value of a big single-cell dataset. By strategically choosing a subset of genes, they anchored the single-cell data onto the tissue. This strategy can be applied in other contexts: for example, imagine you have a single-cell atlas of a disease tissue. You could design a spatial experiment with top marker genes and then map the atlas in, achieving near whole-transcriptome insight in situ at a fraction of the experimental cost of a full spatial transcriptome. SpaGE and Tangram are well-suited for this kind of task (imputing missing genes or aligning cells to space, respectively).

Figure 2: Spatial mapping of a mouse embryo using limited gene data integrated with an scRNA-seq atlas. Lohoff et al. (2022) profiled a mouse embryo with seqFISH for 387 genes and integrated it with single-cell RNA-seq atlases to achieve a high-resolution spatial transcriptome.

In the figure above, panel (b) shows an E8.5 embryo section with each dot representing a cell, colored by its predicted cell type (focusing on gut tube and nearby cells). By integrating, they could assign each cell a type and even infer the expression of genes not measured by seqFISH. This uncovered subtle spatial patterning: for instance, progenitor cells of the future trachea (ventral lung, teal) and esophagus (dorsal lung, orange) were found segregated to the ventral vs. dorsal sides of the gut tube, respectively, even at this early stage. Such dorsal-ventral separation was confirmed by follow-up in situ hybridization (see panels (h) and (j), where markers like Tbx1 and Shh are expressed in complementary patterns) and was not evident from the dissociated data alone. This example highlights how minimal spatial data (a few hundred genes) can be leveraged with a rich scRNA-seq reference to strategically fill in the blanks, providing insights into tissue patterning and developmental biology.
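The “match on the panel genes, borrow the rest” idea from the embryo example can be sketched in a few lines. This is a deliberately simplified illustration, not the published pipeline, which is more involved (for example, it has to cope with technical differences between seqFISH and scRNA-seq measurements). Here each spatial cell simply takes the average profile of its k nearest atlas cells, with distance computed on the shared genes only. The function name `impute_from_atlas`, the gene names `g1` to `g4`, and all expression values are hypothetical.

```python
import numpy as np

def impute_from_atlas(atlas_expr, atlas_genes, spatial_expr, panel_genes, k=3):
    """Impute full expression profiles for spatial cells from an atlas.

    atlas_expr:   (n_atlas_cells, n_genes) expression matrix.
    atlas_genes:  gene names for the atlas columns.
    spatial_expr: (n_spatial_cells, n_panel_genes) measured panel expression.
    panel_genes:  gene names for the spatial columns (must exist in the atlas).
    Returns an (n_spatial_cells, n_genes) matrix: the mean profile of the
    k nearest atlas cells, including atlas-based values for the panel genes.
    """
    atlas_expr = np.asarray(atlas_expr, dtype=float)
    spatial_expr = np.asarray(spatial_expr, dtype=float)
    column_of = {gene: i for i, gene in enumerate(atlas_genes)}
    shared_cols = [column_of[gene] for gene in panel_genes]
    atlas_panel = atlas_expr[:, shared_cols]
    # Squared Euclidean distance between every spatial cell and atlas cell.
    diff = spatial_expr[:, None, :] - atlas_panel[None, :, :]
    dist = (diff ** 2).sum(axis=2)
    nearest = np.argsort(dist, axis=1)[:, :k]
    return atlas_expr[nearest].mean(axis=1)

atlas_genes = ["g1", "g2", "g3", "g4"]
# Toy atlas: three cells of one type (g1/g3 high), three of another (g2/g4 high).
atlas_expr = np.array([
    [9, 1, 8, 0],
    [8, 0, 9, 1],
    [10, 1, 7, 0],
    [1, 9, 0, 8],
    [0, 8, 1, 9],
    [1, 10, 0, 7],
], dtype=float)

panel_genes = ["g1", "g2"]                 # only these were measured in situ
spatial_expr = np.array([[9, 1], [1, 9]])  # two spatial cells

imputed = impute_from_atlas(atlas_expr, atlas_genes, spatial_expr, panel_genes)
for cell, profile in enumerate(imputed):
    print(f"spatial cell {cell}:", dict(zip(atlas_genes, profile.round(2))))
```

The quality of such an imputation depends heavily on how well the panel genes separate the cell states of interest, which is why choosing informative markers for the spatial experiment matters so much.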

Adding Spatial Context through Nygen Workflows

For those eager to implement these integrations, it’s worth noting that many analysis platforms (including Nygen Analytics) support such multi-modal data layering. You don’t have to build everything from scratch in R or Python if that’s not your preference. For example, Nygen’s cloud platform enables no-code analysis of scRNA-seq data (from quality control and clustering to differential expression), and it also provides ways to incorporate spatial information. Users can import spatial coordinates of cells or spots into Nygen and visualize gene expression or clusters on tissue layouts. This means that after you identify clusters in your single-cell data, you could map them onto an actual tissue image or coordinate system if you have it. Nygen’s knowledge base has an article titled “Importing Spatial and Clonotype Data” that guides researchers to upload spatial metadata (like x,y coordinates for each cell/spot) post-analysis, effectively marrying the expression data with spatial organization. The result can be interactive plots where you see your clustered cells scattered according to their original tissue positions, revealing spatial patterns without writing custom code.

Internally, such platforms may leverage the aforementioned algorithms. For instance, behind the scenes, label transfer in Seurat or cell2location’s Bayesian mapping could be part of a workflow – the user might simply see an option to “annotate spatial dataset with single-cell references” and get results with a few clicks. Nygen also emphasizes reproducibility and ease of use, so an academic researcher can focus on interpreting the spatial biology rather than wrangling complex pipelines. (For more on making single-cell analysis accessible, see our earlier post on addressing bioinformatics skill gaps and intuitive tools.)

Tip: If you’re curating a single-cell atlas with Nygen, consider also publishing any spatial data you have in the Nygen Database or linking to it in your project. It creates a more comprehensive resource (think of it like adding a map layer to your atlas). And if you lack spatial data, Nygen’s network might help you find sequencing core facilities that offer spatial transcriptomics services, a useful pointer if you decide to generate spatial data to complement your single-cell study.

Conclusion and Outlook (What to Expect in Part 3)

By integrating spatial transcriptomics strategically with single-cell RNA-seq, researchers can maximize insights into how cells function together in tissues. We’ve seen that even limited spatial information, when combined with robust single-cell data, can elucidate tissue architecture, pinpoint cell interactions, and highlight patterns (like developmental axes) invisible to dissociated-cell analysis. The key is choosing the right integration approach for your question – whether it’s simple label transfer to annotate tissue regions or advanced probabilistic mapping to predict unseen genes. Tools like Seurat, SpaGE, Tangram, pciSeq, and BayesSpace have become invaluable in this endeavor, each addressing different aspects of the integration challenge.

As single-cell and spatial techniques continue to evolve, the line between scRNA-seq and spatial transcriptomics is blurring. In Part 3 of this series, we will directly compare and contrast single-cell RNA-seq and spatial transcriptomics as complementary technologies. We’ll discuss their respective strengths and limitations and how, together, they form a more complete toolkit for understanding biology, much like how combining a microscope with a cell sorter gives you a fuller picture than either alone.

References

Liu, Y., Wu, Z., Feng, Y., Gao, J., Wang, B., Lian, C., & Diao, B. (2023). Integration analysis of single-cell and spatial transcriptomics reveal the cellular heterogeneity landscape in glioblastoma and establish a polygenic risk model. Frontiers in Oncology, 13, 1109037. https://doi.org/10.3389/fonc.2023.1109037

Lohoff, T., Ghazanfar, S., Missarova, A., Koulena, N., Pierson, N., Griffiths, J. A., Bardot, E. S., Eng, C.-H. L., Tyser, R. C. V., Argelaguet, R., Guibentif, C., Srinivas, S., Briscoe, J., Simons, B. D., Hadjantonakis, A.-K., Göttgens, B., Reik, W., Nichols, J., Cai, L., & Marioni, J. C. (2022). Integration of spatial and single-cell transcriptomic data elucidates mouse organogenesis. Nature Biotechnology, 40(1), 74-85. https://doi.org/10.1038/s41587-021-01006-2

Abdelaal, T., Mourragui, S., Mahfouz, A., & Reinders, M. J. T. (2020). SpaGE: Spatial Gene Enhancement using scRNA-seq. Nucleic Acids Research, 48(18), e107. https://doi.org/10.1093/nar/gkaa740

Looking to add spatial context to your single-cell data? Nygen allows you to import spatial coordinates and visualize your scRNA-seq results within their tissue context. Explore how this simple integration can reveal new biological insights in your research. Learn more about enhancing your single-cell analysis workflow.