Scientific Updates

The Gao Lab proposes a disentangled representation framework for intracellular and extracellular information

Multicellular organisms can be conceptualized as complex spatial networks composed of interconnected cells. A cell’s position is as crucial as its intrinsic properties, and together they determine tissue functionality as well as dysfunction under pathological conditions. Spatial omics technologies enable the in situ mapping of cells within tissues at single-cell resolution, thereby providing a powerful framework for dissecting cellular cooperation and function within their native contexts.


Beyond capturing the transcriptional states of individual cells, spatial omics also reveals the microenvironment in which cells reside. However, a central challenge remains in deciphering the interplay between intracellular programs and extracellular contexts. Existing computational models often integrate these two layers of information in an entangled manner, which not only introduces ambiguity but also hinders mechanistic insights into their interdependencies. Moreover, with the rapid expansion of spatial omics datasets, current approaches face scalability bottlenecks when processing millions of cells at once.


To address these challenges, the group of Ge Gao at Peking University/Changping Laboratory published a study in Nature Communications in August 2025, entitled “Disentangled cellular embeddings for large-scale heterogeneous spatial omics data”. In this work, the authors present DECIPHER, a disentangled modeling framework for spatial omics. Compared with previous approaches, DECIPHER offers the following advantages:

  1. Scalability to spatial atlases comprising tens of millions of cells.

  2. The ability to disentangle and systematically characterize the associations between intrinsic gene programs and extrinsic spatial environments.

 


 

Accurate representation of omics data serves as a cornerstone for a wide range of downstream analyses. The Gao laboratory has previously developed multiple representation learning methods for single-cell omics data, including Cell BLAST for single-cell transcriptomics (Nature Communications, 2020), GLUE for single-cell multi-omics (Nature Biotechnology, 2022), and CLUE (NeurIPS Oral, 2023).


For the emerging field of spatial omics, the authors introduce DECIPHER, which adopts a dual-encoder architecture: a molecular encoder to capture intrinsic cellular identity, and a spatial encoder to model the local microenvironment. Notably, the spatial encoder leverages a Transformer-based design, enabling efficient handling of ever-expanding spatial datasets. The entire framework is trained in a self-supervised manner using multi-scale contrastive learning (Fig. 1), thereby achieving accurate and disentangled representations of spatial omics data.
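The contrastive objective mentioned above can be illustrated with a minimal sketch. This is not the DECIPHER implementation: the linear "encoder", the noise-based augmentation, the temperature value, and all function and variable names are assumptions made for illustration, and only a single scale is shown. It computes an InfoNCE-style loss in which two augmented views of the same cell are treated as a positive pair and all other cells in the batch serve as negatives.

```python
import numpy as np


def l2_normalize(x, eps=1e-12):
    """Scale each row to unit length so dot products are cosine similarities."""
    return x / np.maximum(np.linalg.norm(x, axis=1, keepdims=True), eps)


def info_nce(view_a, view_b, temperature=0.1):
    """InfoNCE loss: row i of view_a is the positive partner of row i of view_b."""
    a = l2_normalize(view_a)
    b = l2_normalize(view_b)
    logits = a @ b.T / temperature                # (n, n) scaled similarities
    logits -= logits.max(axis=1, keepdims=True)   # numerical stability
    log_prob = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return float(-np.mean(np.diag(log_prob)))     # positives lie on the diagonal


# Toy data (hypothetical): 64 cells, 32 genes, 8-dimensional embeddings.
rng = np.random.default_rng(0)
n_cells, n_genes, dim = 64, 32, 8
expr = rng.normal(size=(n_cells, n_genes))
W = rng.normal(size=(n_genes, dim))  # stand-in for a learned molecular encoder


def augment(x):
    """Hypothetical augmentation: add small Gaussian noise to expression."""
    return x + 0.1 * rng.normal(size=x.shape)


z1 = augment(expr) @ W
z2 = augment(expr) @ W

aligned = info_nce(z1, z2)                                # matched views
shuffled = info_nce(z1, z2[rng.permutation(n_cells)])     # mismatched views
print(f"aligned loss:  {aligned:.3f}")
print(f"shuffled loss: {shuffled:.3f}")
```

The loss is lower when the two views are correctly paired than when the pairing is shuffled, which is the signal a self-supervised encoder would be trained to exploit. In the framework described above, a comparable objective would be applied to the outputs of both the molecular and the spatial encoder; how the scales are combined is not specified here and is left out of the sketch.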



Figure 1 | Schematic of the DECIPHER framework


As a demonstration, the authors applied DECIPHER to a three-dimensional spatial atlas of the mouse brain comprising over 3 million cells and 200 tissue sections (Fig. 2a), a scale far beyond the capacity of existing computational methods. DECIPHER completed the modeling task within just a few hours. Its outputs not only recapitulated cellular identities with high fidelity but also reconstructed spatial brain regions (Fig. 2b) and clearly revealed cell-type-specific spatial distributions across anatomical structures (Fig. 2c).



Figure 2 | DECIPHER applied to atlas-scale spatial omics data


To further exploit DECIPHER’s disentangled representations in linking intracellular programs with extracellular environments, the authors constructed an interpretable machine learning framework to identify molecular signals critical for spatial positioning, such as ligand–receptor interactions mediating cell–cell communication. As an illustrative case, they examined B cells, which mature within the germinal centers of lymph nodes. Using DECIPHER representations, the study identified CXCR4–CXCL12 and CXCR5–CXCL13 as the key ligand–receptor pairs associated with B-cell localization. These findings are consistent with experimental evidence showing that these interactions are essential for positioning B cells within the light and dark zones of germinal centers.

 



Figure 3 | Linking intracellular and extracellular information with DECIPHER


By providing disentangled and scalable representations, DECIPHER offers a new perspective for uncovering the interplay between gene expression programs and spatial microenvironments, thereby advancing our understanding of spatial regulation of cellular functions. All source code for DECIPHER has been made openly available (https://github.com/gao-lab/DECIPHER) and can be readily installed via PyPI.


This work was led by PhD candidate Chen-Rui Xia at the School of Life Sciences, Peking University. Dr. Zhi-Jie Cao, a former “Boya” Postdoctoral Fellow, is co-first and co-corresponding author. The study was supported by the National Key R&D Program of China, the State Key Laboratory of Gene Function and Regulation, the Beijing Advanced Innovation Center for Future Genomics Diagnostics, and Changping Laboratory. Computational analyses were performed on the high-performance computing platforms of Changping Laboratory, the Peking University Pacific HPC Center, and the university’s public HPC.

[Link to article: https://www.nature.com/articles/s41467-025-63140-8]