Nature Communications | Zemin Zhang's group publishes a new method for evaluating the purity of single-cell populations

On June 22, Prof. Zemin Zhang’s team from the Biomedical Pioneering Innovation Center (BIOPIC), the Beijing Advanced Innovation Centre for Genomics (ICG), and the Peking-Tsinghua Centre for Life Sciences (CLS) at Peking University published an article in Nature Communications entitled “An entropy-based metric for assessing the purity of single cell populations”. The article describes a new method for unsupervised feature selection and for assessing the purity of identified cell clusters.


Recent advances in single-cell RNA sequencing (scRNA-seq) have transformative potential for discovering and annotating cell types, providing insights into organ composition, the tumor microenvironment, cell lineage, and fundamental cell properties. However, cell clusters are often identified by manually checking specific signature genes, a practice that is arbitrary and inherently imprecise. In addition, the choice of methods, and even of parameters, for normalization, feature selection, batch correction, and clustering can confound the final clusters. Both problems motivate the need to accurately assess the purity, or quality, of identified clusters.


PhD students Baolin Liu and Chenwei Li from Zemin Zhang’s lab developed the ROGUE metric to address this issue. They extended the previously developed E-test method [1] and used differential entropy to capture the degree of disorder, or randomness, in gene expression. A gene’s differential entropy is strongly related to its mean expression level, and this relationship forms the basis of the expression entropy model (S-E model), which identifies variable genes with high sensitivity and precision and outperforms competing methods (Fig. 1). Based on this model, they proposed the Ratio of Global Unshifted Entropy (ROGUE) statistic to quantify the purity, or homogeneity, of a given single-cell population while accounting for technical factors.
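
To make the entropy-versus-mean idea concrete, here is a minimal Python sketch under several assumptions; it is not the ROGUE package's implementation, and the paper's exact formulas, fitting procedure, and significance filtering may differ. It assumes a gene's expression entropy S can be approximated as the mean of ln(x + 1) across cells and its mean expression E as ln(mean(x) + 1). Because ln is concave, S never exceeds E, and the two are equal only when every cell has the same count. An ordinary least-squares line stands in for a smoothed S-E curve, a gene's "entropy reduction" is how far its S falls below that line, and the total reduction is mapped to a score between 0 and 1. The function names and the constant `k` are hypothetical and chosen for illustration.

```python
import math
from typing import List, Sequence, Tuple


def entropy_and_mean(counts: Sequence[float]) -> Tuple[float, float]:
    """Return (S, E) for one gene's counts across cells.

    S: mean of ln(x + 1), used here as a simple expression-entropy proxy (assumption).
    E: ln(mean(x) + 1), the log of the gene's mean expression level.
    """
    n = len(counts)
    s = sum(math.log(x + 1.0) for x in counts) / n
    e = math.log(sum(counts) / n + 1.0)
    return s, e


def fit_line(xs: Sequence[float], ys: Sequence[float]) -> Tuple[float, float]:
    """Ordinary least-squares fit y = a + b * x (a stand-in for a smoothed S-E curve)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b = sxy / sxx if sxx > 0 else 0.0
    return my - b * mx, b


def entropy_reduction(genes: List[Sequence[float]]) -> List[float]:
    """Per-gene drop of observed S below the S predicted from E, clipped at 0."""
    if not genes:
        raise ValueError("need at least one gene")
    pairs = [entropy_and_mean(g) for g in genes]
    a, b = fit_line([e for _, e in pairs], [s for s, _ in pairs])
    return [max(0.0, (a + b * e) - s) for s, e in pairs]


def rogue_like_score(genes: List[Sequence[float]], k: float = 10.0) -> float:
    """Map total entropy reduction to (0, 1]; 1 means no gene deviates. k is hypothetical."""
    total = sum(entropy_reduction(genes))
    return 1.0 - total / (total + k)


if __name__ == "__main__":
    # Rows are genes; columns are the same 10 cells.
    pure = [[2] * 10, [5] * 10, [0] * 10]  # every cell expresses each gene identically
    mixed = [[0] * 5 + [10] * 5, [2] * 10, [5] * 10, [1] * 10]  # gene 1 splits cells in two
    print("pure :", rogue_like_score(pure))
    print("mixed:", rogue_like_score(mixed))
```

In this toy example the homogeneous population scores 1 (up to floating-point error), while the population containing a bimodal gene scores lower. With only a handful of genes, the fitted line is pulled toward the variable gene itself; a realistic analysis would fit across thousands of genes with a robust smoother, such as loess, rather than a straight line.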


Through extensive simulations, the researchers demonstrated that ROGUE accurately assesses the purity of single-cell populations and can guide single-cell clustering regardless of uncontrollable cell-to-cell variation. Applying ROGUE to multiple challenging scRNA-seq datasets, they identified additional cell subtypes and showed that ROGUE-guided analyses can detect precise signals in specific subpopulations (Fig. 1).

Figure 1: S-E model and the application of ROGUE


Software implementing the feature selection and purity assessment is available as an open-source R package, ROGUE. As single-cell sequencing efforts expand rapidly, ensuring the purity and credibility of the ever-increasing number of identified cell types is a mounting challenge, and ROGUE could become a standard for judging the quality of cell clusters.


PhD student Baolin Liu from BIOPIC/School of Life Sciences and PhD student Chenwei Li from CLS are co-first authors of the paper, and Professor Zemin Zhang from BIOPIC/School of Life Sciences is the corresponding author. The project was funded by the National Natural Science Foundation of China, ICG, and Analytical Biosciences Limited (Beijing).




[1] Li, C. et al. SciBet as a portable and fast single cell type identifier. Nat. Commun. 11, 1818 (2020).