Additional Analysis Parameters

Learn how to customize key parameters in ScarfWeb, including PCA dimensions, kNN graph, clustering resolution, and species info, to enhance your dataset analysis.

Additional parameters for analysis

Here, you can edit parameters that will primarily impact the results of clustering and embedding (UMAP/tSNE).

As in previous steps, the default values are carried over from the most recent analysis run if you have already run an analysis on the same dataset.

Parameters for kNN graph creation

PCA dimensions

PCA is a linear dimension reduction step. Using PCA in the analysis workflow, we try to reduce the redundancy in the data that may arise due to high correlation among genes. Since the number of differential biological pathways and processes among cell types is far fewer than that of HVGs (highly variable genes), the HVGs can be compressed into fewer latent features in PCA components. Since the PCA components are orthogonal, each component adds non-redundant information.
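The compression described above can be illustrated with a minimal sketch. It assumes synthetic data in which a few latent "programs" drive many correlated genes, and it computes PCA with a plain NumPy SVD. The variable names and the data are hypothetical, and this is not necessarily how ScarfWeb computes PCA internally.

```python
import numpy as np

rng = np.random.default_rng(0)
n_cells, n_genes, n_programs = 200, 50, 3

# Hypothetical data: 3 latent programs drive 50 correlated genes, plus noise.
latent = rng.normal(size=(n_cells, n_programs))
loadings = rng.normal(size=(n_programs, n_genes))
X = latent @ loadings + 0.1 * rng.normal(size=(n_cells, n_genes))

# PCA via SVD of the mean-centered matrix.
Xc = X - X.mean(axis=0)
U, S, Vt = np.linalg.svd(Xc, full_matrices=False)

n_pcs = 10                    # the "PCA dimensions" parameter
components = Vt[:n_pcs]       # each row is one principal axis in gene space
pcs = Xc @ components.T       # cells x n_pcs matrix used in later steps

# The principal axes are orthonormal, so this is (numerically) the identity.
gram = components @ components.T
```

Here 50 gene features are reduced to 10 components per cell, and `gram` being the identity matrix reflects the orthogonality of the components.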

How many PCA dimensions should I choose?

This number is hard to estimate before running the analysis. Choosing too many PCA components leads to the "curse of dimensionality": distances between cells become less meaningful because components that capture little variability are included. Choosing too few, on the other hand, risks discarding real biological signal.
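One common heuristic, shown here only as a sketch and not as ScarfWeb's own procedure, is to look at the cumulative fraction of variance explained and keep the smallest number of components that reaches a chosen threshold. The synthetic data and the 90% threshold below are assumptions made for illustration.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical data: 3 latent programs behind 40 genes, with small noise.
latent = rng.normal(size=(300, 3))
loadings = rng.normal(size=(3, 40))
X = latent @ loadings + 0.1 * rng.normal(size=(300, 40))

# Singular values of the centered matrix give the variance per component.
Xc = X - X.mean(axis=0)
S = np.linalg.svd(Xc, compute_uv=False)
explained = S**2 / np.sum(S**2)
cumulative = np.cumsum(explained)

# Smallest number of components whose cumulative variance reaches the threshold.
threshold = 0.90  # assumed cutoff, chosen for illustration only
n_pcs = int(np.searchsorted(cumulative, threshold) + 1)
```

In this toy dataset nearly all of the variance sits in the first three components, so the remaining ones would mostly add noise to the cell-to-cell distances. Real datasets rarely show such a sharp cutoff, so the result is best treated as a starting point.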

k nearest neighbour

A kNN graph is a network in which every node represents a cell and every edge connects two cells that are similar to each other. kNN graphs capture the underlying relationships among the cells, revealing clusters of cells and their similarity (this process is also called manifold learning).

This kNN graph structure is dimension-free, and hence, tools like UMAP are used to "embed" it into 2- or 3-dimensional space.
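To make the graph construction concrete, the following sketch builds a brute-force kNN graph with NumPy from a small synthetic matrix of cells in PCA space. The data, the value of `k`, and the use of Euclidean distance are assumptions. Tools for large datasets typically use approximate nearest-neighbour search rather than the full distance matrix used here.

```python
import numpy as np

rng = np.random.default_rng(2)

# Hypothetical PCA coordinates: two well-separated groups of 30 cells each.
group_a = rng.normal(loc=0.0, size=(30, 5))
group_b = rng.normal(loc=10.0, size=(30, 5))
pcs = np.vstack([group_a, group_b])
labels = np.repeat([0, 1], 30)
n_cells = pcs.shape[0]
k = 5  # number of nearest neighbours per cell

# Pairwise Euclidean distances between all cells.
diff = pcs[:, None, :] - pcs[None, :, :]
dist = np.sqrt((diff**2).sum(axis=-1))
np.fill_diagonal(dist, np.inf)  # a cell is not its own neighbour

# Indices of the k closest cells for every cell.
neighbors = np.argsort(dist, axis=1)[:, :k]

# Adjacency matrix: A[i, j] = 1 if j is among the k nearest neighbours of i.
A = np.zeros((n_cells, n_cells), dtype=int)
A[np.arange(n_cells)[:, None], neighbors] = 1
```

Each row of `A` has exactly `k` outgoing edges, and in this toy example every edge stays within one of the two groups, which is the structure that clustering and UMAP later pick up.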

Parameters for clustering

Resolution

The clustering process assigns each cell to a group or cluster. A cluster of cells, in most cases, can be interpreted as a stable cell state (a cell type) or a metastable cell state.

Most clustering approaches create non-overlapping partitions, meaning each cell can belong to only one cluster at a time.

In ScarfWeb, clustering methods are applied directly to the kNN graph. Clustering algorithms that partition a graph in this way are also called community detection methods. ScarfWeb uses the Leiden clustering method.

The resolution parameter controls the number of clusters that will be obtained. A higher resolution leads to more clusters, while a lower resolution leads to fewer clusters.
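The effect of the resolution can be sketched with a modularity-style quality function that includes a resolution term gamma. This is an assumption made for illustration: Leiden can optimize several quality functions, and which one ScarfWeb uses is not stated here. The sketch does not run Leiden itself; it only scores three fixed candidate partitions of a tiny hypothetical graph and reports which one the quality function prefers at each resolution.

```python
import numpy as np

def modularity(A, labels, gamma):
    """Modularity of a partition with resolution parameter gamma."""
    k = A.sum(axis=1)            # node degrees
    two_m = k.sum()              # twice the number of edges
    same = labels[:, None] == labels[None, :]
    return float(((A - gamma * np.outer(k, k) / two_m) * same).sum() / two_m)

# Hypothetical graph: two 4-node cliques joined by a single edge.
A = np.zeros((8, 8))
for block in (range(0, 4), range(4, 8)):
    for i in block:
        for j in block:
            if i != j:
                A[i, j] = 1.0
A[3, 4] = A[4, 3] = 1.0

# Candidate partitions, keyed by their number of clusters.
candidates = {
    1: np.zeros(8, dtype=int),                # everything in one cluster
    2: np.array([0, 0, 0, 0, 1, 1, 1, 1]),    # one cluster per clique
    8: np.arange(8),                          # every node on its own
}

def best_n_clusters(gamma):
    # Number of clusters of the highest-scoring candidate at this resolution.
    return max(candidates, key=lambda n: modularity(A, candidates[n], gamma))

preferred = {gamma: best_n_clusters(gamma) for gamma in (0.1, 1.0, 5.0)}
```

For this graph, `preferred` maps resolution 0.1 to one cluster, 1.0 to two clusters, and 5.0 to eight clusters, which matches the trend described above: raising the resolution rewards finer partitions.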

Learn more about Leiden clustering

Other parameters

Species

If you are not sure which species to choose, select the Other/Unknown/Mixed option.

❓ Where do we use the species information? The species information is not critical for basic analysis steps like obtaining UMAP and clustering; however, it is used in the pathway enrichment step to use the correct underlying database. Currently, the pathway enrichment is performed only on human and mouse datasets.

Yi Su

Bioinformatician