Learn how to customize key parameters in ScarfWeb, including PCA dimensions, kNN graph, clustering resolution, and species info, to enhance your dataset analysis.
Here, you can edit parameters that will primarily impact the results of clustering and embedding (UMAP/tSNE).
As in previous steps, the default values are carried over from the latest analysis run if you have already analyzed the same dataset.
PCA is a linear dimension reduction step. In the analysis workflow, PCA reduces the redundancy in the data that arises from high correlation among genes. Because the number of biological pathways and processes that differ among cell types is far smaller than the number of HVGs (highly variable genes), the HVGs can be compressed into a smaller set of latent features, the PCA components. Because the PCA components are orthogonal, each component adds non-redundant information.
❓How many PCA dimensions should I choose?
This number is hard to estimate before running the analysis. Choosing too many PCA components leads to the "curse of dimensionality": the distances between cells become less meaningful because components that capture little variability are included.
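The effect described above can be sketched on synthetic data. The example below is only an illustration and not part of ScarfWeb: the two cell groups, the standard deviations, and the helper `mean_distance_ratio` are all assumptions made for the demonstration. It compares the average distance between cells of different groups with the average distance between cells of the same group, first using only two informative dimensions and then after appending 200 low-variance noise dimensions.

```python
import math
import random


def mean_distance_ratio(n_noise_dims, n_cells=20, seed=0):
    """Return mean between-group distance divided by mean within-group distance."""
    rng = random.Random(seed)

    def make_cell(center):
        # Two informative dimensions carry the group difference (std 1.0).
        informative = [rng.gauss(c, 1.0) for c in center]
        # Extra dimensions carry only low-variance noise (std 0.5).
        noise = [rng.gauss(0.0, 0.5) for _ in range(n_noise_dims)]
        return informative + noise

    # Hypothetical groups of cells with different centers.
    group_a = [make_cell((0.0, 0.0)) for _ in range(n_cells)]
    group_b = [make_cell((3.0, 3.0)) for _ in range(n_cells)]

    # All pairwise distances inside each group.
    within = [math.dist(g[i], g[j])
              for g in (group_a, group_b)
              for i in range(n_cells) for j in range(i + 1, n_cells)]
    # All pairwise distances across the two groups.
    between = [math.dist(a, b) for a in group_a for b in group_b]
    return (sum(between) / len(between)) / (sum(within) / len(within))


ratio_few = mean_distance_ratio(n_noise_dims=0)
ratio_many = mean_distance_ratio(n_noise_dims=200)
print(f"informative dimensions only: {ratio_few:.2f}")
print(f"plus 200 noise dimensions:  {ratio_many:.2f}")
```

A ratio well above 1 means cells from different groups are clearly farther apart than cells from the same group. As uninformative dimensions are added, the ratio drifts toward 1, so "near" and "far" cells become harder to tell apart.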
A kNN graph is a network where every node represents a cell, and every edge connects two similar cells. kNN graphs capture the underlying relationships among the cells, revealing clusters of cells and their similarity (this process is also called manifold learning).
This kNN graph structure is dimension-free, and hence, tools like UMAP are used to "embed" it into 2- or 3-dimensional space.
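To make the idea of a kNN graph concrete, here is a minimal brute-force sketch. It is not ScarfWeb's implementation, which may well use a faster approximate search. The function `build_knn_graph` and the toy coordinates are hypothetical and stand in for cells described by two PCA components.

```python
import math


def build_knn_graph(cells, k):
    """Map each cell index to the indices of its k nearest cells (brute force)."""
    graph = {}
    for i, cell in enumerate(cells):
        # Distance from cell i to every other cell, paired with that cell's index.
        distances = [(math.dist(cell, other), j)
                     for j, other in enumerate(cells) if j != i]
        # Keep the k closest cells; ties are broken by the lower index.
        graph[i] = [j for _, j in sorted(distances)[:k]]
    return graph


# Six toy cells forming two well-separated groups.
cells = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
knn = build_knn_graph(cells, k=2)
for cell, neighbors in knn.items():
    print(cell, "->", neighbors)
```

In this toy graph every edge stays inside one of the two groups, which is the structure that clustering and embedding methods later exploit.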
The clustering process assigns each cell to a group or cluster. A cluster of cells, in most cases, can be interpreted as a stable (cell type) or metastable cell state.
Most clustering approaches create non-overlapping partitions, meaning each cell can belong to only one cluster at a time.
In ScarfWeb, clustering methods are applied directly to the kNN graph. Clustering algorithms that aim to partition a graph in this way are also called community detection methods. ScarfWeb uses the Leiden clustering method.
The resolution parameter controls the number of clusters that will be obtained. A higher clustering resolution leads to more clusters, while a lower clustering resolution leads to fewer clusters.
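The role of the resolution parameter can be illustrated with a small sketch. Leiden optimizes a quality function over partitions of the graph, and this page does not state which quality function ScarfWeb uses. The example below assumes modularity with a resolution parameter purely for illustration. It does not run Leiden itself: it only scores two hand-picked partitions of a hypothetical toy graph, using the hypothetical helper `modularity`.

```python
def modularity(edges, clusters, resolution):
    """Modularity of a partition of an unweighted graph, with a resolution parameter."""
    m = len(edges)
    degree = {}
    for a, b in edges:
        degree[a] = degree.get(a, 0) + 1
        degree[b] = degree.get(b, 0) + 1
    score = 0.0
    for members in clusters:
        members = set(members)
        # Edges with both endpoints inside this cluster.
        internal = sum(1 for a, b in edges if a in members and b in members)
        # Sum of the degrees of all nodes in this cluster.
        total_degree = sum(degree[n] for n in members)
        # Reward internal edges; penalize large clusters in proportion to the resolution.
        score += internal / m - resolution * (total_degree / (2 * m)) ** 2
    return score


# Toy graph: two triangles of cells joined by a single edge.
edges = [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]
one_cluster = [[0, 1, 2, 3, 4, 5]]
two_clusters = [[0, 1, 2], [3, 4, 5]]

for resolution in (0.1, 1.0):
    merged = modularity(edges, one_cluster, resolution)
    split = modularity(edges, two_clusters, resolution)
    preferred = "two clusters" if split > merged else "one cluster"
    print(f"resolution={resolution}: preferred partition is {preferred}")
```

Under this assumed quality function, the low resolution scores the single merged cluster higher, while the higher resolution scores the two-cluster partition higher, matching the behavior described above.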
Learn more about Leiden clustering
If you are not sure which species to choose, select the Other/Unknown/Mixed option.