Supported file formats

This page describes the file formats supported by Nygen.

Different technologies, facilities and publications store single-cell data and analysis results in a variety of file formats. To save you an extra conversion step before you start your analysis, Nygen supports some of the most popular single-cell file formats:

  • 10x Genomics Cell Ranger HDF5 (.h5)
  • Matrix Market (.mtx)
  • AnnData (.h5ad)

10x Genomics Cell Ranger - HDF5

You can upload HDF5 (.h5) files generated by the 10x Genomics Cell Ranger pipeline. These files are usually named filtered_feature_bc_matrix.h5 or raw_feature_bc_matrix.h5 and contain the feature-barcode count matrix.

Please see the following links for further information about Cell Ranger's output:

Cell Ranger output description

Cell Ranger Feature Barcode Matrices (HDF5 Format)
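If you want to check a Cell Ranger HDF5 file before uploading it, the minimal sketch below reads the count matrix with the h5py and SciPy libraries. It assumes the Cell Ranger v3+ layout described in the 10x documentation: a matrix group holding data, indices, indptr, shape and barcodes datasets plus a features subgroup. Older Cell Ranger v2 files are organised differently and will not load this way. The function name load_10x_h5 is our own illustration and is not part of Nygen or Cell Ranger.

```python
import h5py
from scipy.sparse import csc_matrix


def load_10x_h5(path):
    """Read a Cell Ranger v3+ feature-barcode HDF5 file (assumed layout).

    Returns (counts, barcodes, features), where counts is a
    features x barcodes SciPy CSC matrix.
    """
    with h5py.File(path, "r") as f:
        m = f["matrix"]
        # Counts are stored column-compressed: one column per barcode
        counts = csc_matrix(
            (m["data"][:], m["indices"][:], m["indptr"][:]),
            shape=tuple(int(x) for x in m["shape"][:]),
        )
        # Strings are stored as bytes, so decode them to str
        barcodes = [b.decode() for b in m["barcodes"][:]]
        features = {
            key: [v.decode() for v in m["features"][key][:]]
            for key in ("id", "name", "feature_type")
        }
    return counts, barcodes, features
```

For example, load_10x_h5("filtered_feature_bc_matrix.h5") returns the matrix with features as rows, so counts.T gives a cells-by-features view. For routine analysis, scanpy.read_10x_h5 does the same job in a single call.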

MTX format

The Matrix Market (.mtx) format is used by technologies and tools such as those from 10x Genomics and BD Rhapsody. We also support uploading individually gzipped (.gz) files in this format. Check out the How to upload page for tips on uploading .mtx files.

During upload, please select all 3 of the following files from your computer and upload them at the same time:

  • matrix.mtx
  • barcodes.tsv
  • features.tsv

Please see the following for details on 10x and BD Rhapsody mtx files:

10x Genomics docs on mtx

BD Rhapsody doc page on mtx
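The three files fit together as follows: matrix.mtx lists the non-zero counts as 1-based (row, column, value) entries, and in the 10x convention each row corresponds to a line of features.tsv and each column to a line of barcodes.tsv. The standard-library sketch below reads such a set, plain or gzipped, and checks that the shapes agree, which is a quick sanity check before uploading. The function names are hypothetical, and the features-by-barcodes orientation is an assumption taken from 10x; check the BD Rhapsody documentation for that pipeline's layout.

```python
import gzip
from pathlib import Path


def open_text(path):
    """Open a plain or gzipped (.gz) text file for reading."""
    path = Path(path)
    return gzip.open(path, "rt") if path.suffix == ".gz" else open(path, "rt")


def read_mtx_triplet(matrix_path, barcodes_path, features_path):
    """Read matrix.mtx, barcodes.tsv and features.tsv (plain or .gz).

    Returns (entries, barcodes, features): entries maps
    (feature_index, barcode_index) -> value, using 0-based indices.
    """
    with open_text(matrix_path) as fh:
        header = fh.readline()
        if not header.lower().startswith("%%matrixmarket matrix coordinate"):
            raise ValueError("expected a coordinate-format Matrix Market file")
        line = fh.readline()
        while line.startswith("%"):  # skip comment lines after the header
            line = fh.readline()
        n_rows, n_cols, n_entries = (int(x) for x in line.split())
        entries = {}
        for _ in range(n_entries):
            row, col, value = fh.readline().split()
            # Matrix Market indices are 1-based; convert to 0-based
            entries[(int(row) - 1, int(col) - 1)] = float(value)
    with open_text(barcodes_path) as fh:
        barcodes = [ln.strip() for ln in fh if ln.strip()]
    with open_text(features_path) as fh:
        features = [ln.rstrip("\n").split("\t") for ln in fh if ln.strip()]
    # Assumed 10x convention: rows = features, columns = barcodes
    if (n_rows, n_cols) != (len(features), len(barcodes)):
        raise ValueError(
            f"matrix is {n_rows} x {n_cols}, but found {len(features)} "
            f"features and {len(barcodes)} barcodes"
        )
    return entries, barcodes, features
```

A dictionary of entries is fine for spot checks; for full-size matrices, scipy.io.mmread reads the .mtx file into a sparse matrix much more efficiently.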

AnnData h5ad

The way data is stored in .h5ad files can vary. To help your upload succeed, here is an outline of what we look for in an h5ad file:

  • a raw (unnormalised) count matrix with one row per cell (obs) and one column per feature (var), stored as a dense 2D array or a sparse matrix. When searching for the count matrix, we look in locations such as:
    • adata.raw.X
    • adata.layers["raw_counts"]
    • adata.X
  • adata.obs
    • most columns in obs are imported as cell-level metadata
  • adata.var
    • feature information such as Ensembl IDs, gene symbols and feature types is used

See the AnnData documentation: https://anndata.readthedocs.io/en/stable/

Other file formats

Other file formats, such as Seurat objects stored in R data files (e.g. .rds) or count matrices stored as plain text (.csv, .tsv or .txt), can be processed manually by our support team.

At a minimum, we require:

  • count data: a table or matrix of cell-by-feature values
  • a cell index that identifies each cell, e.g. barcodes
  • feature data, including:
    • feature ID and/or feature name, e.g. Ensembl IDs, gene symbols, gene names, antibody capture tags
    • feature type, e.g. gene expression (RNA), antibody capture (ADT, HTO)
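If your counts are in a dense CSV table, one possible route, rather than a required one, is to convert it yourself into the supported MTX file set, which carries all of the minimum data listed above. The standard-library sketch below assumes a hypothetical layout: one row per cell, the barcode in the first column, a header row of feature names, and integer counts. Because such a table has no separate feature IDs, it writes the feature name into both the ID and name columns of features.tsv and gives every feature the same type. The function csv_to_mtx is our own illustration, and the features-by-barcodes orientation follows the 10x convention.

```python
import csv
from pathlib import Path


def csv_to_mtx(csv_path, out_dir, feature_type="Gene Expression"):
    """Convert a dense cells x features CSV into matrix.mtx,
    barcodes.tsv and features.tsv (10x-style, assumed layout)."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    with open(csv_path, newline="") as fh:
        reader = csv.reader(fh)
        features = next(reader)[1:]  # header: barcode column, then features
        barcodes, entries = [], []
        # Columns of the MTX matrix are barcodes (1-based)
        for col, row in enumerate(reader, start=1):
            barcodes.append(row[0])
            for feat_idx, value in enumerate(row[1:], start=1):
                count = int(value)  # assumes integer counts
                if count != 0:  # coordinate format stores non-zeros only
                    entries.append((feat_idx, col, count))
    with open(out / "matrix.mtx", "w") as fh:
        fh.write("%%MatrixMarket matrix coordinate integer general\n")
        fh.write(f"{len(features)} {len(barcodes)} {len(entries)}\n")
        for feat_idx, col, count in entries:
            fh.write(f"{feat_idx} {col} {count}\n")
    (out / "barcodes.tsv").write_text("".join(f"{b}\n" for b in barcodes))
    # ID and name both use the CSV header value; replace the first
    # column with Ensembl IDs if you have them
    (out / "features.tsv").write_text(
        "".join(f"{name}\t{name}\t{feature_type}\n" for name in features)
    )
    return len(barcodes), len(features), len(entries)
```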

Supplementary file imports

You can also import supplementary data into your dataset, e.g. file metadata or cell annotations, as a comma-separated values (.csv) file.
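Cell annotations are only useful if each row can be matched to a cell in your dataset. The sketch below writes a per-cell annotation table to a .csv file with Python's csv module, after checking that every barcode exists in the dataset. The barcode column name and the helper write_cell_annotations are hypothetical; the exact columns Nygen expects are not specified on this page, so contact our support team if in doubt.

```python
import csv


def write_cell_annotations(path, rows, dataset_barcodes):
    """Write per-cell annotations to a .csv file.

    rows is a list of dicts, each with a "barcode" key (hypothetical
    column name) plus any annotation columns. Every barcode must
    exist in dataset_barcodes.
    """
    known = set(dataset_barcodes)
    unknown = [r["barcode"] for r in rows if r["barcode"] not in known]
    if unknown:
        raise ValueError(f"barcodes not found in the dataset: {unknown}")
    # Barcode first, then every annotation column seen in any row
    columns = sorted({key for r in rows for key in r} - {"barcode"})
    with open(path, "w", newline="") as fh:
        writer = csv.DictWriter(fh, fieldnames=["barcode", *columns], restval="")
        writer.writeheader()
        writer.writerows(rows)  # missing values become empty cells
```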

Single-cell technologies and tools are constantly evolving, so the ways data is stored keep changing and expanding. At Nygen we aim to accommodate single-cell data regardless of how it is stored. If you would like to know more, or have questions about whether your data is supported on our platform, please contact our support team.

Yi Su

Bioinformatician