1 Intro ScRNAseq
1.1 For some help see
1.2 Description of workflow
This workflow takes raw 10x Genomics snRNA-seq FASTQs (starting from Bd21_3_snRNA_fastq.tar.gz) to a cell×gene count matrix and into a basic Seurat analysis in R. It’s intron-aware (snRNA) and can be run with either Cell Ranger or STARsolo. Below are the exact steps, key parameters, expected outputs, and how to load them in R.
The computing cluster I use runs Slurm. SLURM (Simple Linux Utility for Resource Management) is a job manager for high-performance computing (HPC) clusters. The code will need to be modified if you are not using this system. For more information on the cluster I used, see: IBIOL-compute-resources
1.2.1 Installation on the computing cluster
You need to install:
- Cell Ranger (10x Genomics): version 9.0.1 is installed below; any version above 7 should work. It is the main engine: it builds the reference (mkref command), then aligns and counts each sample (count command).
- gffread 0.12.7: converts the GFF3 annotation to GTF (Cell Ranger needs GTF).
- perl, awk, tar, gzip, md5sum
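Before going further, it can help to see which of these tools are already available on the cluster. The following is a minimal sketch: the function name missing_tools is made up for this example, and it only tests whether each command is found on PATH, not which version is installed.

```shell
# Report which of the given command-line tools are missing from PATH.
# Prints one missing tool name per line; prints nothing if all are found.
# (missing_tools is a hypothetical helper name, not part of any tool.)
missing_tools() {
    for tool in "$@"; do
        if ! command -v "$tool" >/dev/null 2>&1; then
            printf '%s\n' "$tool"
        fi
    done
}

# Tools from the list above; cellranger and gffread are expected to be
# reported as missing until the installation steps have been completed.
missing_tools cellranger gffread perl awk tar gzip md5sum
```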
I installed stand-alone tools into ~/bin.
On shared clusters you usually don't have root access, so you install software in your home or data space. Anything placed (or symlinked) in ~/bin is easy to run because you just need to add ~/bin to your PATH. This is perfect for stand-alone tools that aren't on conda (e.g., Cell Ranger). You keep the real install in ~/opt (or /data/
I also installed mamba (as micromamba) for tools that exist on conda (e.g., gffread 0.12.7): I created an environment and activate it in my jobs.
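The "real install in ~/opt, symlink in ~/bin" pattern can be sketched as a small helper. This is an illustration under assumptions, not the exact commands used here: install_standalone and the mytool names are hypothetical, and the directories should be given as absolute paths so that the link target resolves.

```shell
# Unpack a tarball into an "opt" directory and expose one executable through
# a symlink in a "bin" directory that is on PATH.
#   $1 = tarball
#   $2 = opt directory (absolute path), receives the unpacked files
#   $3 = bin directory (absolute path), receives the symlink
#   $4 = path of the executable relative to the opt directory
# (install_standalone is a hypothetical helper name.)
install_standalone() {
    mkdir -p "$2" "$3" || return 1
    tar -xzf "$1" -C "$2" || return 1
    # -f replaces a stale link left by an earlier install
    ln -sf "$2/$4" "$3/$(basename "$4")"
}

# Example with made-up names:
#   install_standalone mytool-1.0.tar.gz "$HOME/opt" "$HOME/bin" mytool-1.0/mytool
#   export PATH="$HOME/bin:$PATH"
```

Keeping the unpacked tree in one place and linking only the executable means a new version can be unpacked next to the old one and activated by updating a single link.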
1- Go to my home directory
cd /home/cmaslard
2- Install micromamba in ~/.local/bin
mkdir -p ~/.local/bin
curl -L https://github.com/mamba-org/micromamba-releases/releases/latest/download/micromamba-linux-64 -o ~/.local/bin/micromamba
chmod +x ~/.local/bin/micromamba
export PATH="$HOME/.local/bin:$PATH"
echo 'export PATH="$HOME/.local/bin:$PATH"' >> ~/.bashrc
Verification (the first line of the help output shows the version, here "Version: 2.3.2")
~/.local/bin/micromamba --help | head -n 1
3- Activate the hook for the current shell (takes effect immediately)
eval "$(~/.local/bin/micromamba shell hook -s bash)"
(and optionally, make it persistent for later sessions)
~/.local/bin/micromamba shell init -s bash -p ~/micromamba
source ~/.bashrc
4- Initialise micromamba and create a "cr" environment with gffread (plus useful tools)
micromamba shell init -s bash -p ~/micromamba
source ~/.bashrc
micromamba create -y -n cr -c conda-forge -c bioconda gffread=0.12.7 seqtk pigz
micromamba activate cr
5- Verification
gffread --version
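Once gffread is available in the "cr" environment, the GFF3 to GTF conversion mentioned above might look like the sketch below. The -T option asks gffread for GTF output and -o names the output file; the wrapper name gff3_to_gtf, the DRY_RUN switch and the output path in the example are assumptions made for illustration.

```shell
# Convert a GFF3 annotation to GTF with gffread (-T selects GTF output).
#   $1 = input GFF3, $2 = output GTF
# With DRY_RUN=1 the command is printed instead of executed, which is handy
# for checking paths before submitting a job.
# (gff3_to_gtf and DRY_RUN are hypothetical names for this sketch.)
gff3_to_gtf() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        printf 'gffread %s -T -o %s\n' "$1" "$2"
    else
        gffread "$1" -T -o "$2"
    fi
}

# Example (the output location is an assumption):
#   micromamba activate cr
#   gff3_to_gtf BdistachyonBd21_3_537_v1.2.gene.gff3 genes.gtf
```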
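Install Cell Ranger
The download link is a time-limited signed URL obtained from the 10x Genomics download page, so the one shown here expires; request a fresh link there before running the command.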
curl -o cellranger-9.0.1.tar.gz "https://cf.10xgenomics.com/releases/cell-exp/cellranger-9.0.1.tar.gz?Expires=1758522619&Key-Pair-Id=APKAI7S6A5RYOXBWRPDA&Signature=inapNYfLwwmkYLeV3b5r9wzip3sGQM2Z7dXYD1o4cybVQciylvsEQGJgOdrTfBHv4GCd9b1hf3j06y6lQOUeLimkGdthwpVrn6nNmQ8vtrxGZfDqXW4MrDDJDT5TX4TjxFutlOtJ8MVL6SNLPblXvnIVzhrFHPcKzC4OX-QI8fXCaSHP0vOOCX77x8qWrdRekOYyqp0SRk6MdW3I-T9z5P9tVCfm2WcuZoicrDg1gTUVzcdPQTXKygEgqKRts-k5iMI4EFs3l81nTvzObW6ASWg0gwfcziN-YoTuXYe1a2nGa2O2a4~EGi0AbbnloqOzqttRVLEUtwXcJVhg__"
1.2.2 Tree structure
mkdir -p /data/cmaslard/{fastq_archives,fastq,fastq_4h_new,refs,inputs_ref,runs,logs}
- Add Bd21_3_snRNA_fastq.tar.gz and Bd21_3_new_4h_snRNA_fastq.tar.gz into fastq_archives
- Add BdistachyonBd21_3_537_v1.0.fa, BdistachyonBd21_3_537_v1.2.gene.gff3, Oryza_sativa_chloroplast.gff3 and Oryza_sativa_mitochondria.gff3 into the inputs_ref folder
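With the archives in place, unpacking them into the FASTQ folders could be done along these lines. This is a sketch: the helper names are made up, and the mapping of each archive to fastq or fastq_4h_new is inferred from the folder names rather than stated in the workflow.

```shell
# Extract a .tar.gz archive into a destination directory, creating it if needed.
#   $1 = archive, $2 = destination directory
# (extract_archive and count_fastqs are hypothetical helper names.)
extract_archive() {
    mkdir -p "$2" || return 1
    tar -xzf "$1" -C "$2"
}

# Count the *.fastq.gz files found anywhere below a directory.
count_fastqs() {
    find "$1" -type f -name '*.fastq.gz' | wc -l | tr -d ' '
}

# Assumed mapping of archives to folders (inferred from the folder names):
#   base=/data/cmaslard
#   extract_archive "$base/fastq_archives/Bd21_3_snRNA_fastq.tar.gz" "$base/fastq"
#   extract_archive "$base/fastq_archives/Bd21_3_new_4h_snRNA_fastq.tar.gz" "$base/fastq_4h_new"
#   count_fastqs "$base/fastq"
```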
1.2.3 Script for the cluster
Two files have been created. An .sbatch file for batch submission with a log and for XXXXX, and an .sh file for XXXXXX. Here are the two scripts:
Then submit the job:
sbatch ~/snrna_setup.sbatch
1.2.4 Results after Cell Ranger
snrna_setup.sbatch aims to create a "pre-mRNA" reference (genes + introns) and an alignment index (STAR/Cell Ranger). At the end, I obtain a reference folder such as:
- ref/ (or ref_premrna/, depending on your script)
  - fasta/ → genome.fa (concatenation of the FASTA files)
  - genes/ → genes.gtf (GTF modified to "pre-mRNA", i.e. introns included)
  - star/ (or equivalent) → large set of index files (Genome, SA, SAindex, chrLength.txt, etc.)
  - one or more metadata files (reference.json / reference.csv), depending on the tool.
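For orientation, the reference-building call at the heart of such a script is cellranger mkref with its --genome, --fasta and --genes options. The wrapper below is only a sketch and is not the content of snrna_setup.sbatch: make_reference, DRY_RUN and the file names in the example are assumptions.

```shell
# Build a Cell Ranger reference from a genome FASTA and a GTF annotation.
#   $1 = reference name (also the output folder), $2 = FASTA, $3 = GTF
# With DRY_RUN=1 the command is printed instead of executed.
# (make_reference and DRY_RUN are hypothetical names for this sketch.)
make_reference() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        printf 'cellranger mkref --genome=%s --fasta=%s --genes=%s\n' "$1" "$2" "$3"
    else
        cellranger mkref --genome="$1" --fasta="$2" --genes="$3"
    fi
}

# Example with assumed file names:
#   cd /data/cmaslard/refs
#   make_reference ref_premrna genome.fa genes.gtf
```

Since Cell Ranger 7, cellranger count includes intronic reads by default, so a standard GTF may be enough; a modified pre-mRNA GTF mainly matters for older versions or other aligners.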
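Whether you use cellranger count (with --include-introns or a pre-mRNA reference) or STARsolo/Alevin, each sample produces a gene-expression output folder. With Cell Ranger this is outs/ (STARsolo and Alevin use their own directory names for the equivalent files), containing: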
- outs/web_summary.html (interactive QC)
- outs/metrics_summary.csv
- Matrices:
  - outs/filtered_feature_bc_matrix/
    - barcodes.tsv.gz
    - features.tsv.gz (or genes.tsv.gz)
    - matrix.mtx.gz
  - outs/raw_feature_bc_matrix/ (same, unfiltered)
  - sometimes a single HDF5: filtered_feature_bc_matrix.h5
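A per-sample run and a quick sanity check of its matrix folder might be sketched as follows. The count options shown (--id, --transcriptome, --fastqs, --sample, --include-introns, --create-bam) are Cell Ranger options, with --create-bam being mandatory from version 8 onward; the helper names, the choice of --create-bam=false and the paths in the example are assumptions for illustration.

```shell
# Print a cellranger count command for one sample, ready to paste into a
# Slurm script.
#   $1 = run id, $2 = reference folder, $3 = FASTQ folder, $4 = sample name
# (count_cmd and check_matrix_dir are hypothetical helper names.)
count_cmd() {
    printf 'cellranger count --id=%s --transcriptome=%s --fastqs=%s --sample=%s --include-introns=true --create-bam=false\n' \
        "$1" "$2" "$3" "$4"
}

# Check that a matrix folder holds the three expected files, then print the
# number of barcodes (barcodes.tsv.gz has one barcode per line).
check_matrix_dir() {
    for f in barcodes.tsv.gz features.tsv.gz matrix.mtx.gz; do
        [ -f "$1/$f" ] || { echo "missing: $1/$f" >&2; return 1; }
    done
    gzip -dc "$1/barcodes.tsv.gz" | wc -l | tr -d ' '
}

# Example with assumed paths:
#   count_cmd Bd21_3 /data/cmaslard/refs/ref_premrna /data/cmaslard/fastq Bd21_3
#   check_matrix_dir /data/cmaslard/runs/Bd21_3/outs/filtered_feature_bc_matrix
```

The barcode count printed for the filtered matrix corresponds to the number of nuclei that Cell Ranger called, which is a convenient first number to compare against web_summary.html.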