1 Intro to scRNA-seq

Caution: to be done before publication.

1.1 For some help see

1.2 Description of workflow

This workflow takes raw 10x Genomics snRNA-seq FASTQ files (starting from Bd21_3_snRNA_fastq.tar.gz) to a cell × gene count matrix, and then into a basic Seurat analysis in R. It is intron-aware, which matters for snRNA-seq because nuclear transcripts are largely unspliced pre-mRNA, and it can be run with either Cell Ranger or STARsolo. Below are the exact steps, key parameters, expected outputs, and how to load them in R.

The computing cluster I use runs Slurm (originally "Simple Linux Utility for Resource Management"), a job scheduler for high-performance computing (HPC) clusters. If your cluster uses another scheduler, the submission scripts will need to be adapted. For more information on the cluster I used, see: IBIOL-compute-resources

1.2.1 Installation on the computing cluster

You need to install:
- cellranger 9.0.1 (any version ≥ 7 should work): Cell Ranger (10x Genomics) is the main engine; it 1. builds the reference (mkref command) and 2. aligns and counts each sample (count command)
- gffread 0.12.7: converts the GFF3 annotation to GTF, the format Cell Ranger requires
- perl, awk, tar, gzip, md5sum (standard utilities, usually already available on the cluster)
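Before going further, it can help to check which of these tools are already visible on your PATH. The snippet below is a minimal sketch (the function name check_tools is my own, hypothetical helper); it only uses the POSIX built-in command -v and reports each tool as found or missing.

```shell
#!/bin/sh
# Hypothetical helper: report which command-line tools are visible on PATH.
# Returns 0 if every tool is found, 1 if at least one is missing.
check_tools() {
  local missing=0 tool path
  for tool in "$@"; do
    # command -v prints the resolved path and fails if the tool is unknown
    if path=$(command -v "$tool" 2>/dev/null); then
      printf 'ok       %-12s %s\n' "$tool" "$path"
    else
      printf 'MISSING  %s\n' "$tool"
      missing=1
    fi
  done
  return "$missing"
}
```

Usage, after activating the environment described below: check_tools cellranger gffread perl awk tar gzip md5sum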

Installing into ~/bin
On shared clusters you usually don’t have root access, so you install software in your home or data space. Anything placed (or symlinked) in ~/bin is easy to run because you just need to add ~/bin to your PATH. This is ideal for stand-alone tools that are not on conda (e.g., Cell Ranger). Keep the real install in ~/opt (or /data//opt) and put a symlink in ~/bin; changing versions is then just a matter of updating the symlink.
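The "real install in ~/opt, symlink in ~/bin" pattern can be sketched as a small function. This is an illustration, not part of the original setup: link_tool is a hypothetical name, and it assumes the executable sits at OPT_DIR/NAME-VERSION/NAME, which is how the Cell Ranger tarball unpacks (e.g. ~/opt/cellranger-9.0.1/cellranger).

```shell
#!/bin/sh
# Hypothetical helper: (re)point BIN_DIR/NAME at OPT_DIR/NAME-VERSION/NAME.
# link_tool OPT_DIR BIN_DIR NAME VERSION
link_tool() {
  local opt_dir="$1" bin_dir="$2" name="$3" version="$4"
  local target="$opt_dir/$name-$version/$name"
  if [ ! -x "$target" ]; then
    printf 'not found or not executable: %s\n' "$target" >&2
    return 1
  fi
  mkdir -p "$bin_dir"
  # -s: symbolic link; -f: replace an existing link;
  # -n: treat an existing link to a directory as a plain file
  ln -sfn "$target" "$bin_dir/$name"
}
```

For example, link_tool ~/opt ~/bin cellranger 9.0.1 followed by export PATH="$HOME/bin:$PATH". Switching to another installed version is the same call with a different VERSION; nothing under ~/opt is touched.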

I also install micromamba (a standalone version of mamba) for tools that are available on conda (e.g., gffread 0.12.7): I create a dedicated environment and activate it in each job.

1- Go to your home directory:
cd /home/cmaslard

2- Install micromamba in ~/.local/bin:
mkdir -p ~/.local/bin
curl -L https://github.com/mamba-org/micromamba-releases/releases/latest/download/micromamba-linux-64 -o ~/.local/bin/micromamba

chmod +x ~/.local/bin/micromamba
export PATH="$HOME/.local/bin:$PATH"
echo 'export PATH="$HOME/.local/bin:$PATH"' >> ~/.bashrc

Verification:

~/.local/bin/micromamba --help | head -n 1
(prints: Version: 2.3.2)

3- Activate the shell hook for the current shell (takes effect immediately):

eval "$(~/.local/bin/micromamba shell hook -s bash)"

(Optionally, make it persistent for future sessions:)

~/.local/bin/micromamba shell init -s bash -p ~/micromamba
source ~/.bashrc

4- Create an environment named “cr” with gffread (plus useful tools: seqtk, pigz) and activate it:

micromamba create -y -n cr -c conda-forge -c bioconda gffread=0.12.7 seqtk pigz
micromamba activate cr

5- Verification:

gffread --version

Install Cell Ranger (the download link from the 10x Genomics website is signed and expires, so generate a fresh one before running this), then unpack it into ~/opt:
curl -o cellranger-9.0.1.tar.gz "https://cf.10xgenomics.com/releases/cell-exp/cellranger-9.0.1.tar.gz?Expires=1758522619&Key-Pair-Id=APKAI7S6A5RYOXBWRPDA&Signature=inapNYfLwwmkYLeV3b5r9wzip3sGQM2Z7dXYD1o4cybVQciylvsEQGJgOdrTfBHv4GCd9b1hf3j06y6lQOUeLimkGdthwpVrn6nNmQ8vtrxGZfDqXW4MrDDJDT5TX4TjxFutlOtJ8MVL6SNLPblXvnIVzhrFHPcKzC4OX-QI8fXCaSHP0vOOCX77x8qWrdRekOYyqp0SRk6MdW3I-T9z5P9tVCfm2WcuZoicrDg1gTUVzcdPQTXKygEgqKRts-k5iMI4EFs3l81nTvzObW6ASWg0gwfcziN-YoTuXYe1a2nGa2O2a4~EGi0AbbnloqOzqttRVLEUtwXcJVhg__"
mkdir -p ~/opt && tar -xzf cellranger-9.0.1.tar.gz -C ~/opt

1.2.2 Tree structure

  • mkdir -p /data/cmaslard/{fastq_archives,fastq,fastq_4h_new,refs,inputs_ref,runs,logs}
  • Copy Bd21_3_snRNA_fastq.tar.gz and Bd21_3_new_4h_snRNA_fastq.tar.gz into fastq_archives/
  • Copy BdistachyonBd21_3_537_v1.0.fa, BdistachyonBd21_3_537_v1.2.gene.gff3, Oryza_sativa_chloroplast.gff3 and Oryza_sativa_mitochondria.gff3 into inputs_ref/
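The tree creation and archive unpacking can be sketched as two small functions. Assumptions: setup_tree and extract_archive are hypothetical names, the directory list simply mirrors the mkdir command above, and sending the 4 h archive to fastq_4h_new/ is my reading of the folder names, not something stated in the original scripts. gzip -t is used to catch a truncated or corrupted download before extraction.

```shell
#!/bin/sh
# Hypothetical helper: create the project tree under BASE.
setup_tree() {
  local base="$1" d
  for d in fastq_archives fastq fastq_4h_new refs inputs_ref runs logs; do
    mkdir -p "$base/$d"
  done
}

# Hypothetical helper: check the gzip stream of ARCHIVE, then unpack it into DEST.
extract_archive() {
  local archive="$1" dest="$2"
  if [ ! -f "$archive" ]; then
    printf 'missing archive: %s\n' "$archive" >&2
    return 1
  fi
  gzip -t "$archive" || return 1   # non-zero on a truncated/corrupted file
  mkdir -p "$dest"
  tar -xzf "$archive" -C "$dest"
}
```

Possible usage: setup_tree /data/cmaslard, then extract_archive /data/cmaslard/fastq_archives/Bd21_3_snRNA_fastq.tar.gz /data/cmaslard/fastq and extract_archive /data/cmaslard/fastq_archives/Bd21_3_new_4h_snRNA_fastq.tar.gz /data/cmaslard/fastq_4h_new.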

1.2.3 Scripts for the cluster

Two files were created: an .sbatch file for batch submission (with a log) and for XXXXX, and an .sh file for XXXXXX. Here are the two scripts:

Then submit the job with: sbatch ~/snrna_setup.sbatch
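For readers who want a starting point, here is a hypothetical skeleton of such a submission file. It is NOT the original snrna_setup.sbatch: the resource values, the genome name Bd21_3 and the output file Bd21_3.gtf are placeholders, and the organellar GFF3 files are not handled. It also swaps in a simpler approach than the pre-mRNA reference: it builds a standard reference with cellranger mkref, relying on cellranger count counting intronic reads by default since version 7.0. The heavy steps only run when SLURM_JOB_ID is set, so the file does nothing if executed by mistake on the login node.

```shell
#!/bin/bash
#SBATCH --job-name=snrna_setup
#SBATCH --cpus-per-task=16
#SBATCH --mem=64G
#SBATCH --time=24:00:00
#SBATCH --output=/data/cmaslard/logs/%x_%j.out
# Hypothetical skeleton: adapt resources and names to your cluster and data.

BASE=/data/cmaslard
GENOME_NAME=Bd21_3

main() {
  set -eo pipefail
  # Make "micromamba activate" work in this non-interactive shell
  eval "$(micromamba shell hook -s bash)"
  micromamba activate cr
  set -u   # enabled after activation: activation scripts may read unset variables

  # GFF3 -> GTF (-T selects GTF output)
  gffread "$BASE/inputs_ref/BdistachyonBd21_3_537_v1.2.gene.gff3" -T \
    -o "$BASE/refs/$GENOME_NAME.gtf"

  # mkref writes its output directory (named after --genome) in the current directory
  cd "$BASE/refs"
  cellranger mkref \
    --genome="$GENOME_NAME" \
    --fasta="$BASE/inputs_ref/BdistachyonBd21_3_537_v1.0.fa" \
    --genes="$BASE/refs/$GENOME_NAME.gtf" \
    --nthreads="${SLURM_CPUS_PER_TASK:-1}" \
    --memgb=60
}

# Only run inside a Slurm allocation, never directly on the login node
if [ -n "${SLURM_JOB_ID:-}" ]; then
  main
else
  echo "Not inside a Slurm job; submit with: sbatch $0" >&2
fi
```

The #SBATCH lines must come before the first command and cannot use shell variables, which is why the log path is written out in full; the logs/ directory must exist before submission.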

1.2.4 Results after Cell Ranger

snrna_setup.sbatch aims to create a ‘pre-mRNA’ reference (genes + introns) and an alignment index (STAR/Cell Ranger). At the end, I obtain a reference directory such as:
- ref/ (or ref_premrna/, depending on your script)
  - fasta/ → genome.fa (concatenation of the FASTA files)
  - genes/ → genes.gtf (GTF modified to “pre-mRNA”, i.e. introns included)
  - star/ (or equivalent) → a large set of index files (Genome, SA, SAindex, chrLength.txt, etc.)
  - one or more metadata files (reference.json / reference.csv), depending on the tool
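A quick sanity check of the reference directory can save a failed cellranger count run later. This is a sketch assuming the layout listed above for a Cell Ranger reference (check_reference is a hypothetical name); depending on the Cell Ranger version the GTF may be stored gzipped, so both genes.gtf and genes.gtf.gz are accepted. A STARsolo-only or pre-mRNA reference built by another script may use different names.

```shell
#!/bin/sh
# Hypothetical helper: verify that a Cell Ranger-style reference looks complete.
# Returns 0 if all expected files exist and are non-empty, 1 otherwise.
check_reference() {
  local ref="$1" status=0 f
  for f in fasta/genome.fa reference.json star/Genome star/SA star/SAindex star/chrLength.txt; do
    if [ ! -s "$ref/$f" ]; then
      printf 'missing or empty: %s\n' "$ref/$f" >&2
      status=1
    fi
  done
  # The annotation may be plain or gzipped depending on the version
  if [ ! -s "$ref/genes/genes.gtf" ] && [ ! -s "$ref/genes/genes.gtf.gz" ]; then
    printf 'missing: %s\n' "$ref/genes/genes.gtf(.gz)" >&2
    status=1
  fi
  return "$status"
}
```

For example: check_reference /data/cmaslard/refs/Bd21_3 && echo "reference looks complete" (the path is a placeholder for wherever your reference was written).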

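Whether you use cellranger count (since Cell Ranger 7.0, --include-introns is enabled by default, so a pre-mRNA reference is mainly needed for older versions) or STARsolo/Alevin, each sample produces the same kind of gene-expression outputs. Cell Ranger writes them to an outs/ folder (STARsolo and Alevin use their own folder layout):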
- outs/web_summary.html (interactive QC report)
- outs/metrics_summary.csv
- Matrices:
  - outs/filtered_feature_bc_matrix/
    - barcodes.tsv.gz
    - features.tsv.gz (or genes.tsv.gz)
    - matrix.mtx.gz
  - outs/raw_feature_bc_matrix/ (same files, unfiltered)
  - sometimes a single HDF5 file: filtered_feature_bc_matrix.h5
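Before loading a matrix into Seurat (Read10X reads exactly this barcodes/features/matrix trio), you can get a quick count of cells and features from the shell. This is a sketch with a hypothetical function name: barcodes.tsv.gz has one line per barcode, features.tsv.gz one line per feature, and the first non-comment line of the MatrixMarket file gives the number of rows (features), columns (barcodes) and non-zero entries.

```shell
#!/bin/sh
# Hypothetical helper: print basic dimensions of a 10x-style matrix directory.
summarize_matrix() {
  local dir="$1" f n_cells n_features
  for f in barcodes.tsv.gz features.tsv.gz matrix.mtx.gz; do
    [ -s "$dir/$f" ] || { printf 'missing: %s\n' "$dir/$f" >&2; return 1; }
  done
  # $(( )) strips the padding some wc implementations add
  n_cells=$(( $(gzip -dc "$dir/barcodes.tsv.gz" | wc -l) ))
  n_features=$(( $(gzip -dc "$dir/features.tsv.gz" | wc -l) ))
  printf 'barcodes: %d\nfeatures: %d\n' "$n_cells" "$n_features"
  # First non-comment MatrixMarket line: rows (features), columns (barcodes), non-zeros
  gzip -dc "$dir/matrix.mtx.gz" 2>/dev/null |
    awk '!/^%/ { print "mtx dims (features barcodes nonzero): " $1, $2, $3; exit }'
}
```

For example: summarize_matrix /data/cmaslard/runs/SAMPLE_ID/outs/filtered_feature_bc_matrix, where SAMPLE_ID is a placeholder for the run's --id. The barcode count should match the number of columns in the header line.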

1.2.5 Cell Ranger report